Jellypod, Inc.

The Claude Code Changelog

Claude Code Ultrareview Goes CI-Native

We break down how ultrareview moved from an interactive slash command to a real CLI that can run in scripts, Make targets, and GitHub Actions as a potential build gate. The episode also covers JSON output for automation, the trust and latency tradeoffs of model-based PR checks, plus smaller quality-of-life updates like PowerShell fallback on Windows and better GitHub attribution.

This show was created with Jellypod, the AI Podcast Studio. Create your own podcast with Jellypod today.



Chapter 1

From slash command to build breaker

Lachlan Reed

[excited] Welcome to the show. James, I wanna start with one tiny syntax change that is absolutely NOT tiny: in Claude Code 2.1.120, `claude ultrareview [target]` became a real CLI command. Before that, `/ultrareview` lived inside the interactive session only. So the old flow was: a human opens Claude Code, runs the slash command, reads the findings, then decides what to do. Now? You can point it at something like `main..HEAD` from a script, a Make target, or a GitHub Actions job. That's a different beast, mate.

James Turner

[curious] Wait -- `main..HEAD` is the part that jumps out to me. That's not "help me review this file." That's "review the exact diff I'm about to merge." So we're talking about a tool moving from sidekick to pipeline checkpoint.

Lachlan Reed

[matter-of-fact] Exactly. And the practical shift is category-level. This isn't just a manual code-review accelerator anymore. It can act like an automated PR check: findings go to stdout, and it exits `0` when the review is clean, `1` when it fails. Those two numbers -- `0` and `1` -- are the whole ballgame in CI. Once a tool speaks exit code, your pipeline listens.
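
The exit-code contract translates directly into gate logic. A minimal Python sketch -- the `claude ultrareview main..HEAD` invocation is the one from the episode, but the wrapper names (`gate_decision`, `run_gate`) are hypothetical:

```python
import subprocess

def gate_decision(returncode: int) -> str:
    """Map ultrareview's exit code to a CI verdict: 0 is clean, anything else fails."""
    return "pass" if returncode == 0 else "fail"

def run_gate(rev_range: str = "main..HEAD") -> str:
    # Runs the review non-interactively; findings go to stdout as usual.
    result = subprocess.run(["claude", "ultrareview", rev_range])
    return gate_decision(result.returncode)
```

In a pipeline, a nonzero exit from the script is what fails the check; the wrapper just makes the decision explicit.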

James Turner

[skeptical] And that's where I get twitchy. `0` and `1` sound crisp. Models are NOT crisp. If an LLM occasionally misfires, and now that misfire becomes a failed check... you've basically let a probabilistic reviewer play traffic cop on your deploy lane.

Lachlan Reed

[reflective] Yeah, that's the tension. As a developer, I love a machine catching something before I ship a dodgy change at midnight -- I've done that, nearly nuked a client's site once, not my finest hour. But a build breaker is different from a helper. A helper can be wrong and annoying. A gate can be wrong and expensive.

James Turner

[pauses] The word you used there -- "gate" -- is the memorable one for me. Because teams have tolerated AI as advisor. Let it comment, let it suggest, let it draft. But the second it can fail the PR with `1`, you're not asking "is this useful?" anymore. You're asking "what level of trust does this deserve?"

Lachlan Reed

[questioning tone] And maybe the answer is: not full trust. Not at first. I can imagine teams using ultrareview as a soft gate -- fail only on certain classes of findings, or run it as informational for a while. Because chucking a model straight into the release pipeline, no guardrails... that's like handing the keys to the ute to someone who's still learning the clutch.
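
The "soft gate" idea can be sketched as a severity filter: block only on serious findings, pass everything else through as informational. The severity labels and the shape of each finding here are assumptions -- the JSON output schema isn't versioned yet, so check your own output before depending on field names:

```python
# Hypothetical severity labels; adjust to whatever the real output uses.
BLOCKING_SEVERITIES = {"critical", "high"}

def soft_gate(findings: list[dict]) -> int:
    """Return a CI exit code: 1 only if a blocking-severity finding exists."""
    for finding in findings:
        if finding.get("severity", "").lower() in BLOCKING_SEVERITIES:
            return 1  # block the merge
    return 0  # informational findings only: let it through
```

Running it informationally for a while is just a matter of logging the result instead of returning it as the job's exit code.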

James Turner

[laughs] Very Australian image, but yeah. I also think there's a cultural piece. If your engineers already roll their eyes at flaky tests, they are going to HATE a flaky AI review check. One false block on a Friday afternoon and suddenly everybody's got religion about deterministic systems.

Lachlan Reed

And fair enough. But still -- the fact this can now run non-interactively matters. That's the line crossed. It's no longer "open the tool and ask nicely." It's "wire the tool into the system."

Chapter 2

JSON, CI reality, and the friction that disappeared

James Turner

[excited] For me, the real unlock isn't even the plain CLI command. It's `--json`. Because human-readable output is nice, but machine-readable output is POWER. `claude ultrareview main..HEAD --json > review.json` means you can parse severities, suppress noise, turn findings into GitHub PR annotations, or send a Slack alert only when something critical shows up.

Lachlan Reed

[reflective] `review.json` is the bit I'd stick on the whiteboard. Once it's in JSON, you're not just reading the review -- you're routing it. You can have one path for critical findings, another for medium stuff, and maybe ignore the little paper-cut warnings that'd otherwise make people mutter into their coffee.
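
The routing James and Lachlan describe is a small bucketing step once the output is parsed. A sketch, assuming the report has a top-level `findings` list with a `severity` field per finding -- both names are guesses, since the schema is not versioned:

```python
def route_findings(report: dict) -> dict[str, list[dict]]:
    """Bucket findings by severity so each path gets its own handling."""
    buckets: dict[str, list[dict]] = {"critical": [], "medium": [], "low": []}
    for finding in report.get("findings", []):
        # Unknown or missing severities fall into the "low" paper-cut bucket.
        buckets.setdefault(finding.get("severity", "low"), []).append(finding)
    return buckets
```

Feeding it is just `route_findings(json.load(open("review.json")))` after `claude ultrareview main..HEAD --json > review.json`; critical buckets can go to Slack, medium to PR annotations, low to nowhere.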

James Turner

[matter-of-fact] Right, but there is a very unsexy catch: authentication. That GitHub Actions runner needs credentials. `ANTHROPIC_API_KEY` or managed credentials. Unauthenticated, it just won't work. And that's obvious once you say it out loud, but CI failures caused by missing secrets are the kind that eat 45 minutes and make you feel silly.
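
A cheap way to dodge the 45-minute mystery is a preflight check that fails loudly when credentials are missing. `ANTHROPIC_API_KEY` is the variable named in the episode; the check itself is a defensive sketch, not part of the tool:

```python
import os
import sys

def check_credentials() -> bool:
    """Fail fast with a clear message instead of a cryptic mid-job auth error."""
    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("ANTHROPIC_API_KEY is not set; ultrareview will not authenticate.",
              file=sys.stderr)
        return False
    return True
```

In GitHub Actions the value would come from the repository's secrets, exposed to the job as an environment variable.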

Lachlan Reed

[chuckles] Oh, 100%. That's a classic "why's the bike not starting" moment, and then you realize there's no fuel in it. But the bigger engineering caution is the JSON schema. It's not versioned yet. So if you build some deep, brittle parser that assumes field X lives under object Y forever... mate, you're building on sand.

James Turner

The phrase "not versioned yet" is the one that would stop me from getting too fancy. I'd absolutely consume the JSON, but shallowly. Parse the basics. Severity, message, maybe location if it's there. I would NOT build a giant internal platform around a schema that can still move under me.
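
"Consume it shallowly" in code means defaulting every field instead of assuming it exists. A sketch of that defensive stance -- the field names (`severity`, `message`, `file`, `line`) are guesses about the unversioned schema, which is exactly why everything is optional:

```python
def summarize_finding(finding: dict) -> str:
    """Shallow parse: read only the basics and tolerate missing fields."""
    severity = finding.get("severity", "unknown")
    message = finding.get("message", "(no message)")
    path = finding.get("file")
    line = finding.get("line")
    # Location is a bonus, never a requirement.
    where = f" at {path}:{line}" if path and line is not None else ""
    return f"[{severity}] {message}{where}"
```

If the schema moves, a summary line degrades gracefully instead of a parser throwing `KeyError` in the middle of CI.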

Lachlan Reed

Let me try to explain that back. You're saying: use `--json`, but don't marry it. Date it. Keep a toothbrush at each other's place maybe, but don't combine bank accounts.

James Turner

[laughs] That is... weirdly accurate. Yeah. And there are real cost questions too. On large diffs, ultrareview can take 2 to 5 minutes. That's not instant. In CI terms, 5 minutes is long enough for people to start another coffee, context-switch, and then resent your tooling.

Lachlan Reed

And it's not just waiting around for the sake of it. The reason it can take that time is that ultrareview runs parallel agents across the diff. So you're spending non-trivial compute to get richer review coverage. Which, honestly, I find appealing -- finally, a machine-readable gate with a bit of depth to it -- but it's not free.

James Turner

The "parallel agents" part is the tradeoff in one phrase. More eyes on the diff, more compute, more latency. So the question becomes: is catching one high-severity bug before merge worth adding, say, 3 minutes to every PR? At a startup shipping 20 PRs a day, that adds up FAST.
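
The back-of-the-envelope math here is worth making explicit. Using the figures from the conversation -- roughly 3 minutes per review, 20 PRs a day:

```python
# Added pipeline latency per day from running the review on every PR.
minutes_per_review = 3
prs_per_day = 20
added_minutes_per_day = minutes_per_review * prs_per_day
print(added_minutes_per_day)  # 60 minutes of added pipeline time per day
```

An hour a day of aggregate wait is the number a team has to weigh against the bugs it catches.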

Lachlan Reed

[skeptical] See, I'd push back a little there. We already tolerate slow integration tests because they save us from pain later. If ultrareview catches one serious issue that would've slipped into production, 3 minutes is cheap as chips. The trick is making sure it's catching that kind of issue often enough to earn its seat.

James Turner

[softly] That's the key -- "earn its seat." Not because AI is magic, and not because AI is fake. Because every CI job has a tax. Time tax, compute tax, trust tax. `--json` makes the automation possible. It does not make the tradeoff disappear.

Chapter 3

Quiet wins for Windows, GitHub attribution, and MCP

Lachlan Reed

The other thing I liked in 2.1.120 and 2.1.121 is they quietly shaved off friction elsewhere. Windows teams got a proper quality-of-life win: Git for Windows or Git Bash is no longer required, because Claude Code can fall back to PowerShell as the shell tool. That's not headline stuff, but for mixed-platform teams it's huge. One less setup pothole.

James Turner

[matter-of-fact] "No longer required" is the memorable phrase there. Because if you've ever onboarded someone on Windows and had to say, "Okay, now install Git Bash too," that's one more yak to shave before they write any code. PowerShell fallback means the tool meets the environment where it actually lives.

Lachlan Reed

Yep. And another one: the `AI_AGENT` environment variable now gets set for subprocesses. That sounds niche, but it means the `gh` CLI can correctly attribute traffic to Claude Code instead of treating it like anonymous automation. Little detail, big operational difference.

James Turner

[curious] The token I grab there is `gh`. Because once GitHub CLI traffic is attributed properly, you're not just automating -- you're labeling the automation. That's governance stuff. Auditability. Knowing whether a human ran the action or the agent did.

Lachlan Reed

[warmly] Right, and that folds into MCP getting more production-friendly. Two changes. One: `alwaysLoad: true` keeps a server's tools immediately available instead of hiding them behind search. Two: transient startup errors now auto-retry up to 3 times instead of leaving the server disconnected. I really like both, because they reduce that annoying "is the tool gone or just asleep?" feeling.
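
The retry behavior described for MCP startup can be sketched generically: attempt the connection, swallow transient errors up to three tries, then give up. This is an illustration of the described policy, not Claude Code's actual implementation, and `connect` here stands in for whatever does the server handshake:

```python
import time

def connect_with_retries(connect, max_attempts: int = 3, delay_s: float = 1.0):
    """Retry transient startup failures up to 3 times before surfacing the error."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError as e:
            last_error = e
            if attempt < max_attempts:
                time.sleep(delay_s)  # brief pause before the next attempt
    raise last_error
```

Bounded retries are the right shape for startup: a server that is merely slow comes back on attempt two or three, while a genuinely broken one still fails visibly.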

James Turner

"Up to 3 times" is exactly the sort of number ops people care about. Not infinite retries, not one brittle failure -- three. And `alwaysLoad: true` tells me they're optimizing for reliability and predictability over clever minimalism. If a tool matters, make it present. Don't make me hunt for it.

Lachlan Reed

[reflective] Put all that together and the picture gets interesting. Review can be automated. Tool availability gets stickier. Attribution gets cleaner. Windows setup gets less annoying. None of those on their own is flashy. Together, they make the whole system feel more... ready for real teams, not just tinkerers in a terminal.

James Turner

And that lands us on the uncomfortable question, doesn't it? Once review, tooling, and attribution are all automatable, where do you draw the line? At "assistant suggests"? At "assistant annotates"? Or at "assistant blocks merge"?

Lachlan Reed

[softly] Yeah. Because the tooling line has already moved. The human line -- who's allowed to decide, and when -- that's the one teams still have to draw for themselves.

James Turner

[calm] That's the show. See ya.