Claude’s Weird Week and Opus 4.7’s API Trapdoors
This episode digs into Anthropic’s postmortem on Claude’s silent output corruption, from misrouted traffic to TPU/compiler bugs that caused garbled or truncated responses for a significant share of users. It also covers Opus 4.7’s breaking API changes, including hard failures on familiar generation settings and a new tokenizer that can quietly raise token costs.
Chapter 1
The week Claude got weird
Lachlan Reed
[curious] Welcome to the show. James, imagine this: you ask Claude for a code fix, and instead of a neat little patch, it starts sprinkling random Thai or Chinese characters into an English reply, drops a syntax error into the generated code, and then just... cuts the answer off halfway with NO error message. That was not one-off weirdness. In Anthropic's September 2025 postmortem, they said three separate infrastructure bugs, one after another, degraded Claude responses from August through mid-September, and roughly 30% of Claude Code users saw at least one bad message in that window.
James Turner
[skeptical] Thirty percent is the number that sticks. Not "a few unlucky edge cases" -- THIRTY. That's basically one out of three Claude Code users getting at least one poisoned response between August and mid-September. And the no-error-surfaced part is the bit that freaks me out, because if the model crashes loudly, fine, you know you've got a problem. If it answers confidently with corrupted output, now the developer starts doubting their own benchmark, their own tests, their own eyes.
Lachlan Reed
[reflective] Exactly. It's like your trail bike makes a funny noise, but only every sixth ride, and only when you're already late. You don't know if the bike's broken or if you've gone a bit spare. Anthropic laid out a timeline, and the first bug starts on August 5: some requests were silently routed to servers configured for an upcoming 1M-token context window, not the standard setup those requests were meant for. So traffic gets misrouted, and right there you've got weird behavior before anyone's even talking about model quality in the abstract.
James Turner
[questioning tone] Wait -- August 5 is early. So this wasn't "one bad deploy on one day." This was the start of a chain. And that 1M-token context window server config, even hearing that phrase, you can feel how easy it would be to dismiss. Like, "oh, infra stuff." But misrouting to the wrong server class means you're not even testing the model you THINK you're testing, right?
Lachlan Reed
[matter-of-fact] That's the tension. People say "the model got weird," but one of these failures wasn't the model getting moody -- it was traffic going somewhere it shouldn't. Then August 25 brought a second issue: output corruption from a runtime optimization misconfiguration. That's a nasty one, because now it's not just where the request lands; it's the actual generation path getting mangled. That's where those garbled characters, truncated responses, and incoherent outputs fit the picture much more directly.
James Turner
[responds quickly] August 25 twice, though -- because there was ALSO an XLA/TPU compiler bug that surfaced the same day, tied to mixed-precision arithmetic. And that's a different beast. Misconfiguration is one category of problem. A compiler bug in XLA on TPUs with mixed precision... that's the sort of sentence that makes every ML engineer sit up straight, because now the system can choose subtly wrong tokens in a way that's hard to reproduce.
Lachlan Reed
[pauses] Yeah, and "subtly wrong tokens" sounds small until you think about what code generation IS. One token off can turn valid code into broken code. One token off in prose can be invisible. One token off in a command or import path can waste an afternoon. So when Anthropic says there were three separate infrastructure bugs, that matters because these weren't three copies of the same failure. One silently misrouted traffic. Another corrupted output probabilities through runtime optimization gone wrong. And the XLA/TPU mixed-precision issue nudged token selection off in ways that were slippery and hard to pin down.
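To make "subtly wrong tokens" concrete, here's a toy sketch in Python -- not Anthropic's actual XLA bug, just an illustration of the mechanism -- showing how reduced-precision arithmetic can flip which token wins an argmax. Two logits that are distinct in float32 collapse to the same float16 value, and the "best" token changes:

```python
import numpy as np

# Two next-token logits that are nearly tied: token B wins in float32.
logits_f32 = np.array([2.3001, 2.3002], dtype=np.float32)
print(np.argmax(logits_f32))   # 1 -> token B

# Cast to float16 (~3 decimal digits of precision): both logits round to
# the same representable value, and argmax falls back to the first index.
logits_f16 = logits_f32.astype(np.float16)
print(logits_f16)              # [2.3 2.3] -- indistinguishable now
print(np.argmax(logits_f16))   # 0 -> token A, a different "best" token
```

The real bug was subtler than a blunt downcast, but the shape of the failure is the same: a precision mismatch shifts near-tied scores, and the selected token quietly changes.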
James Turner
[serious] The phrase I'd use is epistemic damage. Not just software damage -- trust damage. If generated code acquires syntax errors, okay, a linter catches some of that. If English replies suddenly spit out Thai or Chinese characters, you know something is off. But if the model is merely choosing the wrong token occasionally, and it still looks plausible, now your A/B test might pass one hour and fail the next. Your eval suite might say the prompt is the problem when the real issue is lower in the stack.
Lachlan Reed
[chuckles] That's the bit I keep coming back to. Developers love to assume the bug is in their prompt, or their parser, or some midnight change they pushed with a cup of terrible shed coffee in hand. Been there. But this postmortem is a good reminder that infrastructure bugs can masquerade as application bugs. If a response truncates with no surfaced error, you might waste hours rewriting perfectly decent code. If corruption sneaks into probabilities, you can end up "fixing" the wrong thing.
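Silent failures like these argue for cheap client-side guards. Here's a minimal sketch, assuming you're calling the Anthropic Python SDK and expecting English-only output; the script ranges and helper names are illustrative assumptions, not anything from the postmortem:

```python
import re

# Scripts that shouldn't appear in English-only output. Thai (U+0E00-0E7F)
# and CJK ideographs (U+4E00-9FFF) are illustrative choices, not a full list.
UNEXPECTED_SCRIPTS = re.compile(r"[\u0E00-\u0E7F\u4E00-\u9FFF]")

def corruption_flags(text: str, stop_reason: str) -> list[str]:
    """Return red flags for a completion that was expected to be English."""
    flags = []
    if stop_reason not in ("end_turn", "stop_sequence", "tool_use"):
        # "max_tokens" or anything else may mean a silently truncated answer.
        flags.append(f"suspicious stop_reason: {stop_reason}")
    if UNEXPECTED_SCRIPTS.search(text):
        flags.append("unexpected Thai/CJK characters in English output")
    return flags

# Usage (assumed): resp = client.messages.create(...)
# problems = corruption_flags(resp.content[0].text, resp.stop_reason)
# If problems is non-empty, retry and log rather than trusting the output.
```

It won't catch a subtly wrong token, but it turns the loud symptoms -- truncation and script-mixing -- from mysteries into log lines.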
James Turner
And Anthropic publishing the timeline matters for exactly that reason. August 5 for the routing issue. August 25 for the runtime optimization misconfiguration. August 25 again for the XLA/TPU compiler bug with mixed-precision arithmetic. Those dates give developers something concrete to line up against their own logs. Because otherwise "Claude felt off for a while" really does sound like vibes. This was NOT vibes. It was three specific failures, over a specific window, affecting a specifically large slice of users.
Chapter 2
Opus 4.7 broke old assumptions
James Turner
[excited] And then you get the other kind of AI pain -- not silent degradation, but hard breakage. Opus 4.7 shipped on April 16, 2026, and if your old config passes temperature, top_p, or top_k with any non-default value in the Messages API, you now get a 400. So something as innocent-looking as temperature: 0.7 -- a setting teams have copied around for years -- can stop production COLD.
Lachlan Reed
[deadpan] Temperature 0.7. That's the sort of line sitting in config files like an old garden gnome -- nobody remembers who put it there, but everybody's afraid to move it. And now Opus 4.7 says, nah mate, 400. Not "we'll ignore it." Not "deprecated warning." Just a hard API failure.
James Turner
Exactly. And there's a second breaking change right next to it: thinking.budget_tokens is gone too. Extended thinking budget control is removed from the API surface for this model, and passing that setting also returns a 400. So if you upgraded by swapping the model name and left the rest of your request body untouched, you could break on TWO fronts before the model even starts generating.
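If Opus 4.7 hard-fails on those fields as described, the least invasive migration is stripping them from the request before it leaves your code. A minimal sketch using the Anthropic Python SDK; the removed-field list comes from the episode's claims, and the model id is a hypothetical placeholder:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Fields the episode says now return HTTP 400 on Opus 4.7.
REMOVED_FIELDS = ("temperature", "top_p", "top_k", "thinking")

legacy_request = {
    "model": "claude-opus-4-7",  # hypothetical model id
    "max_tokens": 1024,
    "temperature": 0.7,          # the garden gnome in the config file
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Refactor this function..."}],
}

# Drop the trapdoor fields instead of letting the API 400 on them.
safe_request = {k: v for k, v in legacy_request.items() if k not in REMOVED_FIELDS}
response = client.messages.create(**safe_request)
print(response.content[0].text)
```

Stripping silently is a stopgap; the better long-term move is deleting those fields from the config so nobody copies them into the next service.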
Lachlan Reed
[skeptical] Let me try to explain it back. Old habit says, "new model, same knobs." You keep temperature, maybe top_p, maybe top_k, maybe thinking.budget_tokens if you've been tuning that. In Opus 4.7, those familiar knobs aren't just ignored -- some of them are basically wired to the trapdoor. Is that fair?
James Turner
That's fair. And the trapdoor number is 400, which developers hate because it's immediate and very unromantic. Your app just fails. Then there's the sneakier problem: cost. Opus 4.7 uses a new tokenizer, and Anthropic says it can use anywhere from 1x to 1.35x as many tokens as earlier models, depending on content type. So your prompt and output can look basically unchanged to a human, but the bill goes up because the tokenizer slices the text differently.
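Since the inflation is "1x to 1.35x depending on content type," cost planning needs a worst case, not a point estimate. A back-of-the-envelope sketch; the token volume and per-million price below are made-up placeholders, and the 1.35 multiplier is the figure quoted in the episode:

```python
TOKENIZER_INFLATION = 1.35  # upper bound quoted for the new tokenizer

def worst_case_monthly_cost(tokens_per_month: int, usd_per_mtok: float) -> float:
    """Upper-bound spend if every request hits the full 1.35x inflation."""
    return tokens_per_month * TOKENIZER_INFLATION * usd_per_mtok / 1_000_000

# Placeholders: 200M tokens/month at a hypothetical $15 per million tokens.
old_spend = 200_000_000 * 15 / 1_000_000                  # $3,000
new_ceiling = worst_case_monthly_cost(200_000_000, 15.0)  # $4,050
print(f"budget ceiling rises from ${old_spend:,.0f} to ${new_ceiling:,.0f}")
```

Measuring the actual multiplier on your own traffic, per content type, is the follow-up; the ceiling just tells finance what the bad case looks like.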
Lachlan Reed
[questioning tone] That 1.35x is the sticky one for me. Thirty-five percent more tokens without changing what the user THINKS they said or got back -- that's like the servo charging extra because it decided your jerry can has more edges today. Same fuel, bigger bill. And because it's "depending on content type," you can't just assume one neat multiplier across everything.
James Turner
Right, and for agentic apps there's another wrinkle: reports of regressions in long-context retrieval and agentic search at default effort levels. The workaround people have pointed to is using high or xhigh effort in Claude Code for agentic workloads. Which means the default may not be good enough for the exact class of tasks a lot of teams care about -- multi-step searching, tool use, long-context reasoning.
Lachlan Reed
[frustrated] And that's where the "just bump the version" habit absolutely bites you. Because now the upgrade isn't one thing. It's API compatibility, removed controls, tokenization economics, and behavior changes in agentic search. If your team treats model upgrades like changing a logo in the header -- easy peasy -- you're in strife. This is dependency-upgrade territory. Read the changelog, diff the payloads, run evals, watch costs, the whole box and dice.
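Treating the swap like a dependency upgrade suggests a canary step: replay a fixed prompt set against the new model and look at failures and token usage before flipping production. A rough sketch, again assuming the Anthropic Python SDK; the prompts and model id are placeholders:

```python
import anthropic

client = anthropic.Anthropic()

# Placeholder canary prompts; in practice, sample from real traffic.
CANARY_PROMPTS = ["Summarize this changelog...", "Write a SQL migration..."]

def canary(model: str) -> None:
    """Replay fixed prompts against `model`, reporting 400s and token usage."""
    for prompt in CANARY_PROMPTS:
        try:
            resp = client.messages.create(
                model=model,
                max_tokens=512,
                messages=[{"role": "user", "content": prompt}],
            )
            print(f"ok  in={resp.usage.input_tokens} out={resp.usage.output_tokens}")
        except anthropic.BadRequestError as err:
            # Hard API failures, like the 400s discussed above, surface here.
            print(f"400 {err}")

canary("claude-opus-4-7")  # hypothetical model id
```

Comparing the token counts against the same run on the old model gives you your actual tokenizer inflation, not just the quoted range.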
James Turner
[reflective] I actually think that's the deeper lesson connecting both stories. In the 2025 Claude degradation case, the danger was invisible system behavior making developers mistrust themselves. In Opus 4.7, the danger is visible if you look -- 400s, removed fields, token shifts -- but teams often don't look closely enough because they treat model swaps as a quick version bump. Same emotional outcome: the engineer on call at 1 a.m. is staring at a dashboard thinking, "Did I break this, or did the platform move under me?"
Lachlan Reed
[warmly] Yeah. And maybe that's the new default mindset: don't treat a model name like a cosmetic string. Treat it like upgrading a database engine, or a compiler, or a payment SDK. Because if a harmless old temperature: 0.7 can 400 your production app on April 16, 2026, the real bug isn't just in the release. It's in the habit of assuming intelligence APIs behave like plug-and-play widgets. They don't. [short pause] Anyway -- go audit your config before it audits you. See ya.
James Turner
[chuckles] That's a solid place to leave it. See you next time.
