Claude Code’s 1-Hour Prompt Cache Upgrade
Claude Code v2.1.108 lets you extend the prompt-cache TTL from 5 minutes to 1 hour, so long system prompts and big repo contexts stay warm through real work sessions instead of expiring during normal breaks. The episode also covers the `DISABLE_TELEMETRY` bug fix, the optional 5-minute fallback, the deprecated Bedrock flag that is still honored, and why any prefix change still invalidates the cache immediately.
Chapter 1
Five minutes to sixty
Lachlan Reed
[excited] Welcome to the show. James, imagine this: you’ve loaded up a monster prompt -- big system instructions, a chunky `CLAUDE.md`, maybe a 100K-token context -- then you stand up for, I dunno, a quick bathroom break, come back six minutes later, and the whole expensive prefix has gone cold. Gone. You’re paying to shove the same truckload back through the gate again.
James Turner
[questioning tone] Six minutes later is the killer part. Not “tomorrow,” not “after lunch” -- six minutes. So the old 5-minute TTL basically meant a normal human pause could blow up your cache?
Lachlan Reed
[matter-of-fact] Exactly. And Claude Code v2.1.108 changes that in a really meaningful way. There’s a new environment variable, `ENABLE_PROMPT_CACHING_1H`, and setting it raises the prompt-cache TTL from 5 minutes to 1 hour for API key, Bedrock, Vertex, and Foundry auth modes. So in plain English: the pricey front half of your session gets loaded once, then stays warm for an actual work block instead of expiring while you’re refilling your coffee.
James Turner
[leaning in] `1H` is the memorable token there. From 5 minutes to 60 minutes is not a tweak. That’s twelve times longer. If your prefix is 100K tokens, that’s the difference between “interactive coding assistant” and “meter’s running while I wash my hands.”
Lachlan Reed
[laughs] Yeah, it’s the difference between a ute with a full tank and one that stalls at every traffic light. And this matters most when your prefix is fat: a long system prompt, a carefully built `CLAUDE.md`, loads of repo context. Those are exactly the bits you do NOT wanna keep resending because they’re the expensive part.
James Turner
Let me try to explain it back. [pauses] The cache is basically holding onto the repeated prefix -- the same opening bytes, same instructions, same big context -- so later turns don’t have to reprocess that whole chunk. And the upgrade here is not smarter caching, it’s longer memory. Same idea, way less fragile.
Lachlan Reed
That’s it. Not exactly “memory” in the chat-history sense, more like the prefix stays preloaded. Which means if you’re working in bursts -- ask a question, read code, think for ten minutes, come back -- you’re still sitting on the warm path. Before, that ten-minute think was enough to kick the ladder out from under you.
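To put numbers on that "work in bursts" picture, here is a rough sketch that counts how many pauses in a session would outlast the cache. It assumes the cache timer restarts on every turn and that gaps are given in whole minutes; both are simplifications not spelled out in the episode, and `cold_starts` is a hypothetical helper name.

```shell
# cold_starts TTL_MINUTES GAP_MINUTES...
# Counts the pauses between turns that are longer than the TTL.
# Assumption: each turn restarts the timer, so only single gaps matter.
cold_starts() {
  ttl=$1
  shift
  count=0
  for gap in "$@"; do
    if [ "$gap" -gt "$ttl" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# A session with pauses of 2, 6, 10 and 3 minutes between turns:
cold_starts 5 2 6 10 3    # prints 2 (the 6- and 10-minute pauses go cold)
cold_starts 60 2 6 10 3   # prints 0 (every pause fits inside the hour)
```

Under those assumptions, the same session reloads the prefix twice with a 5-minute TTL and never with a 1-hour TTL.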
James Turner
[skeptical] Okay, but I want the practical picture. Say I’ve got a giant `CLAUDE.md` at the root of a repo and I’m pair-programming for, like, 45 minutes. Under the new setup, as long as I don’t change the prefix, that expensive setup cost mostly happens once?
Lachlan Reed
Mostly, yeah. That’s the clean mental model. Load the heavy stuff once, keep it warm for an hour. So a normal work session -- not an all-day marathon, just a decent block -- stops getting reset by little gaps. For people actually using Claude Code like a tool, not a benchmark harness, that’s a pretty chunky quality-of-life gain.
James Turner
And there’s a surprise buried in the same release, right? Because the TTL change is already useful, but there was also a bug fix tied to `DISABLE_TELEMETRY`.
Lachlan Reed
[matter-of-fact] Yep, and this bit’s sneaky. The same release fixed a bug where users who had set `DISABLE_TELEMETRY` were silently stuck on the old 5-minute TTL, even though they should have gotten the 1-hour cache. Silently is the key word. No big error, no flashing light -- just worse caching than expected.
James Turner
`DISABLE_TELEMETRY` is the detail that makes this sting. Because the kind of user who sets `DISABLE_TELEMETRY` is usually the kind of user who also notices weird cost behavior and starts debugging their own workflow. They may have thought, “Huh, maybe prompt caching just isn’t helping much,” when actually they were pinned to 5 minutes.
Lachlan Reed
[reflective] Right. And that’s why this release feels bigger than one environment variable. It’s not just “here’s a longer cache.” It’s also “by the way, some of you never got the longer window you were meant to get.” That’s a proper gotcha. Bit of a dog’s breakfast, really.
James Turner
There’s also a subtle product point here. Five minutes sounds fine on paper if you think in pure request-response cycles. But humans don’t work in uninterrupted loops. We tab away, read docs, answer Slack, stare at a test failure, walk to the kitchen. One hour maps to reality way better.
Lachlan Reed
[warmly] Exactly. Five minutes is machine time. One hour is people time. And when the whole point of the tool is interactive work, that shift from machine time to people time is the actual story.
Chapter 2
The knobs and the caveats
James Turner
So let’s get concrete on setup. The new knob is simple: `export ENABLE_PROMPT_CACHING_1H=1` before launching `claude`. That enables the 1-hour TTL. And if you need the old behavior, there’s a reverse switch: `FORCE_PROMPT_CACHING_5M=1`.
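As a minimal sketch of the setup James describes, assuming a POSIX-style shell: the variable names come from the episode, and the launch line is left as a comment so the snippet can be pasted without starting a session.

```shell
# Opt in to the 1-hour prompt-cache TTL for everything launched from this shell.
export ENABLE_PROMPT_CACHING_1H=1

# To get the old behavior back, force the 5-minute TTL instead:
# export FORCE_PROMPT_CACHING_5M=1

# Then start Claude Code as usual:
# claude
```

The episode does not say what happens when both variables are set at once, so the sketch keeps only one of them active.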
Lachlan Reed
[curious] I wanna poke that `FORCE_PROMPT_CACHING_5M=1` bit, because on first glance it sounds backwards. Why would anyone choose the shorter 5-minute TTL after getting the roomy 1-hour one?
James Turner
Predictability. [short pause] Especially for lots of short `claude --print` jobs, maybe running in parallel. If prefixes linger for a full hour, you can get reuse patterns that are great for humans but not always what you want in batch workflows. The shorter TTL can make per-run billing behavior feel more bounded and repeatable.
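One way to keep that choice local to batch work is to set the variable for a single command rather than exporting it. The wrapper below is a hypothetical sketch: `run_batch_job` is an illustrative name, and only `FORCE_PROMPT_CACHING_5M` and `claude --print` come from the episode.

```shell
# Hypothetical wrapper for one-shot jobs.
# The VAR=value prefix applies the 5-minute TTL to this one command
# instead of exporting it for the whole interactive shell.
run_batch_job() {
  FORCE_PROMPT_CACHING_5M=1 claude --print "$1"
}
```

A CI step could then call something like `run_batch_job "Summarize the failing tests"`, while a developer's own shell keeps whatever TTL setting it already had.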
Lachlan Reed
So the tension is: humans love warm caches, automation sometimes loves cleaner edges. That’s fair. It’s a bit like leaving the workshop lights on because you’re ducking in and out all afternoon versus shutting everything down between jobs because you want the ledger neat.
James Turner
Exactly. And there’s one more compatibility wrinkle: the old Bedrock-only flag, `ENABLE_PROMPT_CACHING_1H_BEDROCK`, is deprecated but still honored. Which sounds boring until you remember how many teams have ancient shell scripts and CI wrappers nobody’s touched in months.
Lachlan Reed
[chuckles] Months? Mate, I’ve seen env vars survive longer than houseplants. But yeah, `ENABLE_PROMPT_CACHING_1H_BEDROCK` still being honored matters because it means old deployment scripts don’t instantly faceplant. Deprecated, not dead.
James Turner
That distinction -- deprecated but still honored -- is the listener takeaway there. Your old Bedrock setup may keep working, but it’s a sign to clean house before that compatibility shim disappears in some future release.
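For teams cleaning house, a small shim at the top of an old wrapper script is one possible stopgap. This is a sketch, not official migration guidance: `migrate_cache_flag` is a hypothetical helper, and copying the old value to the new name assumes the two flags accept the same values, which the episode does not confirm.

```shell
# Hypothetical shim: if only the deprecated Bedrock flag is set,
# carry its value over to the new name and print a reminder to stderr.
migrate_cache_flag() {
  if [ -n "${ENABLE_PROMPT_CACHING_1H_BEDROCK:-}" ] &&
     [ -z "${ENABLE_PROMPT_CACHING_1H:-}" ]; then
    export ENABLE_PROMPT_CACHING_1H="$ENABLE_PROMPT_CACHING_1H_BEDROCK"
    echo "note: ENABLE_PROMPT_CACHING_1H_BEDROCK is deprecated;" \
         "set ENABLE_PROMPT_CACHING_1H instead" >&2
  fi
}

migrate_cache_flag
```

An explicitly set `ENABLE_PROMPT_CACHING_1H` is left alone, so the shim only affects scripts that still rely on the old name.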
Lachlan Reed
Now, big caveat: this does NOT affect `claude.ai/code` OAuth sessions. Those handle caching server-side. So if someone hears all this and goes, “Sweet, I’ll export the flag and my OAuth session changes,” nah -- wrong paddock. Different path entirely.
James Turner
`claude.ai/code` OAuth being excluded is important. Because otherwise people are gonna set `ENABLE_PROMPT_CACHING_1H=1`, see no difference in that flow, and assume the feature is broken. It’s not broken; it just doesn’t apply there.
Lachlan Reed
And here’s the edge case that catches people even when the TTL is perfect: the cache still breaks instantly if the prefix bytes change. Instantly. Edit the top of `CLAUDE.md`, tweak the system prompt, alter those opening bytes, and you’ve invalidated the warm cache no matter whether you picked 5 minutes or 1 hour.
James Turner
[sharp] “Prefix bytes change” is the phrase to remember. Not “the meaning changed,” not “the instructions are kinda similar.” If the bytes at the front are different, the warm cache is gone. So if you keep fiddling with the top of `CLAUDE.md`, you’re basically kicking your own chair out from under you.
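The "bytes, not meaning" point can be illustrated with a plain checksum. This is only an analogy: the episode does not describe how the cache actually compares prefixes, `fingerprint` is a hypothetical helper, and the sample text stands in for the top of a `CLAUDE.md`.

```shell
# Illustration only: a checksum over a file's bytes as a stand-in for
# "is the prefix byte-for-byte identical?"
fingerprint() {
  cksum < "$1"
}

tmp=$(mktemp)
printf 'Use tabs for indentation.\n' > "$tmp"
before=$(fingerprint "$tmp")

# Same meaning to a human, different bytes in the file.
printf 'Use tabs for indenting.\n' > "$tmp"
after=$(fingerprint "$tmp")

if [ "$before" = "$after" ]; then
  echo "prefix unchanged: cache could stay warm"
else
  echo "prefix changed: expect a cold start"
fi
rm -f "$tmp"
```

The two instructions say the same thing, yet the fingerprints differ, which is the situation James is warning about.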
Lachlan Reed
Yeah, that’s the thing. Folks hear “one-hour cache” and imagine some magical persistence. It’s not magic. It’s conditional. Stable prefix, warm cache. Changed prefix, cold start. Dead simple once you say it out loud, but easy to miss when you’re in the weeds.
James Turner
I actually think that’s where the real decision lives. If you’re an interactive user -- long repo context, lots of back-and-forth, normal human pauses -- the 1-hour TTL is a clear win. But if you’re running many short `claude --print` jobs in parallel, a longer-lived prefix can hang around longer than you want, and “better” stops being universal.
Lachlan Reed
[reflective] Which I like, honestly. It means there isn’t one blessed setting for everyone. One hour is better for people-time. Five minutes can still be better for batch-time. Same tool, different jobs. And if there’s a lesson in this release, it’s probably that the clever bit isn’t the cache itself -- it’s knowing whether you’re optimizing for a human at a keyboard or a pile of automated runs hammering away in parallel.
James Turner
[calm] That’s the question I’d leave people with before they export anything: are you trying to keep a conversation warm, or keep your runs predictable? Because those are not always the same goal.
Lachlan Reed
[softly] And if your answer is “both,” well... welcome to engineering. Catch you next time.
