Claude Code 2.1.108: Smarter Prompt Caching and Session Recaps
This episode breaks down Claude Code 2.1.108’s new prompt-caching options, including the tradeoff between longer-lived session continuity and short-lived cache turnover for testing. It also covers the new /recap workflow and background session summaries that make it easier to jump back into long-running coding work.
This show was created with Jellypod, the AI Podcast Studio.
Chapter 1
The expensive session problem
Lachlan Reed
[warmly] Welcome to the show. James, I want to start with two ugly little strings that can absolutely change your afternoon: `ENABLE_PROMPT_CACHING_1H` and `FORCE_PROMPT_CACHING_5M`. In Claude Code 2.1.108, those are now the explicit prompt-cache TTL choices -- one hour, or five minutes. And if you've ever looked at a bill mid-debug session and gone, "Hang on, why is this thing suddenly chewing through money like a trail bike through petrol?" ... this is probably the reason.
James Turner
[curious] Wait -- the `5M` is the one that can bite you, right? Because five minutes sounds fine until you're twenty minutes into a refactor and the model has to re-read the same giant context blob again.
Lachlan Reed
Exactly. Same large system prompt, same codebase context, same working goal -- but if that cache dies every five minutes, you're paying to rebuild that context again and again. Not every single turn, obviously, but often enough that in a long debugging loop it adds up. It's like paying a mate to re-open the toolbox every time you ask for the same spanner.
James Turner
[matter-of-fact] And the specific win with `ENABLE_PROMPT_CACHING_1H` is that it better matches how people actually work. A real coding session isn't always this clean, uninterrupted sprint. You get pulled into Slack, you grab lunch, you run tests, you stare at a stack trace for twelve minutes. One hour covers that.
Lachlan Reed
Yeah, that's the sweet spot. Same context, same task, whole afternoon humming along -- that's EXACTLY what cache hits are for. If Claude's been fed your repo shape and your constraints already, you don't want to keep re-paying for the same meal. You want it plated once and reused while the work is still basically the same.
James Turner
[skeptical] But we should be careful here: the one-hour cache isn't just "longer is better." The 1-hour entries cost more to write than the 5-minute ones. So you're making a bet. You're paying more upfront because you expect enough reuse to make it worth it.
Lachlan Reed
Right, that's the trade. More cache hits can mean lower per-turn cost over the session, but only if you actually stay in that lane. If you're constantly changing tasks, or nuking your context, or bouncing between unrelated prompts, then a longer TTL can be a bit like buying a big esky for one can of drink. Nice box, wrong day.
James Turner
The phrase I'd use is: you're choosing between cheap-to-write and likely-to-reuse. `FORCE_PROMPT_CACHING_5M` is cheaper to write, but easier to miss. `ENABLE_PROMPT_CACHING_1H` is more expensive to write, but way more forgiving if your session has human pauses in it.
Lachlan Reed
[reflective] And humans are mostly pauses, mate. That's the bit product people sometimes forget. We don't work like a benchmark script. We get interrupted. We forget what file we were in. We make tea. So this release is really saying: okay, let's let the cache behave more like an actual work session, not a stopwatch.
James Turner
And if you're hearing this and wondering, "Why did my session get expensive?" -- five minutes is the number to remember. `FORCE_PROMPT_CACHING_5M`. That's the variable that explains a lot of mystery invoices.
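The choice the hosts are describing comes down to setting one environment variable before launching Claude Code. A minimal sketch: the two variable names are the ones named in the 2.1.108 release; everything else in the snippet (the comments, the `echo` check) is illustrative.

```shell
# One session, one TTL choice -- set exactly one of these before
# launching Claude Code.

# Afternoon of feature work: keep cached context reusable for ~1 hour.
export ENABLE_PROMPT_CACHING_1H=1

# Prompt debugging instead? Swap to fast turnover (~5 minutes):
# unset ENABLE_PROMPT_CACHING_1H
# export FORCE_PROMPT_CACHING_5M=1

# Sanity check: print which TTL is active in this shell.
echo "cache TTL: ${FORCE_PROMPT_CACHING_5M:+5m}${ENABLE_PROMPT_CACHING_1H:+1h}"
```

Because the variables shape cost for the whole session, setting them in the shell that launches the CLI (rather than per-command) is the natural place for them.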
Chapter 2
What the new variables are really for
James Turner
[excited] The other big thing in 2.1.108 is scope. `ENABLE_PROMPT_CACHING_1H=1` now works across API key usage, AWS Bedrock, Google Vertex, and Anthropic Foundry. That matters because the old story was way narrower -- the one-hour setting used to be Bedrock-specific in practice.
Lachlan Reed
So the name to lock in now is the provider-agnostic one: `ENABLE_PROMPT_CACHING_1H=1`. Not just for Bedrock, not some special corner case -- one switch across the main setups. That's tidy. I like tidy. My shed is not tidy, but my env vars... ideally, yes.
James Turner
[chuckles] Same. And then there's the opposite switch: `FORCE_PROMPT_CACHING_5M=1`. I think of that as the "fresh eyes every turn" mode. If you're iterating on prompts, testing behavior, or trying to isolate a prompt bug, you may NOT want older context hanging around very long.
Lachlan Reed
Yeah, that's a different job. In a real coding session, I want the model to remember the repo context for ages -- or, well, an hour. But if I'm diagnosing why a prompt template is behaving weirdly, short TTL can be a blessing. Otherwise you're sort of chasing a bug through old leftovers.
James Turner
Let me try to explain it back. [pauses] Long-lived cache for "help me keep working on this feature." Short-lived cache for "I need to know whether THIS prompt change caused THIS behavior." Is that basically it?
Lachlan Reed
Almost -- the part I'd sharpen is intention. `ENABLE_PROMPT_CACHING_1H=1` is for continuity. `FORCE_PROMPT_CACHING_5M=1` is for controlled turnover. One says, "please keep this mental workspace warm." The other says, "clear the bench quickly so I can observe changes."
James Turner
That's good. Controlled turnover. And for backward compatibility, the older `ENABLE_PROMPT_CACHING_1H_BEDROCK` still works. So if somebody has existing Bedrock scripts or team docs, this release doesn't blow them up.
Lachlan Reed
But it is nudging people to migrate. For consistency, really. If you've got one team on Bedrock, another on Vertex, another just using API keys, having everybody learn `ENABLE_PROMPT_CACHING_1H` is cleaner than dragging around the old `_BEDROCK` tail forever.
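For a team doing the migration the hosts describe, the change is typically a one-line swap in a launch script. A sketch, assuming a script that previously exported the Bedrock-specific spelling (both names come from the episode; the script shape is hypothetical):

```shell
# Before: the Bedrock-only spelling, still honored for backward
# compatibility in 2.1.108.
# export ENABLE_PROMPT_CACHING_1H_BEDROCK=1

# After: the provider-agnostic spelling, which works across API key
# usage, AWS Bedrock, Google Vertex, and Anthropic Foundry.
export ENABLE_PROMPT_CACHING_1H=1
```

One name in the docs, regardless of which provider a given team runs on, is the consistency win being described here.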
James Turner
[reflective] I actually like that this maps to two very human moods. Mood one: "I'm deep in the code, don't make me restate the world." Mood two: "Something is fishy, strip this down and let me test it clean." Same feature family, totally different purpose.
Lachlan Reed
And that's why these env vars aren't just knobs for infra nerds. They're workflow choices. If you're heads-down shipping, choose the one-hour path. If you're poking at prompt behavior with a stick, choose the five-minute one. Wrong tool, wrong job -- and suddenly the session feels haunted.
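The "two moods" framing above can be made concrete as a tiny wrapper. Only the two environment variable names are from the release; the `cache_mode` helper and the `work`/`debug` labels are invented here for illustration.

```shell
# Hypothetical helper: map an intent onto the matching cache TTL variable.
#   work  -> continuity (long-lived 1h cache)
#   debug -> controlled turnover (short-lived 5m cache)
cache_mode() {
  case "$1" in
    work)
      unset FORCE_PROMPT_CACHING_5M
      export ENABLE_PROMPT_CACHING_1H=1 ;;
    debug)
      unset ENABLE_PROMPT_CACHING_1H
      export FORCE_PROMPT_CACHING_5M=1 ;;
    *)
      echo "usage: cache_mode work|debug" >&2
      return 1 ;;
  esac
}

# "Something is fishy, strip this down and let me test it clean":
cache_mode debug
echo "5M=${FORCE_PROMPT_CACHING_5M:-unset} 1H=${ENABLE_PROMPT_CACHING_1H:-unset}"
```

Unsetting the other variable in each branch keeps the two modes mutually exclusive, which matches the "wrong tool, wrong job" warning: you want one deliberate TTL per session, not both at once.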
