Claude Code v2.1.90: 27 days, 22 versions, one expensive bug
Since v2.1.69, --resume triggered a full prompt-cache miss for anyone running MCP servers or deferred tools. 27 days, 22 versions, ~$285K in estimated wasted API spend. Here's what broke, why it lasted, and what it says about trust.
I run Claude Code with 8 MCP servers and about 110 deferred tools per session. That puts me squarely in the population this regression hit hardest, and my 24.2 billion tokens of logged usage gave me the data to spot it.
The bug
Since v2.1.69, --resume triggered a full prompt-cache miss on the first request for sessions using deferred tools, MCP servers, or custom agents. The git record is clear: commit 9582ad4 shipped March 5. v2.1.90 shipped April 1. 27 days. 22 versions.
It wasn't even --resume's only bad month. Around v2.1.85, a separate regression caused older sessions to crash with:
tool_use ids found without tool_result blocks
That was fixed in v2.1.86. By mid-March, --resume was broken in two different ways simultaneously: one visible crash, one silent token drain. The crash got attention. The cache miss didn't.
What it cost
Claude Code keeps a local token ledger at ~/.claude/stats-cache.json. Mine:
| Metric | Value |
|---|---|
| Total sessions | 414 |
| Sessions per active day | 10.9 |
| Cache read tokens | 23,467,319,874 |
| Cache write tokens | 670,339,241 |
| Cache hit ratio | 97.2% |
| Total token throughput | ~24.2 billion |
A 97.2% cache hit ratio is what healthy looks like — system prompt, tool schemas, conversation history served from cache on nearly every turn.
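The ratio itself is a one-line computation over the ledger. A minimal sketch, assuming the JSON holds fields along the lines of cache_read_tokens and cache_write_tokens (the exact layout of stats-cache.json is my assumption; the fallback numbers are the ones from the table above):

```python
import json
from pathlib import Path

def cache_hit_ratio(reads: int, writes: int) -> float:
    """Fraction of cached-prefix tokens served from cache instead of re-written."""
    return reads / (reads + writes)

# Fallback: the numbers from my ledger above. The field names below are
# an assumption about the JSON layout; adjust to what your file contains.
reads, writes = 23_467_319_874, 670_339_241
path = Path.home() / ".claude" / "stats-cache.json"
try:
    stats = json.loads(path.read_text())
    reads = stats.get("cache_read_tokens", reads)
    writes = stats.get("cache_write_tokens", writes)
except (OSError, ValueError):
    pass  # no local ledger on this machine; use the article's numbers

print(f"cache hit ratio: {cache_hit_ratio(reads, writes):.1%}")
```

On my numbers this prints 97.2%. Anything dropping well below that on resume-heavy days is the signature of this bug.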
With 110 deferred tools across 8 MCP servers, each resume during the regression re-billed roughly 49,000 tokens at cache write rates instead of read rates:
| Scenario | Tokens billed | Cost |
|---|---|---|
| Cache hit | 49K at read rate ($0.30/MTok) | $0.0147 |
| Cache miss | 49K at write rate ($3.75/MTok) | $0.1837 |
| Extra cost per resume | | +$0.169 |
That first request costs 12.5× as much, an extra 11.5× on top of the cached price. Same first request.
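The per-resume arithmetic checks out directly. A sketch using the rates from the table (Sonnet-class cache pricing; the 49K-token prefix is the figure for my 110-tool setup):

```python
# Per-resume cost of a cache miss vs a hit, using the rates above.
TOKENS = 49_000            # cached prefix re-sent on resume
READ_RATE = 0.30 / 1e6     # $/token at cache read rate
WRITE_RATE = 3.75 / 1e6    # $/token at cache write rate

hit_cost = TOKENS * READ_RATE      # ~$0.0147
miss_cost = TOKENS * WRITE_RATE    # ~$0.1837
extra = miss_cost - hit_cost       # ~$0.169 extra per resume

print(f"hit ${hit_cost:.4f}  miss ${miss_cost:.4f}  "
      f"extra ${extra:.3f}  ({miss_cost / hit_cost:.1f}x total)")
```

The ratio is fixed by pricing alone: write rate over read rate is 3.75 / 0.30 = 12.5, regardless of prefix size.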
With 10.9 sessions per active day over 27 days, I had around 258 affected resumes:
| Metric | Value |
|---|---|
| Regression window | March 5 to April 1 |
| Affected sessions | ~258 |
| Extra tokens burned | ~12,628,881 |
| Equivalent API cost | ~$43.56 |
| Rate limit burn | ~11× faster |
For Max plan users, there is no direct dollar cost. The cost is rate limits. A cache miss burns hourly budget dramatically faster than a hit. That likely explains many of the "why am I hitting my usage limit so fast?" posts through March.
At scale
My assumptions, upfront: 50,000–200,000 active Claude Code users, 15–25% with MCP servers, about 2.5 affected sessions per user per day, 27 days confirmed via git.
| Scenario | Users | Sessions | Extra tokens | API cost |
|---|---|---|---|---|
| Conservative | 7,500 | 506,250 | 24.8B | $85,600 |
| Midpoint | 25,000 | 1,687,500 | 82.7B | $285,200 |
| High | 50,000 | 3,375,000 | 165.4B | $570,400 |
Somewhere between $85K and $570K in wasted API spend. The midpoint estimate is around $285K.
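The midpoint scenario is reproducible from the stated inputs. A sketch, where every number is one of the assumptions listed above rather than measured data:

```python
# Midpoint scenario: all inputs are the stated assumptions, not measurements.
users = 25_000                       # midpoint of the affected-user estimate
sessions_per_day = 2.5               # affected resumes per user per day
days = 27                            # regression window
tokens_per_resume = 49_000           # cached prefix re-billed per resume
extra_rate = (3.75 - 0.30) / 1e6     # $/token penalty: write rate minus read rate

sessions = users * sessions_per_day * days      # 1,687,500 affected sessions
extra_tokens = sessions * tokens_per_resume     # ~82.7B wasted tokens
cost = extra_tokens * extra_rate                # ~$285K equivalent API spend

print(f"{sessions:,.0f} sessions, {extra_tokens / 1e9:.1f}B tokens, ${cost:,.0f}")
```

Swap in 7,500 or 50,000 users to reproduce the conservative and high rows.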
One caveat: v2.1.89 also fixed a /stats bug that undercounted tokens from subagent runs. My 24.2B is probably low. The real cost was likely higher.
What broke technically
Prompt cache keys are computed over the exact byte sequence of the tool schema array. If the resumed session sends a different array from the original, the key changes, the first request misses, and the full payload gets billed at cache write rates again.
Three mechanisms appear to have contributed to the mismatch.
When tool search is active, Claude Code filters deferred MCP tools down to those already discovered through tool_reference blocks in message history. In a live session that set grows gradually. On --resume, all prior references load at once — so turn 1 of the resumed session can send a larger array than turn 1 of the original did. Different bytes, different key, full miss.
MCP servers also reconnect in parallel during resume. If one hasn't finished before the first API call, its tools are absent from the array. Same result: different bytes, different cache key.
And if agent schemas get assembled before all custom agents finish re-registering, the reconstructed array differs again.
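All three mechanisms reduce to the same byte-sensitivity. A toy illustration of why (this is not Anthropic's actual implementation, just the general shape of keying a cache on a serialized tool array):

```python
import hashlib
import json

# Illustration only: if the cache key is derived from the serialized tool
# array, any difference in membership or ordering yields a different key,
# and a different key means a full prompt-cache miss.
def cache_key(tools: list[dict]) -> str:
    payload = json.dumps(tools, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# Turn 1 of the original session vs. turn 1 after --resume, where all
# prior tool_reference discoveries load at once (tool names are made up).
original = [{"name": "search"}, {"name": "read_file"}]
resumed = [{"name": "search"}, {"name": "read_file"}, {"name": "mcp__late_tool"}]

print(cache_key(original) == cache_key(resumed))  # one extra tool, new key
```

The same logic covers the MCP reconnect race (a tool missing from the array) and the agent-schema timing issue (a tool present in a different form).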
After the source code leaked, a community reverse engineer identified a more specific mechanism. There's a function called db8 in cli.js that filters what gets saved to session JSONL files under ~/.claude/projects/. For non-Anthropic users, it strips attachment-type messages. Sounds harmless until you notice that deferred_tools_delta and mcp_instructions_delta are attachment-type — they're the records tracking which tools have already been announced to the model. Strip them out, resume the session, and Claude Code reconstructs tool state from scratch. The list comes back different. Cache miss, every time.
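The db8 failure mode can be sketched in a few lines. This is a rough reconstruction based on the community analysis; the message shapes are illustrative, not the real session JSONL format:

```python
# Reconstruction of the db8 failure mode (illustrative message shapes).
# deferred_tools_delta and mcp_instructions_delta are the attachment-type
# records tracking which tools were already announced to the model.
STRIPPED_TYPES = {"deferred_tools_delta", "mcp_instructions_delta"}

def strip_attachments(messages: list[dict]) -> list[dict]:
    """What a db8-style filter does before saving the session JSONL."""
    return [m for m in messages if m.get("type") not in STRIPPED_TYPES]

session = [
    {"type": "user", "text": "refactor this"},
    {"type": "deferred_tools_delta", "tools": ["mcp__db__query"]},
    {"type": "assistant", "text": "done"},
]

saved = strip_attachments(session)
# The delta never reaches disk, so on --resume the tool state is rebuilt
# from scratch, the prefix bytes no longer match, and the cache misses.
print(any(m["type"] in STRIPPED_TYPES for m in saved))
```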
Why it lasted 27 days
The v2.1.85 crash was fixed fast because it was visible. Error message, report, patch. The cache miss was the opposite — sessions still worked, nothing crashed, costs crept up. Rate limits just started hitting sooner.
What makes this worse: v2.1.86 shipped a deliberate cache hit improvement for Bedrock, Vertex, and Foundry users by removing dynamic content from tool descriptions. Anthropic was actively fixing cache efficiency on cloud provider surfaces while a CLI regression from nine versions earlier kept shipping.
Nobody outside Anthropic could diagnose the CLI issue in detail until the source code leaked. Once it did, the community found db8 and had a patch with a test suite out within hours.
The April 1 timeline is genuinely messy:
| Date | Event |
|---|---|
| March 5 | v2.1.69 ships, silently breaks --resume prompt caching |
| Mid-March | v2.1.85 introduces a second --resume regression: session crashes |
| Mid-March | v2.1.86 fixes the crash, ships cache improvements for cloud providers; CLI regression continues |
| March 31 | Source code leaks; community identifies db8 within hours |
| April 1, US morning | Anthropic ships v2.1.89 with multiple cache and resume fixes |
| April 1, overlap | Community patch cc-cache-fix ships with test_cache.py |
| April 1, US evening / EU next morning | Anthropic ships v2.1.90, names the --resume cache-miss fix explicitly |
Anthropic was already working through a backlog of cache bugs before v2.1.90 landed. Whether the leak accelerated the final fix or merely overlapped with it is impossible to say from the outside.
What this says about trust
The community patch made three changes to cli.js:
- Preserve deferred_tools_delta and mcp_instructions_delta in session JSONL so resume can reconstruct the cache prefix correctly
- Ignore injected meta messages in the first-message hash so the cache key stays stable across turns
- Force a one-hour cache TTL instead of falling back to five minutes via feature flag
That third one doesn't appear anywhere in the official changelog. Whether TTL differences are a product decision or a separate outstanding bug — unclear. Either way, the community patch provides something the official release does not: a mechanism users can inspect and test.
This is the bigger issue.
Anthropic's changelog says the bug was fixed. The community analysis names the function, names the stripped message types, explains why stripping them causes the miss, and ships a test that measures cache read ratios so you can verify it yourself. The official patch is a black box. You're asked to trust it, not given a way to check it.
That's a trust problem.
"We fixed it" isn't enough when the failure mode is invisible and the cost is real. What should exist is a reproducible verification path: a published test suite, open-sourced caching logic, or both.
This probably should have been caught outside the product entirely. If the first request of a resumed session suddenly costs 12× more than yesterday, that should trip a cost gate immediately. That is not a Claude Code feature problem. It is an observability gap.
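One shape such a cost gate could take (my sketch, not an existing tool): compare each day's cost per resume against a trailing baseline and alarm on a sudden jump.

```python
# Hypothetical cost gate: trip when today's per-resume cost exceeds
# factor x the trailing average. A 12.5x jump like this regression's
# would clear any sane threshold on day one.
def cost_gate_trips(history: list[float], today: float, factor: float = 5.0) -> bool:
    baseline = sum(history) / len(history)
    return today > factor * baseline

trailing = [0.015, 0.014, 0.016, 0.015]   # $/resume on healthy days

print(cost_gate_trips(trailing, 0.184))   # regression-sized jump
print(cost_gate_trips(trailing, 0.016))   # normal variation
```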
After upgrading
Run --resume on a session with MCP servers and watch the first request. It should be noticeably faster.
Compare daily token costs before and after if you track spend — this regression was consistent enough that the improvement should show up clearly.
Still seeing slowdown in long sessions past 200 turns? Worth filing. Both v2.1.89 and v2.1.90 included several fixes in this area, so persistent degradation means there's another code path left.
Petr Kindlmann — breakit.dev · kindlm.com