Developer Tools · Featured · 7 min read · April 2, 2026

Claude Code v2.1.90: 27 days, 22 versions, one expensive bug

Since v2.1.69, --resume triggered a full prompt-cache miss for anyone running MCP servers or deferred tools. 27 days, 22 versions, ~$285K in estimated wasted API spend. Here's what broke, why it lasted, and what it says about trust.

Claude Code · Prompt Caching · MCP · AI Tooling · Debugging

I run Claude Code with 8 MCP servers and about 110 deferred tools per session. That puts me squarely in the population this regression hit hardest, and it gives me 24.2 billion tokens of usage data to measure the damage.


The bug

Since v2.1.69, --resume triggered a full prompt-cache miss on the first request for sessions using deferred tools, MCP servers, or custom agents. The git record is clear: commit 9582ad4 shipped March 5. v2.1.90 shipped April 1. 27 days. 22 versions.

It wasn't even --resume's only bad month. Around v2.1.85, a separate regression caused older sessions to crash with:

tool_use ids found without tool_result blocks

That was fixed in v2.1.86. By mid-March, --resume was broken in two different ways simultaneously: one visible crash, one silent token drain. The crash got attention. The cache miss didn't.


What it cost

Claude Code keeps a local token ledger at ~/.claude/stats-cache.json. Mine:

Metric                    Value
Total sessions            414
Sessions per active day   10.9
Cache read tokens         23,467,319,874
Cache write tokens        670,339,241
Cache hit ratio           97.2%
Total token throughput    ~24.2 billion

A 97.2% cache hit ratio is what healthy looks like — system prompt, tool schemas, conversation history served from cache on nearly every turn.
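The ledger is plain JSON, so the ratio is easy to recompute yourself. A minimal sketch: the field names in the function are my assumption, not a documented schema, so inspect the actual keys in your own `stats-cache.json` before relying on it.

```python
import json
from pathlib import Path

def cache_hit_ratio(stats_path: Path) -> float:
    """Recompute the cache hit ratio from Claude Code's local token ledger."""
    stats = json.loads(stats_path.read_text())
    # Field names here are a guess -- check the real keys in your file.
    reads = stats["cache_read_tokens"]
    writes = stats["cache_write_tokens"]
    return reads / (reads + writes)

# Sanity check with the numbers from the table above:
ratio = 23_467_319_874 / (23_467_319_874 + 670_339_241)
print(f"{ratio:.1%}")  # → 97.2%
```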

With 110 deferred tools across 8 MCP servers, each resume during the regression re-billed roughly 49,000 tokens at cache write rates instead of read rates:

Cache hit    49K tokens at read rate ($0.30/MTok)     $0.0147
Cache miss   49K tokens at write rate ($3.75/MTok)    $0.1837
Extra cost per resume                                 +$0.169

11.5× more expensive. Same first request.
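The arithmetic is worth spelling out, since the whole post hangs on it. A quick reproduction of the table above, using the token count and cache prices stated in the article:

```python
TOKENS = 49_000
READ_RATE = 0.30 / 1_000_000   # $ per token at cache read rate
WRITE_RATE = 3.75 / 1_000_000  # $ per token at cache write rate

hit = TOKENS * READ_RATE     # $0.0147
miss = TOKENS * WRITE_RATE   # $0.18375
extra = miss - hit           # +$0.169 per resume

print(f"extra cost per resume: +${extra:.3f} ({extra / hit:.1f}x the hit cost)")
```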

With 10.9 sessions per active day over 27 days, I had around 258 affected resumes:

Regression window      March 5 to April 1
Affected sessions      ~258
Extra tokens burned    ~12,628,881
Equivalent API cost    ~$43.56
Rate limit burn        ~11× faster

For Max plan users, there is no direct dollar cost. The cost is rate limits. A cache miss burns hourly budget dramatically faster than a hit. That likely explains many of the "why am I hitting my usage limit so fast?" posts through March.

At scale

My assumptions, upfront: 50,000–200,000 active Claude Code users, 15–25% with MCP servers, about 2.5 affected sessions per user per day, 27 days confirmed via git.

Scenario       Users     Sessions     Extra tokens   API cost
Conservative    7,500      506,250        24.8B       $85,600
Midpoint       25,000    1,687,500        82.7B      $285,200
High           50,000    3,375,000       165.4B      $570,400

Somewhere between $85K and $570K in wasted API spend. The midpoint estimate is around $285K.
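The scenarios regenerate directly from the stated assumptions: 27 days, ~2.5 affected sessions per user per day, 49K extra tokens per resume, billed at the write-minus-read delta of $3.45/MTok. That delta is my reading of the article's inputs; small rounding differences against the table are expected.

```python
DAYS = 27
SESSIONS_PER_USER_DAY = 2.5
EXTRA_TOKENS_PER_RESUME = 49_000
DELTA_PER_MTOK = 3.75 - 0.30  # cache write rate minus read rate, $/MTok

for label, users in [("Conservative", 7_500), ("Midpoint", 25_000), ("High", 50_000)]:
    sessions = users * SESSIONS_PER_USER_DAY * DAYS
    tokens = sessions * EXTRA_TOKENS_PER_RESUME
    cost = tokens / 1_000_000 * DELTA_PER_MTOK
    print(f"{label:<12} {users:>7,} users  {sessions:>12,.0f} sessions  "
          f"{tokens / 1e9:6.1f}B tokens  ${cost:>10,.0f}")
```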

One caveat: v2.1.89 also fixed a /stats bug that undercounted tokens from subagent runs. My 24.2B is probably low. The real cost was likely higher.


What broke technically

Prompt cache keys are computed over the exact byte sequence of the tool schema array. If the resumed session sends a different array from the original, the key changes, the first request misses, and the full payload gets billed at cache write rates again.
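The key property: byte-identical prefix or nothing. A toy illustration of why any change to the serialized tool array flips the key. The hash-over-serialized-schemas model is my simplification of how prompt caching behaves, not Anthropic's actual implementation:

```python
import hashlib
import json

def cache_key(tools: list[dict]) -> str:
    # Serialize deterministically, then hash the exact byte sequence.
    blob = json.dumps(tools, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

original = [{"name": "read_file"}, {"name": "write_file"}]
resumed = original + [{"name": "search_docs"}]  # one extra discovered tool

# One added tool changes every byte downstream of it in the payload,
# so the cached prefix no longer matches: full miss, billed at write rates.
print(cache_key(original) == cache_key(resumed))  # → False
```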

Three mechanisms appear to have contributed to the mismatch.

When tool search is active, Claude Code filters deferred MCP tools down to those already discovered through tool_reference blocks in message history. In a live session that set grows gradually. On --resume, all prior references load at once — so turn 1 of the resumed session can send a larger array than turn 1 of the original did. Different bytes, different key, full miss.

MCP servers also reconnect in parallel during resume. If one hasn't finished before the first API call, its tools are absent from the array. Same result: different bytes, different cache key.

And if agent schemas get assembled before all custom agents finish re-registering, the reconstructed array differs again.

After the source code leaked, a community reverse engineer identified a more specific mechanism. There's a function called db8 in cli.js that filters what gets saved to session JSONL files under ~/.claude/projects/. For non-Anthropic users, it strips attachment-type messages. Sounds harmless until you notice that deferred_tools_delta and mcp_instructions_delta are attachment-type — they're the records tracking which tools have already been announced to the model. Strip them out, resume the session, and Claude Code reconstructs tool state from scratch. The list comes back different. Cache miss, every time.
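The effect of that filter can be sketched in a few lines. This is a reconstruction of the reported behavior, not the actual cli.js code; the message shapes and the `db8`-style predicate are illustrative only.

```python
ATTACHMENT_TYPES = {"deferred_tools_delta", "mcp_instructions_delta"}

def save_filter(messages: list[dict], is_anthropic_user: bool) -> list[dict]:
    """Sketch of the reported db8 behavior: strip attachment-type
    messages from the session JSONL for non-Anthropic users."""
    if is_anthropic_user:
        return messages
    return [m for m in messages if m.get("type") not in ATTACHMENT_TYPES]

session = [
    {"type": "user", "content": "hello"},
    {"type": "deferred_tools_delta", "tools": ["search_docs"]},
    {"type": "mcp_instructions_delta", "server": "github"},
]

saved = save_filter(session, is_anthropic_user=False)
# The delta records are gone, so a later --resume cannot know which tools
# were already announced: the array is rebuilt differently, and the key misses.
print(len(saved))  # → 1
```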


Why it lasted 27 days

The v2.1.85 crash was fixed fast because it was visible. Error message, report, patch. The cache miss was the opposite — sessions still worked, nothing crashed, costs crept up. Rate limits just started hitting sooner.

What makes this worse: v2.1.86 shipped a deliberate cache hit improvement for Bedrock, Vertex, and Foundry users by removing dynamic content from tool descriptions. Anthropic was actively fixing cache efficiency on cloud provider surfaces while a CLI regression from nine versions earlier kept shipping.

Nobody outside Anthropic could diagnose the CLI issue in detail until the source code leaked. Once it did, the community found db8 and had a patch with a test suite out within hours.

The April 1 timeline is genuinely messy:

Date                                    Event
March 5                                 v2.1.69 ships, silently breaks --resume prompt caching
Mid-March                               v2.1.85 introduces a second --resume regression: session crashes
Mid-March                               v2.1.86 fixes the crash and ships cache improvements for cloud providers; the CLI regression continues
March 31                                Source code leaks; community identifies db8 within hours
April 1, US morning                     Anthropic ships v2.1.89 with multiple cache and resume fixes
April 1, overlapping                    Community patch cc-cache-fix ships with test_cache.py
April 1, US evening / EU next morning   Anthropic ships v2.1.90, naming the --resume cache-miss fix explicitly

Anthropic was already working through a backlog of cache bugs before v2.1.90 landed. Whether the leak accelerated the final fix or merely overlapped with it is impossible to say from the outside.


What this says about trust

The community patch made three changes to cli.js:

  1. Preserve deferred_tools_delta and mcp_instructions_delta in session JSONL so resume can reconstruct the cache prefix correctly
  2. Ignore injected meta messages in the first-message hash so the cache key stays stable across turns
  3. Force a one-hour cache TTL instead of the five-minute fallback controlled by a feature flag
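The core idea behind the community test is easy to replicate: measure the cache read ratio on a resumed session's first request and assert that it is healthy. A sketch of that check; the `cache_read_input_tokens` and `cache_creation_input_tokens` names match the usage block Anthropic's Messages API returns, and the 0.9 threshold is my arbitrary choice:

```python
def first_request_is_cached(usage: dict, threshold: float = 0.9) -> bool:
    """Verify that a resumed session's first request was mostly served
    from cache rather than re-billed at cache-write rates."""
    reads = usage.get("cache_read_input_tokens", 0)
    writes = usage.get("cache_creation_input_tokens", 0)
    total = reads + writes
    return total > 0 and reads / total >= threshold

healthy = {"cache_read_input_tokens": 48_500, "cache_creation_input_tokens": 500}
broken = {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 49_000}

print(first_request_is_cached(healthy), first_request_is_cached(broken))  # → True False
```

This is the kind of check a user can run against their own traffic, which is exactly what the official changelog does not offer.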

That third one doesn't appear anywhere in the official changelog. Whether TTL differences are a product decision or a separate outstanding bug — unclear. Either way, the community patch provides something the official release does not: a mechanism users can inspect and test.

This is the bigger issue.

Anthropic's changelog says the bug was fixed. The community analysis names the function, names the stripped message types, explains why stripping them causes the miss, and ships a test that measures cache read ratios so you can verify it yourself. The official patch is a black box. You're asked to trust it, not given a way to check it.

That's a trust problem.

"We fixed it" isn't enough when the failure mode is invisible and the cost is real. What should exist is a reproducible verification path: a published test suite, open-sourced caching logic, or both.

This probably should have been caught outside the product entirely. If the first request of a resumed session suddenly costs 12× more than yesterday, that should trip a cost gate immediately. That is not a Claude Code feature problem. It is an observability gap.
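Such a cost gate is a few lines of monitoring code, not a product feature. A minimal sketch; the median baseline and the 3× multiplier are arbitrary choices of mine:

```python
from statistics import median

def cost_gate(history: list[float], todays_cost: float, factor: float = 3.0) -> bool:
    """Alert if today's first-request cost jumps well above the recent baseline."""
    baseline = median(history)
    return todays_cost > factor * baseline

# First resumed request: roughly $0.015 on a cache hit, $0.18 on a miss.
recent = [0.014, 0.015, 0.016, 0.015, 0.014]
print(cost_gate(recent, 0.18))  # → True: would have tripped on day one
```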


After upgrading

Run --resume on a session with MCP servers and watch the first request. It should be noticeably faster.

Compare daily token costs before and after if you track spend — this regression was consistent enough that the improvement should show up clearly.

Still seeing slowdown in long sessions past 200 turns? Worth filing. Both v2.1.89 and v2.1.90 included several fixes in this area, so persistent degradation means there's another code path left.


Petr Kindlmann — breakit.dev · kindlm.com