Claude Code v2.1.90: 27 days, 22 versions, one expensive bug
Since v2.1.69, --resume triggered a full prompt-cache miss for anyone running MCP servers or deferred tools. 27 days, 22 versions, ~$285K in estimated wasted API spend. Here's what broke, why it lasted, and what it says about trust.
I run Claude Code with 8 MCP servers and about 110 deferred tools per session. That puts me squarely in the population this regression hit hardest, and my 24.2 billion tokens of logged usage gave me the data to spot it.
The bug
Since v2.1.69, --resume triggered a full prompt-cache miss on the first request for sessions using deferred tools, MCP servers, or custom agents. The git record is clear: commit 9582ad4 shipped March 5. v2.1.90 shipped April 1. 27 days. 22 versions.
It wasn't even --resume's only bad month. Around v2.1.85, a separate regression caused older sessions to crash with:
tool_use ids found without tool_result blocks
That was fixed in v2.1.86. By mid-March, --resume was broken in two different ways simultaneously: one visible crash, one silent token drain. The crash got attention. The cache miss didn't.
What it cost
Claude Code keeps a local token ledger at ~/.claude/stats-cache.json. Mine:
| Metric | Value |
|---|---|
| Total sessions | 414 |
| Sessions per active day | 10.9 |
| Cache read tokens | 23,467,319,874 |
| Cache write tokens | 670,339,241 |
| Cache hit ratio | 97.2% |
| Total token throughput | ~24.2 billion |
A 97.2% cache hit ratio is what healthy looks like — system prompt, tool schemas, conversation history served from cache on nearly every turn.
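The ratio itself is a one-line computation over the ledger. A minimal sketch, assuming the JSON holds fields along the lines of cache_read_tokens and cache_write_tokens (the exact layout of stats-cache.json is my assumption; the fallback numbers are the ones from the table above):

```python
import json
from pathlib import Path

def cache_hit_ratio(reads: int, writes: int) -> float:
    """Fraction of cached-prefix tokens served from cache instead of re-written."""
    return reads / (reads + writes)

# Fallback: the numbers from my ledger above. The field names below are
# an assumption about the JSON layout; adjust to what your file contains.
reads, writes = 23_467_319_874, 670_339_241
path = Path.home() / ".claude" / "stats-cache.json"
try:
    stats = json.loads(path.read_text())
    reads = stats.get("cache_read_tokens", reads)
    writes = stats.get("cache_write_tokens", writes)
except (OSError, ValueError):
    pass  # no local ledger on this machine; use the article's numbers

print(f"cache hit ratio: {cache_hit_ratio(reads, writes):.1%}")
```

On my numbers this prints 97.2%. Anything dropping well below that on resume-heavy days is the signature of this bug.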
With 110 deferred tools across 8 MCP servers, each resume during the regression re-billed roughly 49,000 tokens at cache write rates instead of read rates:
| Scenario | Tokens billed | Cost |
|---|---|---|
| Cache hit | 49K at read rate ($0.30/MTok) | $0.0147 |
| Cache miss | 49K at write rate ($3.75/MTok) | $0.1837 |
| Extra cost per resume | | +$0.169 |
That first request costs 12.5× as much, an extra 11.5× on top of the cached price. Same first request.
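The per-resume arithmetic checks out directly. A sketch using the rates from the table (Sonnet-class cache pricing; the 49K-token prefix is the figure for my 110-tool setup):

```python
# Per-resume cost of a cache miss vs a hit, using the rates above.
TOKENS = 49_000            # cached prefix re-sent on resume
READ_RATE = 0.30 / 1e6     # $/token at cache read rate
WRITE_RATE = 3.75 / 1e6    # $/token at cache write rate

hit_cost = TOKENS * READ_RATE      # ~$0.0147
miss_cost = TOKENS * WRITE_RATE    # ~$0.1837
extra = miss_cost - hit_cost       # ~$0.169 extra per resume

print(f"hit ${hit_cost:.4f}  miss ${miss_cost:.4f}  "
      f"extra ${extra:.3f}  ({miss_cost / hit_cost:.1f}x total)")
```

The ratio is fixed by pricing alone: write rate over read rate is 3.75 / 0.30 = 12.5, regardless of prefix size.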
With 10.9 sessions per active day over 27 days, I had around 258 affected resumes:
| Metric | Value |
|---|---|
| Regression window | March 5 to April 1 |
| Affected sessions | ~258 |
| Extra tokens burned | ~12,628,881 |
| Equivalent API cost | ~$43.56 |
| Rate limit burn | ~11× faster |
For Max plan users, there is no direct dollar cost. The cost is rate limits. A cache miss burns hourly budget dramatically faster than a hit. That likely explains many of the "why am I hitting my usage limit so fast?" posts through March.
At scale
My assumptions, upfront: 50,000–200,000 active Claude Code users, 15–25% with MCP servers, about 2.5 affected sessions per user per day, 27 days confirmed via git.
| Scenario | Users | Sessions | Extra tokens | API cost |
|---|---|---|---|---|
| Conservative | 7,500 | 506,250 | 24.8B | $85,600 |
| Midpoint | 25,000 | 1,687,500 | 82.7B | $285,200 |
| High | 50,000 | 3,375,000 | 165.4B | $570,400 |
Somewhere between $85K and $570K in wasted API spend. The midpoint estimate is around $285K.
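The midpoint scenario is reproducible from the stated inputs. A sketch, where every number is one of the assumptions listed above rather than measured data:

```python
# Midpoint scenario: all inputs are the stated assumptions, not measurements.
users = 25_000                       # midpoint of the affected-user estimate
sessions_per_day = 2.5               # affected resumes per user per day
days = 27                            # regression window
tokens_per_resume = 49_000           # cached prefix re-billed per resume
extra_rate = (3.75 - 0.30) / 1e6     # $/token penalty: write rate minus read rate

sessions = users * sessions_per_day * days      # 1,687,500 affected sessions
extra_tokens = sessions * tokens_per_resume     # ~82.7B wasted tokens
cost = extra_tokens * extra_rate                # ~$285K equivalent API spend

print(f"{sessions:,.0f} sessions, {extra_tokens / 1e9:.1f}B tokens, ${cost:,.0f}")
```

Swap in 7,500 or 50,000 users to reproduce the conservative and high rows.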
One caveat: v2.1.89 also fixed a /stats bug that undercounted tokens from subagent runs. My 24.2B is probably low. The real cost was likely higher.
What broke technically
Prompt cache keys are computed over the exact byte sequence of the tool schema array. If the resumed session sends a different array from the original, the key changes, the first request misses, and the full payload gets billed at cache write rates again.
Three mechanisms appear to have contributed to the mismatch.
When tool search is active, Claude Code filters deferred MCP tools down to those already discovered through tool_reference blocks in message history. In a live session that set grows gradually. On --resume, all prior references load at once — so turn 1 of the resumed session can send a larger array than turn 1 of the original did. Different bytes, different key, full miss.
MCP servers also reconnect in parallel during resume. If one hasn't finished before the first API call, its tools are absent from the array. Same result: different bytes, different cache key.
And if agent schemas get assembled before all custom agents finish re-registering, the reconstructed array differs again.
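All three mechanisms reduce to the same byte-sensitivity. A toy illustration of why (this is not Anthropic's actual implementation, just the general shape of keying a cache on a serialized tool array):

```python
import hashlib
import json

# Illustration only: if the cache key is derived from the serialized tool
# array, any difference in membership or ordering yields a different key,
# and a different key means a full prompt-cache miss.
def cache_key(tools: list[dict]) -> str:
    payload = json.dumps(tools, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# Turn 1 of the original session vs. turn 1 after --resume, where all
# prior tool_reference discoveries load at once (tool names are made up).
original = [{"name": "search"}, {"name": "read_file"}]
resumed = [{"name": "search"}, {"name": "read_file"}, {"name": "mcp__late_tool"}]

print(cache_key(original) == cache_key(resumed))  # one extra tool, new key
```

The same logic covers the MCP reconnect race (a tool missing from the array) and the agent-schema timing issue (a tool present in a different form).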
After the source code leaked, a community reverse engineer identified a more specific mechanism. There's a function called db8 in cli.js that filters what gets saved to session JSONL files under ~/.claude/projects/. For non-Anthropic users, it strips attachment-type messages. Sounds harmless until you notice that deferred_tools_delta and mcp_instructions_delta are attachment-type — they're the records tracking which tools have already been announced to the model. Strip them out, resume the session, and Claude Code reconstructs tool state from scratch. The list comes back different. Cache miss, every time.
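The db8 failure mode can be sketched in a few lines. This is a rough reconstruction based on the community analysis; the message shapes are illustrative, not the real session JSONL format:

```python
# Reconstruction of the db8 failure mode (illustrative message shapes).
# deferred_tools_delta and mcp_instructions_delta are the attachment-type
# records tracking which tools were already announced to the model.
STRIPPED_TYPES = {"deferred_tools_delta", "mcp_instructions_delta"}

def strip_attachments(messages: list[dict]) -> list[dict]:
    """What a db8-style filter does before saving the session JSONL."""
    return [m for m in messages if m.get("type") not in STRIPPED_TYPES]

session = [
    {"type": "user", "text": "refactor this"},
    {"type": "deferred_tools_delta", "tools": ["mcp__db__query"]},
    {"type": "assistant", "text": "done"},
]

saved = strip_attachments(session)
# The delta never reaches disk, so on --resume the tool state is rebuilt
# from scratch, the prefix bytes no longer match, and the cache misses.
print(any(m["type"] in STRIPPED_TYPES for m in saved))
```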
Why it lasted 27 days
The v2.1.85 crash was fixed fast because it was visible. Error message, report, patch. The cache miss was the opposite — sessions still worked, nothing crashed, costs crept up. Rate limits just started hitting sooner.
What makes this worse: v2.1.86 shipped a deliberate cache hit improvement for Bedrock, Vertex, and Foundry users by removing dynamic content from tool descriptions. Anthropic was actively fixing cache efficiency on cloud provider surfaces while a CLI regression from nine versions earlier kept shipping.
Nobody outside Anthropic could diagnose the CLI issue in detail until the source code leaked. Once it did, the community found db8 and had a patch with a test suite out within hours.
The April 1 timeline is genuinely messy:
| Date | Event |
|---|---|
| March 5 | v2.1.69 ships, silently breaks --resume prompt caching |
| Mid-March | v2.1.85 introduces a second --resume regression: session crashes |
| Mid-March | v2.1.86 fixes the crash, ships cache improvements for cloud providers; CLI regression continues |
| March 31 | Source code leaks; community identifies db8 within hours |
| April 1, US morning | Anthropic ships v2.1.89 with multiple cache and resume fixes |
| April 1, overlap | Community patch cc-cache-fix ships with test_cache.py |
| April 1, US evening / EU next morning | Anthropic ships v2.1.90, names the --resume cache-miss fix explicitly |
Anthropic was already working through a backlog of cache bugs before v2.1.90 landed. Whether the leak accelerated the final fix or merely overlapped with it is impossible to say from the outside.
What this says about trust
The community patch made three changes to cli.js:
- Preserve deferred_tools_delta and mcp_instructions_delta in session JSONL so resume can reconstruct the cache prefix correctly
- Ignore injected meta messages in the first-message hash so the cache key stays stable across turns
- Force a one-hour cache TTL instead of falling back to five minutes via feature flag
That third one doesn't appear anywhere in the official changelog. Whether TTL differences are a product decision or a separate outstanding bug — unclear. Either way, the community patch provides something the official release does not: a mechanism users can inspect and test.
This is the bigger issue.
Anthropic's changelog says the bug was fixed. The community analysis names the function, names the stripped message types, explains why stripping them causes the miss, and ships a test that measures cache read ratios so you can verify it yourself. The official patch is a black box. You're asked to trust it, not given a way to check it.
That's a trust problem.
"We fixed it" isn't enough when the failure mode is invisible and the cost is real. What should exist is a reproducible verification path: a published test suite, open-sourced caching logic, or both.
This probably should have been caught outside the product entirely. If the first request of a resumed session suddenly costs 12× more than yesterday, that should trip a cost gate immediately. That is not a Claude Code feature problem. It is an observability gap.
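One shape such a cost gate could take (my sketch, not an existing tool): compare each day's cost per resume against a trailing baseline and alarm on a sudden jump.

```python
# Hypothetical cost gate: trip when today's per-resume cost exceeds
# factor x the trailing average. A 12.5x jump like this regression's
# would clear any sane threshold on day one.
def cost_gate_trips(history: list[float], today: float, factor: float = 5.0) -> bool:
    baseline = sum(history) / len(history)
    return today > factor * baseline

trailing = [0.015, 0.014, 0.016, 0.015]   # $/resume on healthy days

print(cost_gate_trips(trailing, 0.184))   # regression-sized jump
print(cost_gate_trips(trailing, 0.016))   # normal variation
```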
After upgrading
Run --resume on a session with MCP servers and watch the first request. It should be noticeably faster.
Compare daily token costs before and after if you track spend — this regression was consistent enough that the improvement should show up clearly.
Still seeing slowdown in long sessions past 200 turns? Worth filing. Both v2.1.89 and v2.1.90 included several fixes in this area, so persistent degradation means there's another code path left.
Petr Kindlmann — breakit.dev · kindlm.com