Hidden Token Costs in Claude Code: How to Audit Your Cache Efficiency
You might be burning through your Claude Code usage faster than you think. After digging into Claude Code's local stats file, I traced a major silent token sink that most users don't know about.
Claude Code stores per-project session stats in ~/.claude.json.tmp.*. This file contains token breakdowns including cache hit/miss ratios that reveal how efficiently you're using your subscription.
By cross-referencing these stats with session transcripts, I found the culprit behind anomalous sessions with poor cache performance.
Claude Code's auto-memory system writes to ~/.claude/projects/<project>/memory/MEMORY.md, which gets embedded in your prompt. The problem:
- Any session that writes a memory invalidates the cache prefix for all other active sessions on that project
- You don't control when Claude decides to write a memory — it happens silently mid-conversation
- Power users accumulate large memory files — I had 99 files across 16 projects (~94KB total)
- Every turn in every session pays for this content whether you use it or not
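To gauge that overhead on your own machine, here's a minimal sketch that totals your auto-memory files. The glob layout and the ~4-characters-per-token estimate are my assumptions, not Claude Code internals:

```python
import glob
import os

def memory_overhead(root: str = '~/.claude/projects') -> tuple[int, int]:
    """Return (file_count, total_bytes) for auto-memory markdown files.

    Assumes the memory layout described above: <root>/<project>/memory/.
    """
    pattern = os.path.join(os.path.expanduser(root), '*', 'memory', '**', '*.md')
    paths = glob.glob(pattern, recursive=True)
    return len(paths), sum(os.path.getsize(p) for p in paths)

if __name__ == '__main__':
    count, size = memory_overhead()
    # ~4 chars/token is a rough heuristic, not Claude's actual tokenizer
    print(f'{count} memory files, {size / 1024:.1f} KB (~{size // 4:,} tokens per cold read)')
```

In my case (99 files, ~94KB) that's roughly 24K tokens re-read on every cache miss.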
One session showed a cache invalidation event during a 43-minute idle gap. During that gap, a parallel Claude Code session wrote new entries to MEMORY.md. When the first session resumed, its cache prefix no longer matched — every subsequent turn was a cold read.
This is especially brutal if you run multiple terminals. One session's innocent memory write silently taxes every other active session.
```sh
ls ~/.claude.json.tmp.*
cat ~/.claude.json.tmp.* | python3 -c "
import json, sys
data = json.load(sys.stdin)
for path, info in data.get('projects', {}).items():
    reads = info.get('lastTotalCacheReadInputTokens', 0)
    creates = info.get('lastTotalCacheCreationInputTokens', 0)
    raw = info.get('lastTotalInputTokens', 0)
    total = reads + creates + raw
    if total == 0:
        continue
    hit_rate = reads / total * 100
    cost = info.get('lastCost', 0)
    print(f'{hit_rate:5.1f}% cache hit | \${cost:7.2f} | {raw:>10,} raw input | {path}')
    for model, usage in info.get('lastModelUsage', {}).items():
        m_reads = usage.get('cacheReadInputTokens', 0)
        m_raw = usage.get('inputTokens', 0)
        m_creates = usage.get('cacheCreationInputTokens', 0)
        m_total = m_reads + m_raw + m_creates
        m_rate = m_reads / m_total * 100 if m_total else 0
        print(f'  {m_rate:5.1f}% {model}: {m_raw:>10,} raw, {m_reads:>12,} cached')
" 2>/dev/null | sort -t'|' -k1 -n
```

Red flags:
- Cache hit rate below 80% on Opus
- High `cacheCreationInputTokens` with low `cacheReadInputTokens` — cache is being rebuilt, not reused
- Sessions with high cost but low output — something went wrong
Healthy session:
- 95%+ cache hit rate on Opus
- Cache reads >> cache creates
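These rules of thumb fold into a quick classifier. The thresholds below are the ones above — judgment calls from my own sessions, not official guidance:

```python
def cache_health(reads: int, creates: int, raw: int) -> str:
    """Classify a session's cache performance using the heuristics above."""
    total = reads + creates + raw
    if total == 0:
        return 'no data'
    hit_rate = reads / total
    if hit_rate >= 0.95 and reads > creates:
        return 'healthy'      # warm prefix reused on nearly every turn
    if hit_rate < 0.80 or creates > reads:
        return 'red flag'     # prefix being rebuilt instead of reused
    return 'ok'
```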
Match `lastSessionId` from the stats to the transcript at:

```
~/.claude/projects/<mangled-project-path>/<session-id>.jsonl
```
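If you want to script that lookup, here's a sketch. It assumes Claude Code mangles the project path by replacing `/` with `-` when naming the transcript directory — true on my machine, but verify on yours:

```python
import json
import os
from typing import Optional

def transcript_path(stats_file: str, project: str,
                    root: str = '~/.claude/projects') -> Optional[str]:
    """Resolve a project's last session transcript from the stats file.

    Assumes the '/' -> '-' path mangling described above (an observation,
    not documented behavior).
    """
    with open(stats_file) as f:
        data = json.load(f)
    session = data.get('projects', {}).get(project, {}).get('lastSessionId')
    if not session:
        return None
    mangled = project.replace('/', '-')
    return os.path.join(os.path.expanduser(root), mangled, f'{session}.jsonl')
```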
Add to `~/.claude/CLAUDE.md` (global, applies to all projects):

```markdown
## Auto-Memory Disabled
Do not use auto-memory. Do not create or write to memory files.
```

Then clean up existing memories:

```sh
rm -rf ~/.claude/projects/*/memory/
```

Instead of one long session that accumulates context and risks cache invalidation:
- Break work into discrete tasks
- Execute each in a focused session or subagent
- Each gets a clean, stable cache prefix
The stats file contains a feature flag: `"tengu_tool_search_unsupported_models": ["haiku"]`. Tool search is the mechanism that defers loading MCP tool schemas until they're actually needed — instead of stuffing every tool definition into the prompt upfront.
Haiku doesn't get this optimization. If you have MCP servers with many tools registered at user scope, every Haiku subagent pays for the full tool schemas in its context. Scope tool-heavy MCP servers to specific projects if only those projects need them.
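One way to do that scoping is a project-level `.mcp.json` at the repo root, so the server is only registered for that project. A minimal sketch — the server name, package, and command are placeholders, not a real registration:

```json
{
  "mcpServers": {
    "tool-heavy-server": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server"]
    }
  }
}
```

Removing the same server from your user-scope config keeps its schemas out of every other project's (and every Haiku subagent's) context.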
These are less impactful than auto-memory but worth knowing:
- Switching models mid-session (`/model`) — different cache namespace
- Toggling thinking mode — may change prompt shape
- Resuming a session after 1+ hours — cache TTL expires
- Editing settings/MCP configs mid-session — anything loaded into the prompt prefix (settings.json, MCP server registrations, CLAUDE.md) will invalidate the cache if changed while a session is active
- `/clear` — same as starting a new session (cache warms up from scratch). Not inherently bad, just don't do it unnecessarily early in a session
Check ~/.claude.json.tmp.* for your cache hit rates. If they're bad, the most likely cause is auto-memory silently rewriting your prompt prefix — especially if you run parallel sessions. Disable it, delete the memory files, and your cache efficiency should improve immediately.