Where Claude Code Tokens Actually Go

Measure the waste before you optimize it

If your Claude Code API bill feels high, the cause is rarely "too many turns." It's a handful of specific, measurable patterns. Each one is visible in the session transcripts Claude Code already writes to ~/.claude/projects/ — you just have to add up the usage the API reports per turn. Here are the big three.

1. Re-reading the same files

The single most common waste pattern. The agent reads a large reference file, the conversation moves on, the file falls out of working context, and it reads the entire file again later. A 2,000-line file read five times is four needless full-file payloads.
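
Re-reads can be counted straight from a transcript. The following is a minimal sketch, not a definitive parser. It assumes each transcript line is a JSON object whose message.content is a list of blocks, and that file reads appear as tool_use blocks named Read with a file_path input. Those field names and the helper names count_file_reads and repeated_reads are assumptions for illustration, so check them against your own transcripts.

```python
import json
from collections import Counter


def count_file_reads(lines):
    """Count Read tool calls per file path in an iterable of JSONL lines."""
    reads = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, partial, or corrupt lines
        message = entry.get("message") if isinstance(entry, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content carries no tool calls
        for block in content:
            if not isinstance(block, dict):
                continue
            # Assumed block shape:
            # {"type": "tool_use", "name": "Read", "input": {"file_path": "..."}}
            if block.get("type") != "tool_use" or block.get("name") != "Read":
                continue
            tool_input = block.get("input")
            if isinstance(tool_input, dict) and tool_input.get("file_path"):
                reads[tool_input["file_path"]] += 1
    return reads


def repeated_reads(reads):
    """Return (path, count) pairs for files read more than once, most-read first."""
    return [(path, n) for path, n in reads.most_common() if n > 1]
```

To use it, open one transcript file and pass the file object to count_file_reads, then print repeated_reads of the result. Any file with a count above one is a candidate for the fixes below.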

Two fixes. For files the agent visits repeatedly, put a short summary plus key line numbers in your CLAUDE.md so it stops re-discovering them. For one-off lookups in big files, prefer a targeted grep -n followed by reading just the matching range, instead of reading the whole thing.

2. Oversized tool output

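Tool results are tokens too. A cat of a 5,000-line log, an un-truncated npm test run, a full git diff of a generated lockfile: each lands in context at full size. Worse, a giant result stays in the conversation and is sent again as input on every later turn, and it fills the context window sooner, which can force the history to be compacted and the cached prefix rebuilt (see below). You pay for it more than once.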

# instead of dumping everything:
cat huge.log                 # thousands of tokens

# scope it:
rg "ERROR|WARN" huge.log | head -50
npm test --silent 2>&1 | tail -30

3. Cache misses

Anthropic's prompt cache makes the repeated prefix of a conversation cheap, but only while the cache entry is alive (5 minutes by default, refreshed each time the prefix is read) and only if the prefix is unchanged. Long idle gaps between turns let the cache expire. Huge tool results fill the context window faster, and once the history is compacted or otherwise rewritten, the old prefix no longer matches and has to be written to cache again. A healthy session reads 70%+ of its input tokens from cache; if you're well below that, batch your interactions and trim the output that's causing churn.
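
The 70% figure can be checked with a few lines of arithmetic. This sketch assumes "input tokens read from cache" means cache_read_input_tokens divided by the sum of the three input-side usage fields. That definition and the function name cache_hit_rate are assumptions made here, not something the API reports directly.

```python
def cache_hit_rate(usage):
    """Fraction of input tokens served from cache, for one usage dict or a summed total."""
    uncached = usage.get("input_tokens") or 0
    cache_read = usage.get("cache_read_input_tokens") or 0
    cache_write = usage.get("cache_creation_input_tokens") or 0
    # Assumed definition: total input is everything sent to the model, whether it
    # was read from cache, written to cache, or processed uncached.
    total_input = uncached + cache_read + cache_write
    if total_input == 0:
        return 0.0
    return cache_read / total_input


# Illustrative numbers only, not taken from a real session.
example = {
    "input_tokens": 1200,
    "cache_read_input_tokens": 42000,
    "cache_creation_input_tokens": 4800,
    "output_tokens": 900,
}
print(f"{cache_hit_rate(example):.1%}")  # prints 87.5%
```

Run it on per-session totals rather than single turns: the first turn of any session writes the cache and will always look like a miss.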

How to measure your own sessions

Every assistant turn in the transcript JSONL carries a usage object with input_tokens, output_tokens, cache_read_input_tokens, and cache_creation_input_tokens. Sum them across sessions, attribute tool-result size by tool, and count how many times each file was read. The waste becomes obvious fast — in real sessions it's common to find 15–30% of input tokens going to re-reads and oversized results alone.
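
A starting point for the summing step might look like the sketch below. It assumes the usage object sits at message.usage on each transcript line, and that a single API response may be logged on more than one line under the same message.id, so it keeps one usage record per id to avoid double counting. Both assumptions, and the helper names sum_usage and sum_usage_in_dir, are illustrative and should be verified against your own files.

```python
import json
from pathlib import Path

USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def sum_usage(lines):
    """Total the usage fields over an iterable of transcript JSONL lines."""
    by_message = {}  # message id -> usage dict (last one seen wins)
    anonymous = []   # usage dicts from entries that carry no message id
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, partial, or corrupt lines
        message = entry.get("message") if isinstance(entry, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # user turns and metadata lines have no usage
        message_id = message.get("id")
        if isinstance(message_id, str):
            by_message[message_id] = usage
        else:
            anonymous.append(usage)

    totals = dict.fromkeys(USAGE_FIELDS, 0)
    for usage in list(by_message.values()) + anonymous:
        for field in USAGE_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return totals


def sum_usage_in_dir(root="~/.claude/projects"):
    """Total usage over every .jsonl file found under root."""
    totals = dict.fromkeys(USAGE_FIELDS, 0)
    for path in Path(root).expanduser().rglob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for field, value in sum_usage(handle).items():
                totals[field] += value
    return totals
```

Calling sum_usage_in_dir() with no arguments walks the default transcript directory and returns a dict of the four totals. Per-tool and per-file attribution needs the content blocks as well as the usage object, so treat this as the first pass only.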

Don't want to write the analyzer?
CC Powerpack Pro includes token-audit — it scans your transcripts, shows exactly where tokens went (per tool, per re-read file, cache hit rate), and emits concrete config fixes. See CC Powerpack Pro → or start with the free hooks.
