I Was Burning Through Claude Tokens Without Knowing It
577 million tokens in four weeks — and 95% of it was cache reads. Here's what I found when I finally looked at the data, and what I changed.
I kept hitting my Claude token limits way too fast. I'm on a Teams plan and couldn't figure out where all my tokens were going. So I dug into the session logs — and what I found changed how I work with Claude Code completely. The culprit wasn't my prompts. It was the same context being re-sent, silently, thousands of times over.
What the Numbers Actually Showed
Claude Code stores session logs locally as .jsonl files in ~/.claude/projects/. I wrote a script to parse them. The numbers were eye-opening.
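A script of this kind can be quite small. The following is a minimal sketch, not the script used for the analysis. It assumes each line of a log file is a JSON object, and that assistant records carry a `usage` object nested under `message` with the field names shown in `USAGE_FIELDS`. Those names mirror the usage fields of the Anthropic API, but their exact placement in the log files is an assumption, so check a few lines of your own logs first. The sketch also makes no attempt to de-duplicate records.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed field names inside each assistant record's "usage" object.
USAGE_FIELDS = {
    "input_tokens": "Input",
    "output_tokens": "Output",
    "cache_creation_input_tokens": "Cache Creation",
    "cache_read_input_tokens": "Cache Read",
}


def tally_usage(lines):
    """Sum token counts by type over an iterable of JSONL lines."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # user messages and metadata records carry no usage
        for field, label in USAGE_FIELDS.items():
            totals[label] += int(usage.get(field) or 0)
    return totals


def tally_all(root=Path.home() / ".claude" / "projects"):
    """Walk every .jsonl file under the log directory and combine the totals."""
    totals = Counter()
    for path in Path(root).rglob("*.jsonl"):
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals += tally_usage(handle)
    return totals
```

Calling `tally_all()` and printing each entry alongside its share of `sum(totals.values())` gives a table in the same shape as the one in this post.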
Here's how the full breakdown looked:
| Token Type | Usage | Share |
|---|---|---|
| Cache Read | 550M | 95.4% |
| Cache Creation | 24M | 4.1% |
| Output | 3M | 0.5% |
| Input | 36K | <0.01% |
One single session with 2,730 messages consumed 383 million tokens — two-thirds of my entire four-week usage. I had been running long, sprawling coding sessions and had no idea what that was costing me.
"Before this exercise, I had no visibility into where my tokens were going. Turns out I was paying with context, not with prompts."
The Three Causes
Once I understood the data, the causes were obvious — but I'd never thought to question any of them before.
- Cache reads grow quadratically over a session. Every message re-sends the full conversation history as context, so the cost of each message rises with every turn, and the session total rises with roughly the square of the message count. At message 1,000, you're sending roughly 500K tokens of history each time: served from cache, but still counted against your usage limit. Long sessions compound fast.
- Subagents add independent overhead. Each spawned subagent gets its own context window. I had 17 subagents across my sessions, each accumulating cache overhead independently. What looks like one task is actually many separate context streams.
- Connected MCP tools inflate every turn. Every tool schema (browser, calendar, filesystem, preview tools) gets sent along with the system prompt and read back from cache on every single turn. The more MCP tools connected, the bigger the base cost of each message.
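To get a feel for how quickly this adds up, here is a rough model rather than a measurement. It assumes every message adds the same number of tokens to the history (500 per message, which is what the "500K tokens at message 1,000" figure implies if growth is uniform) and that a fixed `base_tokens` overhead, such as tool schemas, is re-sent on every turn. Both function names and both parameters are illustrative.

```python
def context_at_turn(turn, tokens_per_message=500, base_tokens=0):
    """Approximate tokens of context re-sent on a given turn (1-indexed)."""
    return base_tokens + tokens_per_message * turn


def cumulative_cache_reads(turns, tokens_per_message=500, base_tokens=0):
    """Approximate total tokens re-sent over a session of `turns` messages."""
    # Arithmetic series: base * n + per_message * n * (n + 1) / 2
    return base_tokens * turns + tokens_per_message * turns * (turns + 1) // 2


for n in (100, 1000):
    print(n, context_at_turn(n), cumulative_cache_reads(n))
# 100 50000 2525000
# 1000 500000 250250000
```

Under these assumptions, ten times as many messages cost roughly a hundred times as many tokens. A hypothetical 20,000 tokens of tool schemas (`base_tokens=20_000`) would add another 20 million tokens over the same 1,000 turns.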
On Claude's Teams plan, how quickly you hit your token limit depends less on how much you write or how much Claude responds than on how long you keep a session open and how much context accumulates in it.
Five Changes That Actually Helped
These aren't theoretical optimizations — I tested each one and measured the difference in session token consumption.
- Keep sessions shorter. Break large tasks into smaller, focused sessions. That 2,730-message marathon was the single biggest offender. Now I treat each session as scoped to one task, then start fresh.
- Use /compact regularly. This compresses conversation context into a summary. A 500K-token context shrinks to roughly 50K. The one-time creation cost pays for itself within 2–3 turns at long session lengths.
- Disconnect unused MCP tools. Fewer connected tools means a smaller system prompt, which means fewer tokens per turn, on every single message, for the entire session. Disconnecting even two or three tools adds up.
- Use lighter models for exploration. Haiku-class subagents are well suited to search and lookup tasks. Reserve Opus-class reasoning for the decisions that actually need it. The right model for the job is rarely the heaviest one.
- Fewer subagents for simple lookups. Use Read or grep directly instead of spawning an agent for a file read or symbol search. Subagents are powerful, but each one opens an independent context stream with its own overhead.
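The break-even claim for /compact can be checked with back-of-the-envelope arithmetic. The sketch below works in raw token counts and rests on an assumed cost model: compacting reads the full context once, emits the summary as output, and then writes the summary to the cache. How /compact is actually accounted for, and how different token types are weighted against plan limits, may differ. The function name is illustrative.

```python
import math


def compact_break_even_turns(context_before, context_after):
    """Turns after which compaction has saved more tokens than it cost."""
    # Assumed one-time cost: one full read of the old context, plus the
    # summary counted twice (once as output, once as a cache write).
    one_time_cost = context_before + 2 * context_after
    saved_per_turn = context_before - context_after
    if saved_per_turn <= 0:
        raise ValueError("compaction must shrink the context")
    return math.ceil(one_time_cost / saved_per_turn)


print(compact_break_even_turns(500_000, 50_000))  # 600000 / 450000, rounded up: 2
```

With the 500K to 50K example from the list, this lands at 2 turns, in line with the 2–3 turn estimate. The smaller the reduction, the longer compaction takes to pay off.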
I Built a Skill So You Can Check Your Own Numbers
After doing this analysis manually once, I automated it as a /token-usage skill for Claude Code. It parses your local session logs and gives you the same breakdown — on demand, filtered by any time window.
/token-usage
Parses your local ~/.claude/projects/ session logs and reports token consumption by type, session, and time period. No external dependencies — just Python 3.6+.
- /token-usage
- /token-usage last 7 days
- /token-usage last 30 days
- /token-usage 2026-05-01 to 2026-05-16
- /token-usage 2026-05-13
Works on macOS, Linux, and Windows. It does not work on phones or tablets, because Claude Code session logs are only stored on desktop and laptop machines.
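For anyone curious how period arguments like these can be turned into a date range, here is one possible approach. It is an illustrative sketch, not the parsing code from the skill. `parse_period` and `UNIT_DAYS` are hypothetical names, and treating a month as 30 days is a simplifying assumption. It sticks to `strptime` so it runs on Python 3.6.

```python
import re
from datetime import date, datetime, timedelta

# Approximate length of each unit in days (a month is treated as 30 days).
UNIT_DAYS = {"day": 1, "week": 7, "month": 30}


def _parse_day(value):
    """Parse a YYYY-MM-DD string, raising ValueError if it is malformed."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def parse_period(text, today=None):
    """Return (start, end) dates, or (None, None) meaning all time."""
    today = today or date.today()
    text = text.strip().lower()
    if not text:
        return None, None
    match = re.fullmatch(r"last\s+(\d+)\s+(day|week|month)s?", text)
    if match:
        days = int(match.group(1)) * UNIT_DAYS[match.group(2)]
        return today - timedelta(days=days), today
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2})\s+to\s+(\d{4}-\d{2}-\d{2})", text)
    if match:
        start, end = _parse_day(match.group(1)), _parse_day(match.group(2))
        if start > end:
            raise ValueError("start date is after end date")
        return start, end
    day = _parse_day(text)  # single day: start and end are the same date
    return day, day
```

Passing `today` explicitly keeps the function easy to test. For example, `parse_period("last 7 days", date(2026, 5, 16))` returns the range from 2026-05-09 to 2026-05-16.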
Setup (2 minutes)
Clone or download the skill, then:
    # Copy the skill into place
    mkdir -p ~/.claude/skills
    cp -r token-usage ~/.claude/skills/

    # Create the slash command
    mkdir -p ~/.claude/commands
Create ~/.claude/commands/token-usage.md with:
Run the token usage report script to show Claude Code token consumption details.
Execute this command, passing any arguments the user provided after /token-usage:
python3 ~/.claude/skills/token-usage/scripts/token_report.py $ARGUMENTS
Present the full output to the user. The script accepts these period formats:
- No argument: all time
- last 7 days, last 30 days, last 2 weeks, last 3 months
- 2026-05-01 to 2026-05-16 (date range)
- 2026-05-13 (single day)
Start a new Claude Code session and run /token-usage. That's it.
The Takeaway
If you're running out of Claude tokens faster than you expect, the problem almost certainly isn't your prompts; it's session length and cache accumulation. Before guessing at the cause, check the actual data. A quick audit of your session logs will show you where the tokens are going. For many people, the fix may be as simple as keeping sessions shorter and running /compact before context gets unwieldy. The numbers in your logs don't lie.