Learning Claude

I Was Burning Through Claude Tokens Without Knowing It

577 million tokens in four weeks — and 95% of it was cache reads. Here's what I found when I finally looked at the data, and what I changed.

Ravi Ahir · May 2026 · 5 min read · For: Claude Code users, Teams plan

I kept hitting my Claude token limits way too fast. I'm on a Teams plan and couldn't figure out where all my tokens were going. So I dug into the session logs, and what I found completely changed how I work with Claude Code. The culprit wasn't my prompts. It was the same context being re-sent, silently, thousands of times over.

01 — The Discovery

What the Numbers Actually Showed

Claude Code stores session logs locally as .jsonl files in ~/.claude/projects/. I wrote a script to parse them. The numbers were eye-opening.
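Here is a minimal sketch of that kind of parser, written in Python because the skill later in this post is Python. It is not the script behind these numbers. It assumes each log line is a JSON object, that assistant entries keep their token counts under message.usage using the Anthropic Messages API field names (input_tokens, cache_creation_input_tokens, cache_read_input_tokens, output_tokens), and that lines repeating a message id repeat its usage, so each id is counted once. The name summarize_usage is illustrative.

```python
import json
from collections import Counter
from pathlib import Path

# Field names from the Anthropic Messages API "usage" object.
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def summarize_usage(root="~/.claude/projects"):
    """Return (totals per token type, total tokens per session file).

    Assumption: every line is a JSON object, and assistant entries keep
    their token counts under entry["message"]["usage"].
    """
    base = Path(root).expanduser()
    totals, per_session = Counter(), Counter()
    if not base.is_dir():
        return totals, per_session
    seen_ids = set()
    for path in sorted(base.rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except ValueError:  # blank, partial, or corrupt line
                    continue
                message = entry.get("message") if isinstance(entry, dict) else None
                if not isinstance(message, dict) or not isinstance(message.get("usage"), dict):
                    continue  # user turns and metadata lines carry no usage
                # Assumption: lines that repeat a message id repeat its usage too.
                msg_id = message.get("id")
                if msg_id is not None:
                    if msg_id in seen_ids:
                        continue
                    seen_ids.add(msg_id)
                for field in USAGE_FIELDS:
                    count = message["usage"].get(field) or 0
                    totals[field] += count
                    per_session[path.stem] += count  # file stem = session id
    return totals, per_session


if __name__ == "__main__":
    totals, per_session = summarize_usage()
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on empty logs
    for field, count in totals.most_common():
        print(f"{field:<30}{count:>16,}{count / grand_total:>9.2%}")
    print("Largest sessions:")
    for session, count in per_session.most_common(5):
        print(f"{session:<40}{count:>16,}{count / grand_total:>9.2%}")
```

The per-session totals are what make a single runaway session stand out; the per-type totals give the kind of breakdown shown in the table.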

577M: total tokens in 4 weeks, across 30 sessions
95%: cache reads, not prompts and not responses
383M: from a single 2,730-message session alone

Here's how the full breakdown looked:

Token Type	Usage	Share
Cache Read	550M	95.4%
Cache Creation	24M	4.1%
Output	3M	0.5%
Input	36K	<0.01%

A single 2,730-message session consumed 383 million tokens, two-thirds of my entire four-week usage. I had been running long, sprawling coding sessions with no idea what they were costing me.

"Before this exercise, I had no visibility into where my tokens were going. Turns out I was paying with context, not with prompts."

02 — Why This Happens

The Three Causes

Once I understood the data, the causes were obvious — but I'd never thought to question any of them before.

Cache reads grow quadratically. Every message in a session re-sends the full conversation history as context, so each turn costs a little more than the last and the session total grows with roughly the square of its length. At message 1,000, you're sending roughly 500K tokens of history each time: served from cache, but still counted against your usage limit. Long sessions compound fast.
Subagents add independent overhead. Each spawned subagent gets its own context window. I had 17 subagents across my sessions, each accumulating cache overhead independently. What looks like one task is actually many separate context streams.
Connected MCP tools inflate every turn. Every tool schema (browser, calendar, filesystem, preview tools) is included in the prompt and re-sent, via the cache, on every single turn. The more MCP tools connected, the bigger the base cost of each message.
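To see why session length dominates, here is a simplified model, an illustration rather than a measurement: each turn re-sends a fixed base (system prompt plus tool schemas) and every earlier turn. The function name and the figures are hypothetical; 500 tokens per turn is simply what "roughly 500K tokens of history at message 1,000" implies, and the 20K base is an arbitrary guess.

```python
def cumulative_context_tokens(turns, base_tokens, tokens_per_turn):
    """Total context tokens re-sent across a session of `turns` turns.

    Simplified model: turn k re-sends a fixed base (system prompt plus tool
    schemas) and the k - 1 earlier turns, each assumed to add
    `tokens_per_turn` tokens of history.
    """
    # Closed form of sum over k = 1..turns of (base + (k - 1) * per_turn).
    return turns * base_tokens + tokens_per_turn * turns * (turns - 1) // 2


if __name__ == "__main__":
    BASE = 20_000    # hypothetical system prompt + MCP tool schemas
    PER_TURN = 500   # hypothetical history added per turn
    one_long = cumulative_context_tokens(2_000, BASE, PER_TURN)
    two_short = 2 * cumulative_context_tokens(1_000, BASE, PER_TURN)
    print(f"one 2,000-turn session:  {one_long:,}")   # 1,039,500,000
    print(f"two 1,000-turn sessions: {two_short:,}")  # 539,500,000
```

Under these assumptions, splitting the same work into two sessions roughly halves the tokens re-sent, because the history term grows with the square of the turn count. The base term is paid on every turn, which is why each connected tool schema adds up.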

The Core Insight

How fast you hit your token limits on Claude's Teams plan depends less on how much you write or how much Claude responds than on how long you keep a session open and how much context accumulates in it.

03 — The Fix

Five Changes That Actually Helped

These aren't theoretical optimizations — I tested each one and measured the difference in session token consumption.

Keep sessions shorter. Break large tasks into smaller, focused sessions. That 2,730-message marathon was the single biggest offender. Now I treat each session as scoped to one task, then start fresh.
Use /compact regularly. This compresses the conversation context into a summary: a 500K-token context can shrink to roughly 50K. The one-time cost of producing that summary pays for itself within 2–3 turns at long session lengths.
Disconnect unused MCP tools. Fewer connected tools means a smaller system prompt, which means fewer tokens per turn — on every single message, for the entire session. Disconnecting even two or three tools adds up.
Use lighter models for exploration. Haiku-class subagents are well suited to search and lookup tasks. Reserve Opus-class reasoning for the decisions that actually need it. The right model for the job is rarely the heaviest one.
Use fewer subagents for simple lookups. Call the Read or Grep tools directly instead of spawning an agent for a file read or symbol search. Subagents are powerful, but each one opens an independent context stream with its own overhead.
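The /compact payback can be checked with back-of-the-envelope arithmetic. This sketch counts raw tokens only; real billing and usage limits may weight cache reads, cache writes, and output differently. The function name is illustrative.

```python
import math


def compact_break_even_turns(context_tokens, summary_tokens):
    """Turns after /compact until it has saved at least what it cost.

    Raw-token model with no per-type weighting: compacting reads the full
    context once and produces a summary of `summary_tokens`; every later
    turn then re-sends the summary instead of the full context. History
    added after compaction grows both scenarios equally, so it cancels out.
    """
    if not 0 <= summary_tokens < context_tokens:
        raise ValueError("summary must be smaller than the context it replaces")
    one_time_cost = context_tokens + summary_tokens    # read context, write summary
    saving_per_turn = context_tokens - summary_tokens  # smaller prefix each turn
    return math.ceil(one_time_cost / saving_per_turn)


print(compact_break_even_turns(500_000, 50_000))  # 2
```

With a 500K context and a 50K summary, the one-time cost is 550K and each later turn saves 450K, so compaction is ahead by the second turn. Weighting cache writes or output more heavily pushes that point out somewhat.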

04 — The Tool

I Built a Skill So You Can Check Your Own Numbers

After doing this analysis manually once, I automated it as a /token-usage skill for Claude Code. It parses your local session logs and gives you the same breakdown — on demand, filtered by any time window.

Claude Code Skill

/token-usage

Parses your local ~/.claude/projects/ session logs and reports token consumption by type, session, and time period. No external dependencies — just Python 3.6+.

/token-usage
/token-usage last 7 days
/token-usage last 30 days
/token-usage 2026-05-01 to 2026-05-16
/token-usage 2026-05-13

Works on macOS, Linux, and Windows. It does not work on phones or tablets (including iPads), because Claude Code session logs are only stored on desktop and laptop machines.
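For anyone building something similar, here is one way the period formats listed above could be parsed and applied. It is a hypothetical sketch, not the skill's actual code: it assumes log entries carry an ISO-8601 timestamp whose first ten characters are the date, treats "last N days" as a window that includes today, and approximates a month as 30 days. The names parse_period and in_period are illustrative.

```python
import re
from datetime import date, datetime, timedelta

# Assumption: a "month" is approximated as 30 days.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30}
ISO_DAY = r"\d{4}-\d{2}-\d{2}"


def _day(text):
    # strptime keeps this compatible with Python 3.6 (no date.fromisoformat).
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_period(arg, today=None):
    """Turn a period argument into an inclusive (start, end) date pair.

    Returns None for an empty argument, meaning "all time".
    """
    today = today or date.today()
    arg = " ".join(arg.lower().split())  # normalise case and spacing
    if not arg:
        return None
    m = re.fullmatch(r"last (\d+) (day|week|month)s?", arg)
    if m:
        days = int(m.group(1)) * UNIT_DAYS[m.group(2)]
        if days < 1:
            raise ValueError("period must cover at least one day")
        return today - timedelta(days=days - 1), today  # window includes today
    m = re.fullmatch(f"({ISO_DAY}) to ({ISO_DAY})", arg)
    if m:
        start, end = _day(m.group(1)), _day(m.group(2))
        if start > end:
            raise ValueError("start date is after end date")
        return start, end
    if re.fullmatch(ISO_DAY, arg):
        return _day(arg), _day(arg)
    raise ValueError("unrecognised period: " + arg)


def in_period(timestamp, period):
    """True if an ISO-8601 timestamp like '2026-05-13T10:22:33Z' falls in period."""
    if period is None:
        return True
    start, end = period
    return start <= _day(timestamp[:10]) <= end
```

Because the date is read straight from the timestamp, a session near midnight is counted on its UTC day (assuming the logs record UTC), which may differ from your local calendar.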

The skill is available on GitHub.

Setup (2 minutes)

Clone or download the skill, then:

# Copy the skill into place
mkdir -p ~/.claude/skills
cp -r token-usage ~/.claude/skills/

# Create the slash command
mkdir -p ~/.claude/commands

Create ~/.claude/commands/token-usage.md with:

Run the token usage report script to show Claude Code token consumption details.

Execute this command, passing any arguments the user provided after /token-usage:

python3 ~/.claude/skills/token-usage/scripts/token_report.py $ARGUMENTS

Present the full output to the user. The script accepts these period formats:
- No argument: all time
- last 7 days, last 30 days, last 2 weeks, last 3 months
- 2026-05-01 to 2026-05-16 (date range)
- 2026-05-13 (single day)

Start a new Claude Code session and run /token-usage. That's it.

The Takeaway

If you're running out of Claude tokens faster than you expect, the problem almost certainly isn't your prompts; it's session length and cache accumulation. Before rewriting your prompts, check the actual data: a quick audit of your session logs will show you immediately where the tokens are going. For most people, the fix is as simple as keeping sessions shorter and running /compact before context gets unwieldy. The numbers in your logs don't lie.