Token Management Guide for Those Struggling with Bloated CLAUDE.md
“Keep CLAUDE.md concise. Sixty lines is ideal.”
That’s what theory posts say. But once you move a real project forward, it expands. You add guardrails. You note error patterns. Before you know it, it’s over 300 lines.
Even if people tell you to keep it short, necessary information is still necessary.
In the end, obsessing over the length of CLAUDE.md alone misses the point. The real issue is token management.
We may have a 200K context window, but one third is eaten by the system prompt. As the conversation goes on, summarization kicks in and important details get dropped.
This article organizes Claude Code features around one axis: how to keep tokens under control. Keeping CLAUDE.md small is only one of several means to that end.
Premise: Why agents fail on long‑running tasks
LLMs are stateless. Every session starts with no memory.
This is the root of many problems and leads to typical failure modes:
- Trying to build everything at once → context runs out and work ends half‑done
- Claiming “done” mid‑way → the agent settles for partial success
- Getting lost across sessions → it forgets last time and repeats itself
- “Tests passed,” but only with mocks → nothing runs in a production‑like setup
All of these come from context limits and failures to carry state forward.
Token management: the foundation
First, understand how context actually gets consumed:
| Source | Ballpark |
|---|---|
| Claude Code system prompt | About 50 directives (≈ one‑third of capacity) |
| CLAUDE.md | Consumes ~20% in the initial state |
| Tool definitions | 50 tools ≈ 10–20K tokens |
| Conversation history | Accumulates and triggers compression |
Even with 200K, you effectively use roughly half. As the chat grows, compression runs more often.
Four‑layer strategy
Control tokens across four layers:
Layer 1: Static reduction (reduce up front)
- Keep CLAUDE.md concise (target 60 lines; realistically keep it under 300)
- Use pointers: reference `file:line` rather than embedding code
- The 30% rule: include only information that 30%+ of engineers will use (in a solo project you use everything yourself, which tends to bloat it)
Layer 2: Lazy loading (only when needed)
- Skills: consume no context until invoked
- Tool Search: use `defer_loading` to delay loading tool definitions
- Put detailed docs in separate files and reference them
Layer 3: Externalize state (escape the context window)
- `progress.txt`: write progress to a file
- `git log`: let git hold the history
- `features.json`: manage completion state externally
There are two good times to write things out:
- Right before compression is likely: write state out first.
- When a task won’t finish in one go: write progress even mid‑stream.
It’s fine to be moderately detailed. Think “from everything we know, what will we definitely need to look up in the next session?”
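As a concrete sketch, a pre-compression write-out can be as simple as appending a dated block to `progress.txt`. The file name follows this article; the fields and the example content are only a suggestion, not a prescribed format:

```shell
# Append a dated, self-contained progress entry to progress.txt.
# The entry text is illustrative sample data.
cat >> progress.txt <<EOF
## Session end $(date +%Y-%m-%d)
- Done: registration form UI and validation
- Next: wire the form to the /api/register endpoint
- Gotcha: start docker-compose before running tests
EOF
```

The test of a good entry is whether the next session can resume from it without re-reading the whole codebase.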
Layer 4: Session management (reset before overflow)
- `/clear` + `/catchup`: resume from a clean slate
- Avoid `/compact`: it’s opaque and unstable
- Document & clear: for complex tasks, save state to a `.md` file, then restart
Preparation phase: make an environment agents can’t get lost in
To succeed with long tasks, prepare the following files:
| File | Role | Notes |
|---|---|---|
| `CLAUDE.md` | Guardrails | Be concise; phrase negatives with alternatives ("Never X, prefer Y") |
| `features.json` | Feature list | Make completion criteria explicit; `passes: true/false` |
| `init.sh` | Environment bootstrap | Include Docker startup; begin every session from the same state |
| `progress.txt` | Progress log | Carry-over between sessions |
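For reference, `init.sh` can stay tiny. This sketch writes one to disk and syntax-checks it; the service name `db` and the npm commands are placeholders for whatever your project actually uses:

```shell
# Write a minimal init.sh sketch (placeholder commands, not a prescription).
cat > init.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
docker-compose up -d db        # production-like DB container
npm ci                         # deterministic dependency install
npm run dev &                  # dev server in the background
echo "ready at commit $(git rev-parse --short HEAD)"
EOF
chmod +x init.sh
bash -n init.sh                # syntax check only; exits 0 if the sketch parses
```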
Four rules for writing CLAUDE.md
- Guardrails first: write about where Claude tends to err, not a user manual
- Avoid @file notation: embedding causes context bloat
- Don’t stop at negatives: say “Never X, prefer Y” and offer an alternative
- Use as enforced functions: wrap complex CLI commands in bash so CLAUDE.md stays concise
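Rule 4 in practice: instead of documenting a long command sequence in CLAUDE.md, ship it as a script and reference the script by name. Everything below (the `scripts/db-reset` path, the psql flags, the seed file) is illustrative:

```shell
# Wrap a complex, error-prone command sequence in one named script.
mkdir -p scripts
cat > scripts/db-reset <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
docker-compose exec -T db psql -U app -c 'DROP SCHEMA public CASCADE; CREATE SCHEMA public;'
docker-compose exec -T db psql -U app -f /seed/schema.sql
EOF
chmod +x scripts/db-reset
```

CLAUDE.md then needs only one line: "Reset the DB with `scripts/db-reset`."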
Example features.json
{
  "features": [
    {
      "category": "functional",
      "description": "Implement the user registration form",
      "steps": ["Build the form UI", "Add validation", "Wire up the API"],
      "passes": false
    },
    {
      "category": "functional",
      "description": "Implement the login feature",
      "steps": ["Authentication logic", "Session management", "Redirect"],
      "passes": false
    }
  ]
}
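With `passes` flags in place, "what's next" becomes a mechanical query rather than something the agent has to remember. A sketch using jq (assumed to be installed); the feature names are sample data:

```shell
# Minimal features.json with one finished and one unfinished feature.
cat > features.json <<'EOF'
{"features":[
  {"description":"Implement the user registration form","passes":true},
  {"description":"Implement the login feature","passes":false}
]}
EOF
# Pick the first feature whose completion flag is still false.
jq -r '[.features[] | select(.passes == false)][0].description' features.json
# → Implement the login feature
```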
Execution phase: two‑stage harness design
A pattern recommended by Anthropic:
Initialization agent (first session)
├── Create init.sh
├── Create features.json
├── Initialize progress.txt
└── Make the initial git commit
    ↓
Coding agent (subsequent sessions)
├── Implement one feature at a time
├── Commit to git when each feature is done
├── Update the progress file
└── Mark complete only after verifying tests
Standard procedure at session start
Each session, have the agent reacquire state via:
- Check the working directory with `pwd`
- Review recent work via `git log` + `progress.txt`
- Choose the next goal from `features.json`
- Start the dev server with `init.sh`
This reliably carries “where we left off last time” into each session.
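The four steps can be bundled into one script the agent runs first thing. The name `catchup.sh` (echoing the /catchup idea) and the jq dependency are my assumptions:

```shell
# Bundle the session-start ritual into a single script.
cat > catchup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
pwd                    # where are we?
git log --oneline -5   # recent work
tail -n 20 progress.txt  # where we left off
jq -r '[.features[] | select(.passes == false)][0].description' features.json  # next goal
./init.sh              # same starting state every time
EOF
chmod +x catchup.sh
```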
Parallel execution and result integration
Running sub‑agents in parallel can shorten wall‑clock time, but it’s confusing unless you understand how to merge results.
Master–Clone architecture
Main agent (full context)
│
├─→ Clone A: "Investigate only X and return the result in 3 lines"
├─→ Clone B: "Investigate only Y and return the result in 3 lines"
└─→ Clone C: "Investigate only Z and return the result in 3 lines"
Key points:
- Give sub‑agents only the minimum task‑specific instructions
- Have them return compressed results (save tokens)
- Let the main agent integrate (it holds full context)
Pitfalls of pre‑defined specialist sub‑agents
Defining specialized sub‑agents up front invites problems:
- Context isolation: specialists lose the big picture and results don’t fit together
- Rigid workflows: when humans pre‑decide delegation, agents can’t self‑optimize
Recommendation: the “Master–Clone” approach. Give the main agent full context, and have it spawn clones with the same capabilities as needed.
Choosing how to load knowledge
There are multiple ways to give knowledge to an agent, each with different token costs:
| Method | Token cost | Use case |
|---|---|---|
| CLAUDE.md | Paid every session | Global guardrails across sessions |
| Skills | Only when invoked | Task‑specific knowledge |
| MCP | Tool definitions always cost | Only when auth/networked actions are required |
| Separate files | Only when read | Detailed documentation |
| Tool Search | Only what’s needed at search time | Managing lots of tools (50+) |
A new role for MCP
If you mirror APIs in MCP (1 API = 1 tool), the tool count explodes and pressures the context window.
Use MCP as a “gateway” instead:
❌ Old: 1 API = 1 MCP tool (tool count explosion)
✅ Recommended: consolidate into three high‑level tools
- download_raw_data(filters...)
- take_sensitive_gated_action(args...)
- execute_code_in_environment_with_state(code...)
If you just need to pass knowledge, Skills or separate files are more token‑efficient than MCP.
Skill gotchas
- Blocked by default; you must explicitly allow them
- The 25% activation problem: left to auto-trigger, skills reportedly fire only about a quarter of the time; spell out the activation conditions in CLAUDE.md and they trigger 100% of the time
<!-- Add to CLAUDE.md -->
When the user asks “What’s the tech stack?”,
always use the tech-stack skill.
Depth of validation
Even when an agent says “done,” check what level of validation it actually performed.
Front‑end validation levels
Level 1: Screenshot check (looks OK)
Level 2: + No console errors
Level 3: + No network errors
Level 4: + Meets performance thresholds
Level 5: + Passes accessibility audit
Playwright MCP is currently stuck at Level 1 (screenshots only). It should also capture DevTools signals:
// Capture console logs
page.on('console', msg => console.log(msg.text()));
// Capture JS errors
page.on('pageerror', error => console.log(error.message));
// Capture network failures
page.on('requestfailed', request =>
console.log(request.url(), request.failure().errorText)
);
Back‑end validation levels
Level 1: Unit tests (with mocks)
Level 2: + Real DB connection tests
Level 3: + Integration tests in a production‑like environment
When you hear "tests passed," check for these traps:
- They may have passed only with mocks/stubs
- Passing on SQLite can still fail on MySQL/PostgreSQL
- The repository layer may have been swapped for a test double
Strategy for test environments
Approach 1: Abstraction (e.g., ZTD)
Use CTEs to shadow table references and test SQL without touching real tables.
- Lightweight and parallelizable
- Doesn’t cover stored procedures/triggers
- Good for simple CRUD
Approach 2: Reproduce the environment (Docker)
Prepare a production‑like environment in containers.
- Easy to blow away with `docker-compose down`
- Test against the same DB dialect as production
- Safe even when you grant the agent privileges
Docker is recommended for agent development: you can throw failures away, and you test in an environment that matches production.
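The disposable cycle looks like this in practice, written out as a script. The compose service, credentials, and test command are all placeholders:

```shell
# Spin up, test against the production dialect, throw everything away.
cat > test-integration.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
docker-compose up -d db                  # same DB engine as production
DATABASE_URL=postgres://app:app@localhost:5432/app_test npm test
docker-compose down -v                   # discard containers and volumes
EOF
chmod +x test-integration.sh
```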
Design quality: remove the “AI‑ish” look
When you have Claude Code build front‑end UI, it tends to look “AI‑ish.”
Philosophy of the frontend-design SKILL
Have the agent check four contexts before coding:
- Purpose: what problem are we solving?
- Tone: aesthetic direction (brutally minimal / maximalist chaos / retro‑futuristic, etc.)
- Constraints: technical constraints
- Differentiation: what makes it UNFORGETTABLE?
“AI slop” to avoid
- Generic fonts like Inter, Roboto, Arial
- Purple gradients on white backgrounds
- Predictable layouts
What matters is intentional choices. Bold or refined—either is fine if consistent.
Author’s approach
To date I’ve used the Atlassian Design System as a reference to get the minimum in place. Riding on an existing design system at least keeps things from breaking down.
I plan to try Claude’s frontend‑design SKILL next. If it forces “intentional choices,” it could raise the design bar.
Automation and brakes
Auto‑approving tools is convenient, but crank it up too much and things won’t stop.
How I use AI tools
For context, here’s how I split tools today:
- Claude Code: main development—code generation, file ops, builds
- Gemini: rubber‑ducking and design discussions; a second view
- ChatGPT: casual chat; almost never for programming
- Manus: used to use it, but working locally suits me better now
Keep that in mind for the “runaway” issue below.
Auto‑approval settings
// .claude/settings.json
{
"permissions": {
"allow": [
"WebFetch(*)",
"Bash(docker-compose:*)",
"Skill(*)"
]
}
}
The runaway problem
Getting asked “Allow?” every time is annoying, so you’ll want auto‑approval. But too much and it won’t stop.
- Gemini pauses to ask “Are you sure?” along the way
- Claude Code keeps executing quietly
- Driven via API, it can run indefinitely—scary
Mitigations
- Don’t auto-approve destructive actions (`rm -rf`, `DROP TABLE`, etc.)
- Use hooks to block specific operations
- Check `/context` periodically to monitor token use
- Set timeouts for long-running tasks
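For the hook-based brake, here is a PreToolUse hook sketch. Claude Code pipes the pending tool call to the hook as JSON on stdin, and a hook that exits with code 2 blocks the call (verify against the hooks documentation for your version). The crude substring matching is only for illustration; a real hook should parse the JSON properly, e.g. with jq:

```shell
# A pre-tool-use hook that vetoes obviously destructive commands.
cat > block-destructive.sh <<'EOF'
#!/usr/bin/env bash
input=$(cat)                      # the pending tool call, as JSON
case "$input" in
  *"rm -rf"*|*"DROP TABLE"*)
    echo "blocked destructive command" >&2
    exit 2 ;;                     # exit code 2 = block the tool call
esac
exit 0
EOF
chmod +x block-destructive.sh

# Simulate what Claude Code would send for a dangerous Bash call:
echo '{"tool_name":"Bash","tool_input":{"command":"rm -rf /tmp/x"}}' \
  | ./block-destructive.sh || echo "hook exit code: $?"
# → blocked destructive command
# → hook exit code: 2
```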
Summary
To get Claude Code to build a service to a runnable state, you need to design token management as a whole—not just the way you write CLAUDE.md.
- Tokens are finite: use the four‑layer strategy to cut consumption
- Externalize state: hand off to progress.txt, features.json, and git
- Validate deeply: not just screenshots—also DevTools and a real DB
- Add brakes to automation: understand the risk of runaways
A concrete build example will be in a separate post.
References
- Writing a good CLAUDE.md - HumanLayer
- How I use every Claude Code feature - sshh.io
- Effective harnesses for long-running agents - Anthropic official
- Claude Code Skill and Sub-agent Strategy Guide - Zenn
- Tool Search Tool - Anthropic official
- frontend-design SKILL - Claude Code official
- @rawsql-ts/pg-testkit - Zenn
- sakura-lolipop-docker - Rental-server reproduction Docker