Token Management Guide for Those Struggling with Bloated CLAUDE.md
“Keep CLAUDE.md concise. Sixty lines is ideal.”
That’s what theory posts say. But once you move a real project forward, it expands. You add guardrails. You note error patterns. Before you know it, it’s over 300 lines.
Even if people tell you to keep it short, necessary information is still necessary.
In the end, obsessing over the length of CLAUDE.md alone misses the point. The real issue is token management.
We may have a 200K context window, but one third is eaten by the system prompt. As the conversation goes on, summarization kicks in and important details get dropped.
This article organizes Claude Code features around one axis: how to keep tokens under control. Keeping CLAUDE.md small is only one of several means to that end.
Premise: Why agents fail on long‑running tasks
LLMs are stateless. Every session starts with no memory.
This is the root of many problems and leads to typical failure modes:
- Trying to build everything at once → context runs out and work ends half‑done
- Claiming “done” mid‑way → the agent settles for partial success
- Getting lost across sessions → it forgets last time and repeats itself
- “Tests passed,” but only with mocks → nothing runs in a production‑like setup
All of these come from context limits and failures to carry state forward.
Token management: the foundation
First, understand how context actually gets consumed:
| Source | Ballpark |
|---|---|
| Claude Code system prompt | About 50 directives (≈ one‑third of capacity) |
| CLAUDE.md | Consumes ~20% in the initial state |
| Tool definitions | 50 tools ≈ 10–20K tokens |
| Conversation history | Accumulates and triggers compression |
Even with 200K, you effectively use roughly half. As the chat grows, compression runs more often.
Four‑layer strategy
Control tokens across four layers:
Layer 1: Static reduction (reduce up front)
- Keep CLAUDE.md concise (target 60 lines; realistically keep it under 300)
- Use pointers: reference `file:line` rather than embedding code
- The 30% rule: include only information that 30%+ of engineers will use (in a solo project you use everything yourself, which tends to bloat it)
Layer 2: Lazy loading (only when needed)
- Skills: consume no context until invoked
- Tool Search: use `defer_loading` to delay loading tool definitions
- Put detailed docs in separate files and reference them
Layer 3: Externalize state (escape the context window)
- `progress.txt`: write progress to a file
- `git log`: let git hold the history
- `features.json`: manage completion state externally
There are two good times to write things out:
- Right before compression is likely: write state out first.
- When a task won’t finish in one go: write progress even mid‑stream.
It’s fine to be moderately detailed. Think “from everything we know, what will we definitely need to look up in the next session?”
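As a concrete sketch, a pre-compression write-out can be as simple as appending a dated block to `progress.txt`. The file name follows this article; the fields and the example content are only a suggestion, not a prescribed format:

```shell
# Append a dated, self-contained progress entry to progress.txt.
# The entry text is illustrative sample data.
cat >> progress.txt <<EOF
## Session end $(date +%Y-%m-%d)
- Done: registration form UI and validation
- Next: wire the form to the /api/register endpoint
- Gotcha: start docker-compose before running tests
EOF
```

The test of a good entry is whether the next session can resume from it without re-reading the whole codebase.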
Layer 4: Session management (reset before overflow)
- `/clear` + `/catchup`: resume from a clean slate
- Avoid `/compact`: it’s opaque and unstable
- Document & clear: for complex tasks, save state to a `.md` file, then restart
Preparation phase: make an environment agents can’t get lost in
To succeed with long tasks, prepare the following files:
| File | Role | Notes |
|---|---|---|
| `CLAUDE.md` | Guardrails | Be concise; phrase negatives with alternatives ("Never X, prefer Y") |
| `features.json` | Feature list | Make completion criteria explicit; `passes: true/false` |
| `init.sh` | Environment bootstrap | Include Docker startup; begin every session from the same state |
| `progress.txt` | Progress log | Carry-over between sessions |
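For reference, `init.sh` can stay tiny. This sketch writes one to disk and syntax-checks it; the service name `db` and the npm commands are placeholders for whatever your project actually uses:

```shell
# Write a minimal init.sh sketch (placeholder commands, not a prescription).
cat > init.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
docker-compose up -d db        # production-like DB container
npm ci                         # deterministic dependency install
npm run dev &                  # dev server in the background
echo "ready at commit $(git rev-parse --short HEAD)"
EOF
chmod +x init.sh
bash -n init.sh                # syntax check only; exits 0 if the sketch parses
```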
Four rules for writing CLAUDE.md
- Guardrails first: write about where Claude tends to err, not a user manual
- Avoid @file notation: embedding causes context bloat
- Don’t stop at negatives: say “Never X, prefer Y” and offer an alternative
- Use as enforced functions: wrap complex CLI commands in bash so CLAUDE.md stays concise
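Rule 4 in practice: instead of documenting a long command sequence in CLAUDE.md, ship it as a script and reference the script by name. Everything below (the `scripts/db-reset` path, the psql flags, the seed file) is illustrative:

```shell
# Wrap a complex, error-prone command sequence in one named script.
mkdir -p scripts
cat > scripts/db-reset <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
docker-compose exec -T db psql -U app -c 'DROP SCHEMA public CASCADE; CREATE SCHEMA public;'
docker-compose exec -T db psql -U app -f /seed/schema.sql
EOF
chmod +x scripts/db-reset
```

CLAUDE.md then needs only one line: "Reset the DB with `scripts/db-reset`."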
Example features.json
{
  "features": [
    {
      "category": "functional",
      "description": "Implement the user registration form",
      "steps": ["Build the form UI", "Add validation", "Wire up the API"],
      "passes": false
    },
    {
      "category": "functional",
      "description": "Implement the login feature",
      "steps": ["Authentication logic", "Session management", "Redirect"],
      "passes": false
    }
  ]
}
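With `passes` flags in place, "what's next" becomes a mechanical query rather than something the agent has to remember. A sketch using jq (assumed to be installed); the feature names are sample data:

```shell
# Minimal features.json with one finished and one unfinished feature.
cat > features.json <<'EOF'
{"features":[
  {"description":"Implement the user registration form","passes":true},
  {"description":"Implement the login feature","passes":false}
]}
EOF
# Pick the first feature whose completion flag is still false.
jq -r '[.features[] | select(.passes == false)][0].description' features.json
# → Implement the login feature
```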
Execution phase: two‑stage harness design
A pattern recommended by Anthropic:
Initialization agent (first session)
├── Create init.sh
├── Create features.json
├── Initialize progress.txt
└── Make the initial git commit
    ↓
Coding agent (subsequent sessions)
├── Implement one feature at a time
├── Commit to git when each feature is done
├── Update the progress file
└── Mark complete only after verifying tests
Standard procedure at session start
Each session, have the agent reacquire state via:
- Check the working directory with `pwd`
- Review recent work via `git log` + `progress.txt`
- Choose the next goal from `features.json`
- Start the dev server with `init.sh`
This reliably carries “where we left off last time” into each session.
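The four steps can be bundled into one script the agent runs first thing. The name `catchup.sh` (echoing the /catchup idea) and the jq dependency are my assumptions:

```shell
# Bundle the session-start ritual into a single script.
cat > catchup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
pwd                    # where are we?
git log --oneline -5   # recent work
tail -n 20 progress.txt  # where we left off
jq -r '[.features[] | select(.passes == false)][0].description' features.json  # next goal
./init.sh              # same starting state every time
EOF
chmod +x catchup.sh
```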
Parallel execution and result integration
Running sub‑agents in parallel can shorten wall‑clock time, but it’s confusing unless you understand how to merge results.
Master–Clone architecture
Main agent (full context)
│
├─→ Clone A: "Investigate only X and return the result in 3 lines"
├─→ Clone B: "Investigate only Y and return the result in 3 lines"
└─→ Clone C: "Investigate only Z and return the result in 3 lines"
Key points:
- Give sub‑agents only the minimum task‑specific instructions
- Have them return compressed results (save tokens)
- Let the main agent integrate (it holds full context)
Pitfalls of pre‑defined specialist sub‑agents
Defining specialized sub‑agents up front invites problems:
- Context isolation: specialists lose the big picture and results don’t fit together
- Rigid workflows: when humans pre‑decide delegation, agents can’t self‑optimize
Recommendation: the “Master–Clone” approach. Give the main agent full context, and have it spawn clones with the same capabilities as needed.
Choosing how to load knowledge
There are multiple ways to give knowledge to an agent, each with different token costs:
| Method | Token cost | Use case |
|---|---|---|
| CLAUDE.md | Paid every session | Global guardrails across sessions |
| Skills | Only when invoked | Task‑specific knowledge |
| MCP | Tool definitions always cost | Only when auth/networked actions are required |
| Separate files | Only when read | Detailed documentation |
| Tool Search | Only what’s needed at search time | Managing lots of tools (50+) |
A new role for MCP
If you mirror APIs in MCP (1 API = 1 tool), the tool count explodes and pressures the context window.
Use MCP as a “gateway” instead:
❌ Old: 1 API = 1 MCP tool (tool count explosion)
✅ Recommended: consolidate into three high‑level tools
- download_raw_data(filters...)
- take_sensitive_gated_action(args...)
- execute_code_in_environment_with_state(code...)
If you just need to pass knowledge, Skills or separate files are more token‑efficient than MCP.
Skill gotchas
- Blocked by default; you must explicitly allow them
- The 25% activation problem: left to auto-trigger, skills reportedly fire only about a quarter of the time; spell out the activation conditions in CLAUDE.md and they trigger 100% of the time
<!-- Add to CLAUDE.md -->
When the user asks “What’s the tech stack?”,
always use the tech-stack skill.
Depth of validation
Even when an agent says “done,” check what level of validation it actually performed.
Front‑end validation levels
Level 1: Screenshot check (looks OK)
Level 2: + No console errors
Level 3: + No network errors
Level 4: + Meets performance thresholds
Level 5: + Passes accessibility audit
Playwright MCP is currently stuck at Level 1 (screenshots only). It should also capture DevTools signals:
// Capture console logs
page.on('console', msg => console.log(msg.text()));
// Capture JS errors
page.on('pageerror', error => console.log(error.message));
// Capture network failures
page.on('requestfailed', request =>
console.log(request.url(), request.failure().errorText)
);
Back‑end validation levels
Level 1: Unit tests (with mocks)
Level 2: + Real DB connection tests
Level 3: + Integration tests in a production‑like environment
When you hear "tests passed," check for these traps:
- They may have passed only with mocks/stubs
- Passing on SQLite can still fail on MySQL/PostgreSQL
- The repository layer may have been swapped for a test double
Strategy for test environments
Approach 1: Abstraction (e.g., ZTD)
Use CTEs to shadow table references and test SQL without touching real tables.
- Lightweight and parallelizable
- Doesn’t cover stored procedures/triggers
- Good for simple CRUD
Approach 2: Reproduce the environment (Docker)
Prepare a production‑like environment in containers.
- Easy to blow away with `docker-compose down`
- Test against the same DB dialect as production
- Safe even when you grant the agent privileges
Docker is recommended for agent development: you can throw failures away, and you test in an environment that matches production.
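The disposable cycle looks like this in practice, written out as a script. The compose service, credentials, and test command are all placeholders:

```shell
# Spin up, test against the production dialect, throw everything away.
cat > test-integration.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
docker-compose up -d db                  # same DB engine as production
DATABASE_URL=postgres://app:app@localhost:5432/app_test npm test
docker-compose down -v                   # discard containers and volumes
EOF
chmod +x test-integration.sh
```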
Design quality: remove the “AI‑ish” look
When you have Claude Code build front‑end UI, it tends to look “AI‑ish.”
Philosophy of the frontend-design SKILL
Have the agent check four contexts before coding:
- Purpose: what problem are we solving?
- Tone: aesthetic direction (brutally minimal / maximalist chaos / retro‑futuristic, etc.)
- Constraints: technical constraints
- Differentiation: what makes it UNFORGETTABLE?
“AI slop” to avoid
- Generic fonts like Inter, Roboto, Arial
- Purple gradients on white backgrounds
- Predictable layouts
What matters is intentional choices. Bold or refined—either is fine if consistent.
Author’s approach
To date I’ve used the Atlassian Design System as a reference to get the minimum in place. Riding on an existing design system at least keeps things from breaking down.
I plan to try Claude’s frontend‑design SKILL next. If it forces “intentional choices,” it could raise the design bar.
Automation and brakes
Auto‑approving tools is convenient, but crank it up too much and things won’t stop.
How I use AI tools
For context, here’s how I split tools today:
- Claude Code: main development—code generation, file ops, builds
- Gemini: rubber‑ducking and design discussions; a second view
- ChatGPT: casual chat; almost never for programming
- Manus: used to use it, but working locally suits me better now
Keep that in mind for the “runaway” issue below.
Auto‑approval settings
// .claude/settings.json
{
"permissions": {
"allow": [
"WebFetch(*)",
"Bash(docker-compose:*)",
"Skill(*)"
]
}
}
The runaway problem
Getting asked “Allow?” every time is annoying, so you’ll want auto‑approval. But too much and it won’t stop.
- Gemini pauses to ask “Are you sure?” along the way
- Claude Code keeps executing quietly
- Driven via API, it can run indefinitely—scary
Mitigations
- Don’t auto-approve destructive actions (`rm -rf`, `DROP TABLE`, etc.)
- Use hooks to block specific operations
- Check `/context` periodically to monitor token use
- Set timeouts for long-running tasks
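For the hook-based brake, here is a PreToolUse hook sketch. Claude Code pipes the pending tool call to the hook as JSON on stdin, and a hook that exits with code 2 blocks the call (verify against the hooks documentation for your version). The crude substring matching is only for illustration; a real hook should parse the JSON properly, e.g. with jq:

```shell
# A pre-tool-use hook that vetoes obviously destructive commands.
cat > block-destructive.sh <<'EOF'
#!/usr/bin/env bash
input=$(cat)                      # the pending tool call, as JSON
case "$input" in
  *"rm -rf"*|*"DROP TABLE"*)
    echo "blocked destructive command" >&2
    exit 2 ;;                     # exit code 2 = block the tool call
esac
exit 0
EOF
chmod +x block-destructive.sh

# Simulate what Claude Code would send for a dangerous Bash call:
echo '{"tool_name":"Bash","tool_input":{"command":"rm -rf /tmp/x"}}' \
  | ./block-destructive.sh || echo "hook exit code: $?"
# → blocked destructive command
# → hook exit code: 2
```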
Summary
To get Claude Code to build a service to a runnable state, you need to design token management as a whole—not just the way you write CLAUDE.md.
- Tokens are finite: use the four‑layer strategy to cut consumption
- Externalize state: hand off to progress.txt, features.json, and git
- Validate deeply: not just screenshots—also DevTools and a real DB
- Add brakes to automation: understand the risk of runaways
A concrete build example will be in a separate post.
References
- Writing a good CLAUDE.md - HumanLayer
- How I use every Claude Code feature - sshh.io
- Effective harnesses for long-running agents - Anthropic official
- Claude Code Skill and Sub-agent Strategy Guide - Zenn
- Tool Search Tool - Anthropic official
- frontend-design SKILL - Claude Code official
- @rawsql-ts/pg-testkit - Zenn
- sakura-lolipop-docker - Rental-server reproduction Docker