Claude Code context rot starts before the first prompt, not at 45 minutes
A fresh Claude Code session works well. For the first few dozen minutes, it reads files correctly, remembers earlier decisions, and edits with care. Around 45 minutes in, it starts re-reading files it already opened, circling back to discarded approaches, and producing longer replies with shallower edits.
Amit Baliyan’s DEV article “Claude Code Context Window Rot: Why Sessions Get Dumber (And How to Fix It)” frames this as context rot.
What’s interesting is that it doesn’t stop at “run /compact when things get long.”
It examines the quality of context entering the session before you even type.
This overlaps with my earlier piece on adding working memory to Claude Code with CTX, but from the opposite direction—CTX selectively injects context afterward, while the original article is about trimming what gets mixed in from the start.
The 45-minute wall is not a timer
The original article mentions sessions degrading around the 45-minute mark. That doesn’t mean Claude Code has a quality timer set to 45 minutes. By that point, conversation history, Read results, grep output, bash output, diffs, errors, and retry instructions have piled up, and tokens irrelevant to the current task outnumber the useful ones.
Anthropic’s official Claude Code blog explains context rot in the context of 1M token windows. The context window holds everything: system prompt, conversation, tool calls, tool output, and files read. As it grows, attention dilutes, and stale information competes with the current task.
The Agent SDK docs put it in implementation terms. Context is not reset between turns within a session. Conversation history, tool input, and tool output accumulate, and when approaching the limit, automatic compaction replaces older history with summaries. Summaries help, but there’s no guarantee that early constraints and decisions survive intact.
“45 minutes” is not the cause—it’s roughly where the muddiness becomes visible. If you feed large logs repeatedly, it happens faster. If the work is short edits and small tests, sessions can hold past an hour.
Length alone degrades output
This isn’t just a Claude Code complaint. Chroma’s technical report “Context Rot: How Increasing Input Tokens Impacts LLM Performance” tested 18 models—GPT-4.1, Claude 4, Gemini 2.5, Qwen 3, and others—showing that performance destabilizes as input length grows, even when the extra tokens are semantically neutral. It goes beyond simple needle-in-a-haystack tests, examining breakdown with semantically similar distractors and structured haystacks.
A codebase is full of distractors. Functions with similar names, old implementations, modules with overlapping responsibilities, diffs from failed fixes, test logs. From the model’s perspective, they all look “plausible.”
Liu et al.’s Lost in the Middle is also relevant. Information at the beginning or end of the context is easier to retrieve; information buried in the middle gets lost. In long Claude Code sessions, early decisions like “don’t touch this file” or “use this design” get pushed into the middle of a mountain of tool output. The model hasn’t fully forgotten them, but their effective weight in the next generation drops.
CLAUDE.md and tool definitions are the first weight
What sets the original article apart from existing context management pieces is that it directly addresses pre-session context. Before the session begins, CLAUDE.md, MCP tool definitions, skill descriptions, auto-memory, and additional system prompts are already loaded. When these are bloated, the workspace for actual tasks shrinks before you type anything.
I wrote about this in a token management guide for bloated CLAUDE.md files, where the advice was to keep CLAUDE.md to conclusions only and move details to separate files. Back then, the main concern was fitting into 200K. Now with 1M, the same problem persists—more capacity doesn’t mean the model can uniformly ignore mixed-in noise.
What matters is the shape of the context, not just its size. The original article compares dumping in a full Notion spec, full source, full git log, and full Jira tickets versus a structured handoff of task scope, relevant methods, config, and constraints. The latter is smaller, but more importantly, each piece of information has a clear purpose.
Before asking Claude Code to “fix this feature,” listing the files to touch, discarded approaches, constraints to honor, and verification commands beats pasting the whole project background every time. If more context is needed, let it read files on demand. Unglamorous, but it pays off in the back half of long sessions.
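As a rough sketch, that kind of handoff can live in a short Markdown note. Everything below is illustrative; the endpoint, file paths, and commands are made-up placeholders, not an example from the original article:

```markdown
<!-- illustrative handoff sketch; paths, endpoint, and commands are placeholders -->

## Task
Fix the 500 error on the /export endpoint for CSV downloads over ~10k rows.

## Files in scope
- src/export/csv_writer.ts   (streaming logic lives here)
- src/export/routes.ts       (only the timeout setting)

## Constraints
- Do not change the public signature of exportCsv().
- Keep memory bounded; never buffer the full result set.

## Discarded approaches
- Raising the request timeout (masks the problem; already rejected).

## Verification
- npm test -- csv_writer
- Manual check: export a 50k-row report from staging.
```

Every line exists to answer a question the model will actually face; anything else gets a pointer to a file it can read on demand.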
Compact is not repair—it’s a checkpoint
/compact isn’t magic that fixes a degraded session.
It’s closer to placing a checkpoint while things are still clean.
If you compact after quality has already dropped, the misunderstandings, failed approaches, and bloated explanations all enter the summary.
Even if the summary is accurate, the decision about what to keep is made by a degraded session.
Anthropic’s official blog covers the distinction between /clear, /rewind, /compact, and subagents for this reason.
My closest prior practice was in an older piece on Claude Code session management: commit state to git before compaction summarizes it away. That article is old enough that it ends with a “Summary” heading—something I’d cut under current style rules—but the idea holds. Don’t use the session as storage. Push decisions and progress to files or git, then pass a short handoff into a new session.
Subagents work in the same direction. The Agent SDK docs describe subagents as separate conversations with their own context. Offloading investigation or large log reads to subagents keeps massive tool output out of the main session. But if the subagent returns a wall of text, it defeats the purpose—return only the delta needed for the next decision.
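In Claude Code, project subagents are defined as Markdown files with YAML frontmatter under .claude/agents/. A minimal sketch of an investigation subagent built around that rule; the name and wording are mine, not from the article, so check the current docs for the exact frontmatter fields:

```markdown
---
name: log-investigator
description: Digs through large logs and test output, returns only what the next decision needs.
tools: Read, Grep, Glob, Bash
---

<!-- illustrative sketch; the name and rules below are placeholders -->

You investigate failures on behalf of the main session.

- Read whatever logs, diffs, and test output you need; that bulk stays in your own context.
- Report back at most: the failing component, the most likely cause, two or three lines
  of supporting evidence, and one suggested next step.
- Never paste raw logs or full stack traces into your report.
```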
Compression proxies and memory tools are a different layer
This connects to my earlier piece on how Compresr Context Gateway addresses agent context exhaustion. Compresr compresses conversation history and tool output between the agent and the LLM API. CTX injects relevant context into Claude Code right before the prompt via hooks. Both share the same principle: don’t pass everything raw.
The failure modes differ, though.
| Layer | What tends to fail | Closest countermeasure |
|---|---|---|
| Pre-session context | CLAUDE.md, specs, tickets, tool definitions too large | Shorten rules, move details to referenced files |
| In-session context | Read/grep results, error output, failed diffs linger | Subagents, short investigation summaries, early checkpoints |
| Cross-session | Yesterday’s decisions and discarded plans forgotten | Git, handoff notes, CTX, explicit memory |
| Near the limit | Compaction drops initial constraints and details | Clear/compact/rewind before degradation |
Trying to solve everything with a single memory tool just bloats the context again. Deciding “what not to include in the current turn” is often more effective than deciding “what to keep.”
The signs show up as re-reads and do-overs
Looking at your own session logs is faster than checking model names or benchmarks.
The model re-reads the same files repeatedly. It proposes designs that were already rejected. Each error fix introduces a new error. Explanations grow longer while edits get shallower. You find yourself saying “we already rejected that approach” more than once.
When this happens, don’t keep pushing prompts.
Write out the current state: files touched, adopted approach, discarded approaches, remaining errors, next verification step.
Then /clear, or start a new session with that handoff.
Claude Code’s 1M context is a real upgrade. But 1M doesn’t mean “dump everything and get the same accuracy.” The longer a session runs, the more the agent gets pulled by its accumulated work log rather than reading it.
CLAUDE.md should be rules and pointers only
The CLAUDE.md pattern that’s been working: short absolute rules and pointers to detail files. Template details and skill procedures live outside CLAUDE.md and get read on demand.
This is the same direction as the “trim pre-session context” argument above.
“Blog built with Astro 5, TypeScript, Tailwind CSS 4” in CLAUDE.md is enough—no need to paste the full Astro routing spec.
Expanding skill procedures inline means carrying those tokens even for unrelated tasks.
If “See .claude/skills/style-check/SKILL.md” fits in one line, that’s the right answer.
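A minimal sketch of that shape, reusing the stack line and the style-check pointer from above; the other rules and paths are made up for illustration:

```markdown
<!-- illustrative CLAUDE.md sketch; only the stack line and the style-check pointer come from this post -->

Blog built with Astro 5, TypeScript, Tailwind CSS 4.

## Rules
- Never edit files under src/content/archive/.
- Every post passes the style check before commit.
- I will say when a task is done; do not wrap up on your own.

## Pointers
- Style check procedure: .claude/skills/style-check/SKILL.md
- New post template: .claude/skills/new-post/SKILL.md
- Deploy steps: docs/deploy.md
```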
This blog’s CLAUDE.md was bloated at one point. Guardrails, templates, workflows, all-in-one, thousands of tokens. The workspace at session start was already tight before the first instruction. Now, details live in each skill’s SKILL.md, and CLAUDE.md holds only rule summaries and pointers.
An all-in-one CLAUDE.md feels reassuring to the developer. To the model, it’s like being handed a dictionary opened to every page, every time. Open only the pages you need.
It also tries to end work prematurely
Whether this is directly related to context rot is unclear, but Claude Code has a tendency to wrap up tasks on its own. Mid-task, it says “Great job today!” After a single fix, it switches to “Anything else?” mode.
Likely a form of sycophancy. I wrote earlier about how warmth tuning and agreeable personas amplify sycophancy, and the same dynamic applies here. The model over-indexes on reducing user burden and creating tidy stopping points. Being told “you’re probably done, right?” when you’ve assigned ongoing work is just frustrating.
It’s more pronounced in the back half of heavy sessions. As context grows and next-token prediction diffuses, the model seems to prefer a safe stopping point over continuing work. RLHF reward design may over-reward “polite endings.”
The only workaround is explicit instructions: put “I’ll tell you when we’re done; don’t stop on your own” in CLAUDE.md. Fundamentally, this continues until the model gets better at judging task completion or learns to wait for explicit stop signals.