
Adding Working Memory to Claude Code with CTX


Claude Code forgets cleanly at the start of every session.
Design decisions made yesterday, rejected approaches, files to touch, preferences you explained before — none of it carries over to the next terminal.

A CTX article on DEV Community addresses this amnesia fairly directly.
It hooks into Claude Code’s UserPromptSubmit, and before the user’s prompt reaches the model, it pulls related past decisions, code, documentation, and chat history locally and injects them.

In CLAUDE.md token management, I wrote about saving progress and decisions to files and git.
CTX targets the next hassle: instead of explicitly telling the model to “read this,” it fetches relevant context before each prompt automatically.

Injecting via UserPromptSubmit

CTX enters through Claude Code’s hook system.
The README says ctx-install adds UserPromptSubmit and PostToolUse hooks to ~/.claude/settings.json.
The DEV article shows pip install ctx-retriever && ctx-install, or /plugin install ctx@jaytoone from within Claude Code.
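
To make the mechanism concrete, here is a minimal sketch of what a UserPromptSubmit hook looks like in general: Claude Code runs the registered command, passes event JSON on stdin, and appends whatever the command prints to stdout ahead of the user's prompt. This is not CTX's code, just the shape of the hook protocol it plugs into; the retrieval call is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal sketch of a UserPromptSubmit hook; not CTX's actual implementation."""
import json
import sys


def retrieve_context(prompt: str) -> str:
    # Placeholder: CTX would search git log, the code index, and past chats here.
    return "Prior decision: (example) kept retrieval local, no LLM calls."


def main() -> None:
    event = json.load(sys.stdin)       # Claude Code passes event JSON, including the prompt text.
    context = retrieve_context(event.get("prompt", ""))
    if context:
        print(context)                 # stdout from a UserPromptSubmit hook is added to the context.


if __name__ == "__main__":
    main()
```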

The injected information falls into three groups.

| Group | What it fetches | Use case |
| --- | --- | --- |
| G1 | Decision history from git log | Why a particular structure was chosen, why alternatives were discarded |
| G2 | Code and Markdown docs | Function names, related files, documentation |
| CM | Past chats in SQLite | Policies explained earlier, user-specific context |

G2 goes beyond plain full-text search.
The README defines four trigger types, routing queries through explicit symbol names, concept search, dependency analysis, and temporal history.
For dependency queries, it traverses the import graph via BFS to catch “what uses this” links that text search alone misses.
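
The dependency traversal is easy to picture as breadth-first search over a reverse import graph. The sketch below is only an illustration of the idea, not CTX's code; the graph is a hand-written dict where a real tool would build it by parsing import statements.

```python
from collections import deque

# Toy reverse import graph: module -> modules that import it (hand-written for illustration).
imported_by = {
    "db.models": ["api.users", "api.orders"],
    "api.users": ["app"],
    "api.orders": ["app", "reports.daily"],
    "reports.daily": [],
    "app": [],
}


def dependents(module: str, max_depth: int = 2) -> set[str]:
    """Find everything that (transitively) uses `module`, up to max_depth hops."""
    seen, queue = set(), deque([(module, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for user in imported_by.get(current, []):
            if user not in seen:
                seen.add(user)
                queue.append((user, depth + 1))
    return seen


print(dependents("db.models"))  # {'api.users', 'api.orders', 'app', 'reports.daily'}
```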

This differs from Compresr Context Gateway in direction.
Compresr is a proxy that compresses context between the agent and the LLM API.
CTX adds context right before Claude Code’s input.
Rather than narrowing what goes in, it pre-fetches what should be there.

Not cramming everything into a vector DB

What CTX gets right is not collapsing the word “memory” into a single thing.
Git log decision history, code search, and chat history look similar but need different retrieval methods.

Design decisions in git log are easier to trace by commit than by vector search.
Code retrieval leans on symbol names and import relationships.
Past chats use BM25 with optional vector search to find fragments from previous user conversations.
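
For the git side, "trace by commit" can be as simple as a keyword-filtered log. How CTX actually ranks commits isn't spelled out at this level of detail, so the snippet below is only a stand-in for the idea.

```python
import subprocess


def decisions_mentioning(keyword: str, limit: int = 5) -> list[str]:
    """Return recent commit subjects whose messages mention `keyword` (case-insensitive)."""
    out = subprocess.run(
        ["git", "log", f"--grep={keyword}", "-i", f"-n{limit}", "--pretty=format:%h %s"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()


print(decisions_mentioning("sqlite"))
```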

YourMemory exposes recall_memory, store_memory, and update_memory as an MCP server, mixing BM25, vector search, graph, and a forgetting curve.
WUPHF uses Markdown and Git as the source of truth, promoting agent-generated knowledge to a wiki.
Both had “where to put the canonical record” as a central theme.

CTX leans more toward the operational side of Claude Code.
Instead of creating a new source of truth, it bundles what’s already at hand — git, code, Markdown, Claude Code session logs — right before each prompt.
This lightness suits solo projects.

Strong numbers, but check how they’re measured

The DEV article cites a memory recall benchmark where the baseline scored 0.00 and CTX v3 scored 0.88 Recall.
In real-world measurement over 10,000+ turns, 39.6% of injected items were actually cited by Claude overall, with chat memory at 52.6%.

The GitHub README has separate evaluations.
On a synthetic benchmark, CTX scored Recall@5 of 0.874, token usage 5.2%, TES 0.776.
On external codebases (Flask, FastAPI, Requests), average Recall@5 was 0.163 higher than BM25.
On CodeSearchNet Python — finding code from natural language queries — CTX Adaptive Trigger’s Recall@5 was 0.740, losing to BM25’s 0.980.

That loss is actually informative.
CTX isn’t a universal search engine; it excels in familiar codebases where symbol and dependency lookups work well.
For natural language queries like “find code that does X,” Dense Embedding or Hybrid Dense+CTX performs better.

Cloudflare Agent Memory classifies memories into Facts, Events, Instructions, and Tasks, with HyDE and RRF for retrieval.
CTX doesn’t aim for that scale.
Instead, it quickly selects “what to add to this prompt right now” within a single Claude Code project.

Holes to check before installing

CTX is local-only, with no LLM calls and no mandatory telemetry.
ctx-retriever on PyPI was at 0.3.11 as of 2026-05-02, MIT-licensed, Python 3.9+.
The GitHub repo was also updated on 2026-05-02.

There’s hook collision risk.
ctx-install is documented to not overwrite existing hooks and to deduplicate by command string, but if you’re already running hooks on Claude Code, check the diff in ~/.claude/settings.json.
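
A quick way to check, assuming the default user-level settings path, is to dump the hooks section before and after running ctx-install and diff the output:

```python
import json
from pathlib import Path

# Assumes the default user-level settings file; project-level settings can also register hooks.
settings_path = Path.home() / ".claude" / "settings.json"
settings = json.loads(settings_path.read_text())

# Dump whatever is registered under "hooks" so a before/after diff is easy to eyeball.
print(json.dumps(settings.get("hooks", {}), indent=2))
```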

There’s also injection bias.
Auto-fetching related files before every prompt is convenient, but it can pull in outdated decisions or broken implementations.
When past context is too strong during a fix task, the model drifts toward reusing broken designs; the README mentions a [fix] tag to suppress anchoring to existing implementations.

There’s a scale constraint too.
Under Hook Performance in the README, codebases with 2,000+ files are auto-skipped.
It fits small-to-mid projects where you keep re-explaining past decisions and related files.
For large monorepos or exploring unfamiliar codebases, look at AST-based indexers or dedicated RAG systems instead.

What each memory tool preserves

Looking at CTX alone doesn’t reveal the full “memory” picture.
The tools that have appeared over the past six months are solving different problems.

| Tool | Injection method | Storage | Cross-session |
| --- | --- | --- | --- |
| CLAUDE.md | Auto-loaded at session start | Project file | Yes (manual updates) |
| Claude Code auto-memory | Auto-loaded at session start | ~/.claude/projects/ | Yes (auto-accumulated from conversation) |
| CTX | Hook injection before each prompt | git log / source code / SQLite | Partial (only chat history persists in SQLite) |
| YourMemory | MCP store/recall | SQLite + vector DB | Yes (decays via forgetting curve) |
| WUPHF | MCP wiki read/write | Markdown + Git | Yes (shared across multiple agents) |
| Cloudflare Agent Memory | API | Durable Objects + Vectorize | Yes (managed, structured in 4 categories) |

These split into two camps.
Retrieval-oriented tools pull relevant context within a session, and persistence-oriented tools retain information across sessions.

CTX is fully retrieval-oriented.
It just pulls relevant fragments from git, code, and chat history before each prompt — it has no mechanism to write new memories.
Compresr also doesn’t persist, but its direction is the opposite: context compression, not memory.

On the persistence side, tools diverge on what they store and how.

Claude Code’s built-in auto-memory picks up user roles and feedback from conversation and auto-loads them at the start of the next session.
It’s isolated per project, so separate repositories accumulate separate memories.
CLAUDE.md is also read every session, but it’s a static home for rules and templates — it doesn’t grow from conversation.

YourMemory requires explicit store/recall calls but uses a forgetting curve to lower scores for stale memories.
It addresses the garbage-accumulation problem from the “forgetting” side.
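
YourMemory's exact decay function isn't documented here, but a standard exponential forgetting curve captures the idea: a memory's score halves after each half-life.

```python
from datetime import datetime, timezone


def decayed_score(base_score: float, stored_at: datetime, half_life_days: float = 30.0) -> float:
    # Textbook exponential decay; not YourMemory's implementation, just the concept.
    age_days = (datetime.now(timezone.utc) - stored_at).total_seconds() / 86_400
    return base_score * 0.5 ** (age_days / half_life_days)


# A 60-day-old memory with base score 1.0 scores about 0.25 at a 30-day half-life.
```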

WUPHF is a team wiki, not a personal tool.
It writes decision rationale and work logs from multiple agents into Markdown and tracks who wrote what via Git history.

Cloudflare Agent Memory is a managed service that structures memories into Facts, Events, Instructions, and Tasks, with HyDE and RRF for retrieval.
It’s production infrastructure, not a personal memory layer.

OCR-Memory from arXiv takes a different angle, converting agent work history to images and searching over them.
It’s a research-stage answer to the problem of text summaries dropping detail.

What to add for personal projects

If you run personal projects in Claude Code, CLAUDE.md and auto-memory handle most of the friction.
Project conventions and templates go in CLAUDE.md; auto-memory accumulates preferences and corrections on its own.
No extra installation needed.

The gap is code context across sessions.
Files read yesterday, design decisions from last week, approaches that didn’t work out.
Auto-memory covers user preferences and roles, not code context.
Stuffing everything into CLAUDE.md bloats it and buries the rules you actually want enforced.

CTX fills that gap.
It pulls related fragments from git log, source, and chat history on each prompt, so you don’t have to keep saying “look at that file” or “here’s why I chose that approach.”
But CTX doesn’t write new memories — it only searches and injects existing information.
It works as long as git, code, and chat history are local, but doesn’t persist beyond that.

YourMemory has a role when you want to accumulate knowledge that doesn’t live naturally in code.
Explicitly store judgments like “this API is slated for deprecation” or “this library had compatibility issues,” and the forgetting curve sinks old memories over time.
Running an MCP server adds setup overhead.

WUPHF is heavy for solo use.
It’s a wiki for sharing decision rationale across multiple people and agents — different scope from a personal memory layer.
Cloudflare Agent Memory is infrastructure for embedding memory into production systems, so the comparison itself doesn’t apply.

For solo Claude Code usage, start with CLAUDE.md + auto-memory. When cross-session code context becomes a pain point, add CTX.

1M context still isn’t enough

With this many memory tools appearing, the question is why they’re needed at all.
Claude (1M tokens), Gemini (1M tokens), OpenAI’s Codex (1M tokens) — the major coding agents’ context windows have all converged around 1M.
Just dump everything in, right?

It doesn’t work.
A mid-size project (around 2,500 files) hits 500K-750K tokens on source code alone.
Add conversation history, tool output, diffs, and error logs, and a 1M window fills up in the back half of a session.
A Next.js monorepo (27,000+ files) consumed 739K tokens in a single code review scan.
Mid-size is tight; large-scale is impossible.

Text and images share one pool

The stated 1M tokens isn't a text-only limit.
Images, video, and audio all consume tokens from the same single context window; there's no separate budget for text and media.

Tokenization rules differ slightly by model.

Claude splits images into 512x512 tiles: 85 base + tiles x 170 tokens.
A rough formula is width x height / 750.
One 1080p screenshot runs about 1,600 tokens.
Opus 4.7 extended the long-side limit to 2576px, pushing high-res images close to 4,800 tokens each.

Gemini uses 768x768 tiles at 258 tokens each.
Per-image token efficiency is better than Claude, but Gemini also handles video and audio.
Video at 1fps extraction is 258 tokens per frame.
Audio is 32 tokens per second.
A one-minute video consumes 15,480+ tokens for video alone, 17,400+ with audio.
Passing demo videos during code review eats into the text budget fast.

OpenAI uses a tile scheme similar to Claude: 85 base + 512x512 tiles x 170 tokens.
A 1024x1024 image is 765 tokens.
Low-resolution mode fixes it at 85 tokens per image, usable for quick UI checks.
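
The per-model rules above fit in a few lines of rough budget math. These follow the approximations as stated here; real tokenizers also resize images and apply model-specific caps, so treat the outputs as estimates.

```python
import math


def claude_image_tokens(width: int, height: int) -> int:
    # Rule of thumb cited above: width x height / 750.
    return math.ceil(width * height / 750)


def gemini_video_tokens(video_seconds: int, audio_seconds: int = 0) -> int:
    # 258 tokens per frame at 1 fps extraction, plus 32 tokens per second of audio.
    return video_seconds * 258 + audio_seconds * 32


def openai_image_tokens(width: int, height: int, low_res: bool = False) -> int:
    # 85 base + 170 per 512x512 tile; low-resolution mode is a flat 85.
    if low_res:
        return 85
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + tiles * 170


print(claude_image_tokens(1024, 768))             # ~1,049 tokens for a 1024x768 screenshot
print(gemini_video_tokens(60, audio_seconds=60))  # 17,400 tokens for a one-minute clip
print(openai_image_tokens(1024, 1024))            # 765 tokens
```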

How much does this matter in practice?
A Claude Code session with 5 error screenshots and 3 UI checks uses roughly 13,000 tokens on images alone.
1.3% of 1M looks trivial, but on top of source code at 500K, conversation history at 100K, and tool output at 200K, 13K of images out of the remaining 200K starts to matter in late sessions.

Gemini with 3 demo videos (5 minutes total) is more dramatic — 87,000+ tokens gone to video and audio.
Mixing multimodal content into workflows designed for text-only shrinks effective capacity more than the numbers suggest.

Longer input, lower accuracy

Even if everything fits, accuracy drops.
Liu et al. showed that simply moving the document needed for an answer from the beginning to the middle of the context dropped QA accuracy by over 30%.
Attention concentrates on the beginning and end; information buried in the middle gets missed.
This is known as “lost in the middle,” attributed to position bias from RoPE (Rotary Position Embedding).
Du et al. went further: even with irrelevant tokens replaced by whitespace to eliminate noise entirely, accuracy still dropped 13.9-85% from the increased input length alone.
Longer input is a penalty in itself.

Cost

With Claude Sonnet (input $3/M tokens), sending 200K tokens per request costs $0.60 per call.
One report reduced 739K tokens to 15K with knowledge graph extraction — a 49x cost reduction.
Another compressed 412K tokens to 3.4K, a 99.2% reduction.
Prompt caching charges 10% of the base input price for cache hits, but fresh diffs are full price every time.
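
The arithmetic is simple enough to sanity-check in a scratch function (input pricing as stated above; actual bills also depend on caching and output tokens):

```python
def input_cost_usd(tokens: int, usd_per_million_tokens: float = 3.0) -> float:
    # Claude Sonnet input pricing cited above: $3 per million input tokens.
    return tokens / 1_000_000 * usd_per_million_tokens


print(input_cost_usd(200_000))                            # $0.60 per call
print(input_cost_usd(739_000) / input_cost_usd(15_000))   # ~49x gap after compression
```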

It’s a selection problem

What memory tools actually solve is closer to “selection” than “memory.”
A 2,500-file project has a single task touching maybe 5-15 files.
Including everything degrades accuracy, inflates cost, and increases latency.
Whether CTX injects related code via hooks or YourMemory recalls via MCP, both select specific fragments and add them to the input.
Whether the selection spans sessions just determines whether past chats and git log are in scope.

Same 1M, different contents

The stated 1M tokens look equivalent, but how each agent uses that window is completely different.

Claude Code runs compaction within a session.
Old tool outputs and conversation exchanges get summarized and compressed to free up space for new input.
The 1M isn’t a static buffer — it’s a rolling window where contents are constantly replaced.
Code analysis results from 50 messages ago aren’t there in original form.
CLAUDE.md and the system prompt occupy permanent slots, and the rest fills with recent exchanges and compressed summaries.
Even with a “capacity” of 1M, you only access recent content and a past that’s lost information through compression.

OpenAI’s Codex is different again.
Each task spins up a cloud sandbox and tears it down when done.
The 1M window lives only within a single task.
Files read and diffs produced in the previous task don’t carry over.
The concept of a session is thin, and memory accumulation doesn’t happen at the architecture level.

Gemini can fill its 1M in one shot.
Dump in an entire codebase and ask “fix this bug.”
But lost in the middle affects Gemini too — attention skews toward the beginning and end.
Being able to ingest a lot and being able to use the ingested information evenly are separate things.

Whether compacting in a rolling window, using and discarding per task, or filling in bulk — the same 1M yields very different answers to “how much past context can you actually retain?”
Memory tools like CTX exist not just because windows are too small, but because agent architectures are designed to discard context in the first place.

References