How Compresr's Context Gateway solves context exhaustion for AI agents
If you use AI coding agents such as Claude Code or Cursor for long enough, you eventually hit the context-window limit and the conversation resets. Large file reads and tool outputs burn through tokens especially fast.
Compresr’s OSS tool, Context Gateway, tackles that problem at the proxy layer. It is written in Go, licensed under Apache 2.0, and reached 451 stars and 35 forks within about six weeks of its February 2026 release.
Basic architecture
Context Gateway sits between the agent and the provider as a reverse proxy.
```mermaid
graph LR
    A[AI agent<br/>Claude Code, etc.] --> B[Context Gateway<br/>localhost:18081]
    B --> C[LLM API<br/>Anthropic, OpenAI, etc.]
    B --> D[Compresr API<br/>compression service]
```
From the agent’s point of view it looks like a normal API endpoint. Gateway intercepts requests and responses and performs context optimization. It supports a wide range of providers, including Anthropic, OpenAI, Google Gemini, AWS Bedrock, Ollama, OpenRouter, and LiteLLM.
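For example, Claude Code can be routed through a local proxy via the `ANTHROPIC_BASE_URL` environment variable. A minimal sketch, assuming Gateway's default port from the diagram above (whether the install wizard sets this for you automatically is not covered here):

```shell
# Route Claude Code's Anthropic API traffic through the local gateway.
# ANTHROPIC_BASE_URL is a standard Claude Code override; 18081 is
# Context Gateway's default listening port.
export ANTHROPIC_BASE_URL="http://localhost:18081"
```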
Three compression pipelines
The core of Context Gateway is three “pipes.”
1. Preemptive Summarization
This pre-generates summaries of the conversation before the context usage crosses the threshold, which defaults to 85%.
```mermaid
graph TD
    A[Conversation continues] --> B{Monitor context usage}
    B -->|below threshold| A
    B -->|approaching threshold| C[Generate a summary<br/>in the background]
    C --> D[Cache the summary]
    D --> E{Context limit reached}
    E -->|yes| F[Compact immediately<br/>with cached summary]
    E -->|no| A
```
Normal compaction waits until the limit is reached, which introduces latency. The preemptive approach does the summarization in advance, so the switch is immediate.
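The trigger itself is simple to picture. A minimal Python sketch of the threshold check, where the 85% default comes from the article but all names and structure are hypothetical, not Gateway's actual internals:

```python
# Illustrative sketch of the preemptive-summarization trigger.
# Only the 0.85 default threshold is taken from the article.
DEFAULT_THRESHOLD = 0.85

def should_presummarize(used_tokens: int, context_limit: int,
                        threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True once context usage approaches the threshold,
    so a summary can be generated in the background."""
    return used_tokens / context_limit >= threshold

# With a 200k-token window, background summarization would start at 170k:
assert not should_presummarize(160_000, 200_000)
assert should_presummarize(170_000, 200_000)
```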
Two summarization strategies are available:
| Strategy | What it does |
|---|---|
| `external_provider` | ask a chosen LLM, such as `claude-haiku-4-5` |
| `compresr` | use Compresr’s own compression API with the `hcc_espresso_v1` model |
Session summaries are cached with a TTL of three hours by default.
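Put together, a preemptive-summarization config might look roughly like this. The threshold, TTL, strategy values, and model names come from the article; the exact key names and nesting are guesses, so check the generated config:

```yaml
pipes:
  history_compaction:        # key name is a guess, based on the log file name
    enabled: true
    threshold: 0.85          # default trigger point from the article
    strategy: "external_provider"
    external_provider:
      model: "claude-haiku-4-5"
    summary_cache_ttl: "3h"  # default TTL from the article
```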
2. Tool Output Compression
Read and Bash tool outputs can easily run into thousands of tokens. Tool Output Compression shrinks those outputs before they are handed to the LLM.
Configuration example:
```yaml
pipes:
  tool_output:
    enabled: true
    strategy: "compresr"
    min_tokens: 512
    max_tokens: 128000
    target_compression_ratio: 0.5
    refusal_threshold: 0.05
```
`min_tokens` keeps trivially small outputs from being compressed, and `refusal_threshold` skips cases where compression would not buy anything. The original output is cached, and the agent can call `expand_context` to restore it.
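The cache-and-restore round trip can be sketched like this. Everything here is hypothetical stand-in code: the real service does semantic compression, not truncation, and the handle format is invented for illustration:

```python
import hashlib

# Hypothetical sketch of the tool-output cache behind expand_context.
# Real Gateway compresses semantically; truncation is just a stand-in.
_cache: dict[str, str] = {}

def compress_tool_output(output: str, max_chars: int = 200) -> str:
    """Cache the full output and return a shortened version
    carrying a handle the agent can expand later."""
    key = hashlib.sha256(output.encode()).hexdigest()[:12]
    _cache[key] = output
    if len(output) <= max_chars:
        return output  # below the size floor, pass through unchanged
    return f"{output[:max_chars]}... [truncated, expand_context id={key}]"

def expand_context(key: str) -> str:
    """Restore the original, uncompressed output from the cache."""
    return _cache[key]
```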
`skip_tools` lets you exclude specific tool categories per provider.
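A `skip_tools` entry might look like this; only the `skip_tools` key itself comes from the article, and the nesting and example values are illustrative:

```yaml
pipes:
  tool_output:
    skip_tools:
      # example values only; actual categories are provider-specific
      anthropic: ["TodoWrite"]
      openai: ["web_search"]
```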
3. Tool Discovery
Tool Discovery tackles the fact that the tool list itself consumes context. It filters out irrelevant tool definitions based on the current conversation.
`always_keep` guarantees that important tools are never filtered out.
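A naive version of that filter might look like the sketch below. The keyword matching is a stand-in, since the article does not describe Gateway's actual relevance scoring; `always_keep` is the only name taken from the text:

```python
def filter_tools(tools: list[dict], conversation: str,
                 always_keep: set[str]) -> list[dict]:
    """Keep tools that are protected by always_keep or appear
    relevant to the conversation (here: crude substring matching)."""
    text = conversation.lower()
    kept = []
    for tool in tools:
        name = tool["name"]
        if name in always_keep or name.lower() in text:
            kept.append(tool)
    return kept
```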
Phantom Tools
Version 0.5.2 added Phantom Tools, which inject virtual tools without changing the agent’s tool list. For example, expand_context is a Phantom Tool that Gateway injects automatically when the agent needs to inspect compressed output in more detail.
```mermaid
graph TD
    A[Agent receives tool list] --> B[Gateway injects<br/>expand_context]
    B --> C[Agent calls<br/>expand_context]
    C --> D[Gateway returns original data<br/>from cache]
```
That is the main advantage of the proxy approach: you can add features without changing the agent itself.
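The injection idea itself fits in a few lines. In this sketch the tool schema is a plausible Anthropic-style definition, not Gateway's actual one:

```python
import copy

# Hypothetical expand_context definition in an Anthropic-style schema.
EXPAND_CONTEXT_TOOL = {
    "name": "expand_context",
    "description": "Retrieve the original, uncompressed tool output.",
    "input_schema": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
        "required": ["id"],
    },
}

def inject_phantom_tools(request: dict) -> dict:
    """Return a copy of the outgoing request with expand_context
    appended to the tools array, leaving the agent's own list untouched."""
    patched = copy.deepcopy(request)
    tools = patched.setdefault("tools", [])
    if not any(t["name"] == "expand_context" for t in tools):
        tools.append(EXPAND_CONTEXT_TOOL)
    return patched
```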
Setup
Installation is a one-liner:
```shell
curl -fsSL https://compresr.ai/api/install | sh
```
A TUI wizard opens and lets you configure it interactively.
Supported agents:
| Setting | Agent |
|---|---|
| `claude_code` | Claude Code |
| `cursor` | Cursor IDE |
| `openclaw` | open-source Claude Code alternative |
| `custom` | any agent |
Configuration is stored as YAML under `~/.config/context-gateway/configs/`, and hot reload means you do not need to restart Gateway after changes.
Dashboard
Version 0.4.3 added a React-based dashboard for watching compression in real time.
- session stats, including token reduction and cost savings
- message classification by user / assistant / tool
- compression log visualization
It is available at http://localhost:18081/dashboard/.
Cost control
The cost_control section lets you set budget caps per session and globally.
```yaml
cost_control:
  enabled: true
  session_cap: 5.0
  global_cap: 100.0
```
Security
For SSRF protection, requests are forwarded only to allowlisted LLM provider domains. Localhost (`localhost` / `127.0.0.1`) is excluded by default, and you have to opt in explicitly with `GATEWAY_ALLOW_LOCAL=true`.
XSS protection, stream buffer limits (`MaxStreamBufferSize`), and request-body size limits are also in place.
Because it is a proxy, though, you should remember that API keys pass through Gateway. The project warns you to keep secrets in `.env` and reference them via environment variables such as `${ANTHROPIC_API_KEY}` instead of writing them directly into config files.
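In practice that means config files reference environment variables rather than embedding keys. The `${ANTHROPIC_API_KEY}` syntax comes from the project's own warning; the surrounding key names here are illustrative:

```yaml
# .env (kept out of version control):
#   ANTHROPIC_API_KEY=sk-ant-...
providers:                            # key names are a guess
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"   # expanded at load time; never hard-code
```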
Logging and telemetry
Each pipeline writes JSONL logs.
| Log file | Contents |
|---|---|
| `history_compaction.jsonl` | preemptive summarization history |
| `tool_output_compression.jsonl` | tool output compression events |
| `tool_discovery.jsonl` | tool filtering results |
| `telemetry.jsonl` | overall request telemetry |
| `session_stats.json` | session summary stats |
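JSONL is easy to post-process with a few lines of Python; for example, summing a per-event savings field across a log. The field name `tokens_saved` is hypothetical, so check the actual log schema before relying on it:

```python
import json

def total_field(jsonl_text: str, field: str = "tokens_saved") -> int:
    """Sum a numeric field across JSONL log lines,
    skipping blank lines and events that lack the field."""
    total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        total += event.get(field, 0)
    return total
```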
It also supports ATIF, the Agent Trajectory Interchange Format, which makes it useful for evaluations such as SWE-bench.
The idea of saving context in a proxy layer instead of in the agent itself is neat. It is still only v0.5, but it is the kind of tool that makes you want to try it with Claude Code.