
How Compresr's Context Gateway solves context exhaustion for AI agents

Ikesan

If you use AI coding agents such as Claude Code or Cursor for long enough, you eventually hit the context-window limit and the conversation resets. Large file reads and tool outputs burn through tokens especially fast.

Compresr’s OSS tool, Context Gateway, tries to solve that problem at the proxy layer. It is written in Go, licensed under Apache 2.0, and reached 451 stars and 35 forks in about six weeks after the February 2026 release.

Basic architecture

Context Gateway sits between the agent and the provider as a reverse proxy.

graph LR
    A[AI agent<br/>Claude Code, etc.] --> B[Context Gateway<br/>localhost:18081]
    B --> C[LLM API<br/>Anthropic, OpenAI, etc.]
    B --> D[Compresr API<br/>compression service]

From the agent’s point of view it looks like a normal API endpoint. Gateway intercepts requests and responses and performs context optimization. It supports a wide range of providers, including Anthropic, OpenAI, Google Gemini, AWS Bedrock, Ollama, OpenRouter, and LiteLLM.

Three compression pipelines

The core of Context Gateway is three “pipes.”

1. Preemptive Summarization

This pre-generates summaries of the conversation before the context usage crosses the threshold, which defaults to 85%.

graph TD
    A[Conversation continues] --> B{Monitor context usage}
    B -->|below threshold| A
    B -->|approaching threshold| C[Generate a summary<br/>in the background]
    C --> D[Cache the summary]
    D --> E{Context limit reached}
    E -->|yes| F[Compact immediately<br/>with cached summary]
    E -->|no| A

Normal compaction waits until the limit is reached, which introduces latency. The preemptive approach does the summarization in advance, so the switch is immediate.

Two summarization strategies are available:

Strategy             What it does
external_provider    ask a chosen LLM, such as claude-haiku-4-5
compresr             use Compresr’s own compression API with the hcc_espresso_v1 model

Session summaries are cached with a TTL of three hours by default.
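Putting the pieces above together, a configuration for this pipe might look like the following. The pipe name, threshold key, and TTL key are my guesses (the article only names the defaults and the strategies), so treat this as a sketch of the shape, not the actual schema:

```yaml
pipes:
  history_compaction:            # pipe name assumed from the log file name
    enabled: true
    strategy: "compresr"         # or "external_provider" with a model like claude-haiku-4-5
    threshold: 0.85              # default: start summarizing at 85% context usage
    summary_ttl: "3h"            # default summary cache TTL
```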

2. Tool Output Compression

Read and Bash tool outputs can easily run into thousands of tokens. Tool Output Compression shrinks those outputs before they are handed to the LLM.

Configuration example:

pipes:
  tool_output:
    enabled: true
    strategy: "compresr"
    min_tokens: 512
    max_tokens: 128000
    target_compression_ratio: 0.5
    refusal_threshold: 0.05

min_tokens skips outputs too small to be worth compressing, and refusal_threshold skips cases where compression would not buy enough. The original output is cached, and the agent can call expand_context to restore it.

skip_tools lets you exclude specific tool categories per provider.
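A hypothetical skip_tools block, extending the configuration example above, might look like this. The nesting and the tool names are illustrative assumptions; only the skip_tools key itself comes from the article:

```yaml
pipes:
  tool_output:
    skip_tools:
      anthropic:                 # per-provider exclusions (structure assumed)
        - "web_search"           # example: already-compact results
        - "str_replace_editor"   # example: outputs the agent must see verbatim
```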

3. Tool Discovery

Tool Discovery tackles the fact that the tool list itself consumes context. It filters out irrelevant tool definitions based on the current conversation.

always_keep guarantees that important tools are never filtered out.
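As a sketch, under the same assumed config layout, an always_keep list could look like this (the tool names are just examples from the Claude Code toolset):

```yaml
pipes:
  tool_discovery:
    enabled: true
    always_keep:                 # never filter these out, whatever the conversation
      - "Read"
      - "Edit"
      - "Bash"
```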

Phantom Tools

Version 0.5.2 added Phantom Tools, which inject virtual tools without changing the agent’s tool list. For example, expand_context is a Phantom Tool that Gateway injects automatically when the agent needs to inspect compressed output in more detail.

graph TD
    A[Agent receives tool list] --> B[Gateway injects<br/>expand_context]
    B --> C[Agent calls<br/>expand_context]
    C --> D[Gateway returns original data<br/>from cache]

That is the main advantage of the proxy approach: you can add features without changing the agent itself.

Setup

Installation is a one-liner:

curl -fsSL https://compresr.ai/api/install | sh

A TUI wizard opens and lets you configure it interactively.

Supported agents:

Setting        Agent
claude_code    Claude Code
cursor         Cursor IDE
openclaw       open-source Claude Code alternative
custom         any agent

Configuration is stored as YAML under ~/.config/context-gateway/configs/, and hot reload means you do not need to restart Gateway after changes.

Dashboard

Version 0.4.3 added a React-based dashboard for watching compression in real time.

  • session stats, including token reduction and cost savings
  • message classification by user / assistant / tool
  • compression log visualization

It is available at http://localhost:18081/dashboard/.

Cost control

The cost_control section lets you set budget caps per session and globally.

cost_control:
  enabled: true
  session_cap: 5.0
  global_cap: 100.0

Security

For SSRF protection, requests are forwarded only to allowlisted LLM provider domains. Localhost destinations (localhost / 127.0.0.1) are blocked by default; you have to opt in explicitly with GATEWAY_ALLOW_LOCAL=true.

XSS protection, stream buffer limits (MaxStreamBufferSize), and request-body size limits are also in place.

Because it is a proxy, though, you should remember that API keys pass through Gateway. The project warns you to keep secrets in .env and reference them via environment variables such as ${ANTHROPIC_API_KEY} instead of writing them directly into config files.
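Concretely, that pattern looks something like the following. The ${ANTHROPIC_API_KEY} syntax is from the project's own advice; the surrounding key path is an assumption for illustration:

```yaml
# .env (kept out of version control)
# ANTHROPIC_API_KEY=sk-ant-...

# config YAML: reference the secret instead of embedding it
providers:                        # key path assumed
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"   # expanded from the environment at load time
```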

Logging and telemetry

Each pipeline writes JSONL logs.

Log file                         Contents
history_compaction.jsonl         preemptive summarization history
tool_output_compression.jsonl    tool output compression events
tool_discovery.jsonl             tool filtering results
telemetry.jsonl                  overall request telemetry
session_stats.json               session summary stats

It also supports ATIF, the Agent Trajectory Interchange Format, which makes it useful for evaluations such as SWE-bench.


The idea of saving context in a proxy layer instead of in the agent itself is neat. It is still only v0.5, but it is the kind of tool that makes you want to try it with Claude Code.