Cloudflare Agent Memory and the Agent Readiness Score
At the 2026-04-17 Agents Week, Cloudflare shipped Agent Memory and isitagentready.com almost simultaneously. The former is a managed service that runs agent memory on the platform side; the latter is a Lighthouse-style tool that scores how agent-friendly your website is. One is “what’s inside the agent” (long-term memory), the other is “what the agent reads on the outside” (the site’s readiness). In the same week, Cloudflare filled in both ends at once.
For the other Agents Week releases, see Sandboxes GA, Durable Object Facets, and the Unified CLI, Mesh and the Enterprise MCP Reference Architecture, and Project Think, the Browser Run revamp, and Workflows v2.
Agent Memory
Agent Memory launched in private beta as a managed service that extracts important information from agent conversation history and recalls it later when needed.
Existing agents either forget the past when conversations exceed the context window, or they stuff the entire history into every prompt and suffer from context rot. Chroma Context-1 and Compresr Context Gateway tried to solve this on the agent side and the proxy side respectively. Cloudflare chose a different path: provide it as a platform-managed service.
Four operations and four memory types
The API surface of Agent Memory is deliberately small.
| Operation | Role |
|---|---|
| ingest | Bulk-loads conversation history, extracts memories, and stores them |
| remember | Lets the model explicitly write “keep this” during execution |
| recall | Takes a question and returns a synthesized answer |
| forget / list | Deletes memories and lists them |
Stored memories fall into four types.
| Type | Character | Example |
|---|---|---|
| Facts | Near-immutable facts | This project uses GraphQL |
| Events | Timestamped occurrences | Deployed to production on 2026-04-15 |
| Instructions | Procedures and conventions | Always run lint before deploying |
| Tasks | Ephemeral in-flight items | Waiting on review for PR #123 |
Facts and Instructions carry normalized topic keys. When a new memory comes in with the same key, it supersedes the old one — not a duplicate but a version chain. Events are distinguished by timestamp and accumulate over time. Tasks are short-lived and discarded once complete.
This classification matters because retrieval strategy changes by type. “Does this project use GraphQL?” is answerable by a Facts key lookup. “What did we deploy last month?” is a time-ranged Events query. Running everything through the same vector search would be strictly worse.
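The supersession behavior for keyed memories can be sketched in a few lines. A minimal in-memory version (the real service persists version chains in a Durable Object; the class and method names here are illustrative):

```typescript
// Each Facts/Instructions memory carries a normalized topic key.
// A new memory with the same key supersedes the old one, forming a
// version chain rather than a duplicate row.
interface MemoryVersion {
  content: string;
  storedAt: number; // ms epoch
}

class FactStore {
  private chains = new Map<string, MemoryVersion[]>();

  // Append a new version; earlier entries stay in the chain as history.
  supersede(key: string, content: string, storedAt: number): void {
    const chain = this.chains.get(key) ?? [];
    chain.push({ content, storedAt });
    this.chains.set(key, chain);
  }

  // Answering "does this project use GraphQL?" is a key lookup,
  // returning only the newest version -- no vector search needed.
  current(key: string): string | undefined {
    const chain = this.chains.get(key);
    return chain?.[chain.length - 1]?.content;
  }

  history(key: string): string[] {
    return (this.chains.get(key) ?? []).map((v) => v.content);
  }
}
```

Storing “this project uses REST” and later “this project uses GraphQL” under the same key leaves GraphQL as the current answer while keeping the older fact reachable as history.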
Ingestion pipeline
Ingestion runs through multiple stages.
```mermaid
flowchart TD
    A[Conversation message] --> B[SHA-256 hash ID<br/>truncated to 128 bits]
    B --> C{Already exists?}
    C -->|Yes| Z[Skip<br/>INSERT OR IGNORE]
    C -->|No| D[Extraction paths]
    D --> E[Full path<br/>10K char chunks<br/>2 msg overlap]
    D --> F[Detail path<br/>fires at 9+ messages<br/>names, prices, versions]
    E --> G[Validation 8 checks]
    F --> G
    G --> H[Classify<br/>Facts/Events/Instructions/Tasks]
    H --> I[Durable Object write]
    I --> J[Async vectorization<br/>adds 3-5 query transforms]
    J --> K[Stored in Vectorize]
```
A few things stand out.
First, deterministic IDs. Content addressing with SHA-256 means re-ingesting the same utterance never double-writes. If an agent restarts and replays the whole history, ingestion stays idempotent.
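A minimal sketch of that idempotence, using Node’s crypto module (a seen-ID set stands in for SQLite’s `INSERT OR IGNORE`; the exact hash input is an assumption):

```typescript
import { createHash } from "node:crypto";

// Content-addressed message ID: SHA-256 of the message, truncated to
// 128 bits (32 hex chars). Identical content always yields the same ID.
function messageId(role: string, content: string): string {
  return createHash("sha256")
    .update(`${role}\n${content}`)
    .digest("hex")
    .slice(0, 32); // 32 hex chars = 128 bits
}

// Idempotent ingest: returns false if this exact message was seen before.
function ingestOnce(seen: Set<string>, role: string, content: string): boolean {
  const id = messageId(role, content);
  if (seen.has(id)) return false; // replayed message, skip
  seen.add(id);
  return true;
}
```

Replaying an entire conversation history is then a no-op: every message hashes to an ID that already exists.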
Second, the two-stage extraction. The Full path chunks messages every 10,000 characters and extracts summary-style memories. Up to four chunks run in parallel. The Detail path only fires on conversations with 9+ messages, aiming specifically at numbers, names, and versions. Summaries alone tend to lose concrete details like “ran on version 2.3.1,” which is why there’s a dedicated pass for them.
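A sketch of the Full-path chunker under one reading of that description: greedily pack messages up to a character budget, then carry the last two messages into the next chunk so memories spanning a boundary aren’t lost. The exact packing rule is an assumption.

```typescript
interface Msg {
  role: string;
  content: string;
}

// Greedy chunker: fill a chunk until adding the next message would
// exceed maxChars, then start a new chunk seeded with the last
// `overlap` messages of the previous one.
function chunkMessages(msgs: Msg[], maxChars = 10_000, overlap = 2): Msg[][] {
  const chunks: Msg[][] = [];
  let current: Msg[] = [];
  let size = 0;
  for (const m of msgs) {
    if (current.length > 0 && size + m.content.length > maxChars) {
      chunks.push(current);
      current = current.slice(-overlap); // 2-message overlap
      size = current.reduce((n, x) => n + x.content.length, 0);
    }
    current.push(m);
    size += m.content.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

With 10,000-character chunks, up to four of these would be extracted in parallel per the announcement.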
Third, the vectorization trick. Facts and Instructions are stored as declarative sentences, but user queries are interrogative. “Prefers dark mode” and “which theme is good?” are not necessarily close in vector space. Agent Memory adds 3–5 query transforms at storage time (rewriting the statement as questions like “which theme?”) and embeds those, bridging the declarative-vs-interrogative gap.
Retrieval: five parallel channels plus RRF
Internally, recall runs five search channels in parallel.
| Channel | What it’s good at |
|---|---|
| Full-text search | Porter stemming for keyword precision |
| Fact-key lookup | Direct match on normalized keys |
| Raw message search | Last line of defense — never misses verbatim fragments |
| Direct vector search | Query embedding finds semantic neighbors |
| HyDE vector search | Generates a hypothetical answer document and embeds that |
HyDE (Hypothetical Document Embedding) is a classic technique that works well on abstract or multi-hop questions. For a vague query like “what are the user’s preferences?”, the LLM first drafts a hypothetical answer, and that answer is embedded for search. It exploits the fact that answers tend to be more similar to target documents than questions are.
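The query transforms and HyDE both attack the same declarative-vs-interrogative gap. A toy demonstration with bag-of-words vectors, reusing the article’s dark-mode example (real systems use neural embeddings, and an LLM would draft the hypothetical answer that is hand-written here):

```typescript
// Toy bag-of-words embedding; stands in for a neural embedding model.
function embed(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const w of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    v.set(w, (v.get(w) ?? 0) + 1);
  }
  return v;
}

// Cosine similarity over sparse word-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [w, x] of a) {
    dot += x * (b.get(w) ?? 0);
    na += x * x;
  }
  for (const y of b.values()) nb += y * y;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

const memory = "prefers dark mode";          // stored declarative fact
const query = "which theme is good";         // interrogative user query
const hypotheticalAnswer = "the user prefers the dark mode theme"; // HyDE draft

const direct = cosine(embed(query), embed(memory)); // no shared words
const hyde = cosine(embed(hypotheticalAnswer), embed(memory)); // overlaps heavily
```

Here `direct` is zero while `hyde` is not: the drafted answer shares vocabulary with the stored fact even though the question shares none.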
The five results are merged via Reciprocal Rank Fusion (RRF). It’s an old technique: combine the rankings from each channel with weights. Fact-key matches carry the heaviest weight; ties break toward newer entries. In the final synthesis step, a natural-language answer is composed, but time arithmetic is handled by regex and arithmetic — not by the LLM. Relative times like “the thing two weeks ago” drift if you ask the LLM, so a separate path computes them exactly.
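RRF itself fits in a few lines. A sketch with illustrative channel weights (the real weights are internal; `k = 60` is the constant from the original RRF paper):

```typescript
interface RankedList {
  weight: number; // e.g. fact-key matches carry the heaviest weight
  ids: string[];  // memory IDs, best first
}

// RRF: each channel contributes weight / (k + rank) per item; scores sum
// across channels, so items surfaced by several channels rise to the top.
// Ties break toward the more recent memory.
function rrfMerge(
  channels: RankedList[],
  recency: Map<string, number>, // newer = larger
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const ch of channels) {
    ch.ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + ch.weight / (k + rank + 1));
    });
  }
  return [...scores.keys()].sort((a, b) => {
    const d = scores.get(b)! - scores.get(a)!;
    return d !== 0 ? d : (recency.get(b) ?? 0) - (recency.get(a) ?? 0);
  });
}
```

A memory ranked first by a heavily weighted fact-key channel beats one that only a single vector channel surfaced, even if the vector rank is identical.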
Model selection
Different models are used for different roles.
| Task | Model | Notes |
|---|---|---|
| Extraction, validation, classification, query parsing | Llama 4 Scout | 17B MoE · 16 experts |
| Synthesis, answer generation | Nemotron 3 | 120B MoE · 12B active |
Structured tasks are handled by a lighter model; only the reasoning-heavy synthesis uses the large model. It’s a straightforward balance between inference cost and quality. Both run on Workers AI, which avoids external API latency and rate limits.
Built entirely from Cloudflare primitives
Agent Memory is assembled from existing Cloudflare primitives.
| Component | Role |
|---|---|
| Durable Objects | SQLite-backed storage of raw messages and classified memories |
| Vectorize | Embeddings and semantic search |
| Workers AI | LLM inference for extraction, classification, and synthesis |
Each memory profile gets its own Durable Object instance and vector index. Tenant isolation is structurally guaranteed, and writes are transactional.
Usage from a Worker is simple via binding.
```typescript
const profile = await env.MEMORY.getProfile("profile-name");
await profile.ingest(messages, { sessionId });
await profile.remember({ content, sessionId });
const results = await profile.recall("query");
```
Agents outside Cloudflare can call it over a REST API. A session-affinity header routes requests to the same backend, making it easy to benefit from prompt caching.
Use cases and internal dogfooding
Cloudflare calls out four use cases.
- A single agent that keeps state across sessions
- Team-shared knowledge (architecture decisions, design patterns, tribal knowledge)
- Tool coordination (a code-review bot passing context to a coding agent)
- Persistent state for agents running autonomously in the background
Internally, they’ve integrated it into OpenCode plugin development workflows, agentic code review (remembering past judgments so it doesn’t repeat the same feedback), and chatbots with persistent conversation history.
Development itself is agent-driven: run benchmarks → gap analysis → propose fixes → human verification → agent-implemented changes, and loop. Evaluation uses multiple memory benchmarks in combination (LoCoMo, LongMemEval, BEAM) to avoid overfitting to a single metric.
Data ownership
All memories are exportable. Cloudflare makes a point of rejecting vendor lock-in. Letting a platform own your agent’s memory is convenient, but convenience sliding into captivity is a real worry. Having an export API from day one is the answer to that concern.
Positioning as an agent memory layer
Agent memory has been attacked from several directions: inference-time context management like Chroma Context-1, proxy layers like Compresr Context Gateway, and window expansion like Claude’s 1M context. What’s distinct about Agent Memory is that it carves out a “shared store that lives outside the agent.” Reconciling declarative knowledge through Fact-key supersession and securing retrieval quality with HyDE+RRF is essentially RAG wisdom repackaged for agents.
It’s in private beta with a waitlist. If you’re running agents on the Cloudflare ecosystem, it’s worth trying before writing your own memory layer from scratch.
The Agent Readiness score and isitagentready.com
Also launched during Agents Week was isitagentready.com. Enter your site’s URL and it scores how agent-usable the site is. What Lighthouse did for web performance and accessibility, this does for agent-era web standards. At the same time, Cloudflare Radar scanned the top 200,000 domains worldwide and published adoption rates for AI agent standards. The short version: the web is almost entirely unprepared for agents.
The agent standards being measured
isitagentready.com scores four independent categories.
| Category | Checks |
|---|---|
| Discoverability | robots.txt, sitemap.xml, Link Headers (RFC 8288) |
| Content | Markdown content negotiation |
| Bot Access Control | Content Signals, robots.txt AI bot rules, Web Bot Auth |
| Capabilities | Agent Skills, API Catalog (RFC 9727), OAuth discovery (RFC 8414/9728), MCP Server Card, WebMCP |
Separately, standards for agent-initiated payments — x402, Universal Commerce Protocol, Agentic Commerce Protocol — are checked but don’t count toward the score yet.
A quick primer for readers unfamiliar with these.
- Markdown content negotiation. When an agent sends `Accept: text/markdown`, the server returns a Markdown version instead of HTML. Cloudflare measured up to 80% token reduction compared to HTML.
- Content Signals. Writing something like `Content-Signal: ai-train=no, search=yes, ai-input=yes` in robots.txt lets you declare “no training, yes search indexing, yes inference-time grounding” separately — finer-grained than plain allow/deny.
- Web Bot Auth. An IETF draft where the bot signs HTTP requests and receivers verify via public key. Public keys live at `/.well-known/http-message-signatures-directory`.
- MCP Server Card. Publish a JSON at `/.well-known/mcp/server-card.json` describing which tools the server exposes, where to connect, and how to authenticate, and agents can discover capabilities before connecting.
- API Catalog (RFC 9727). `/.well-known/api-catalog` hands agents a catalog of public APIs, so they don’t have to scrape a developer portal.
- OAuth discovery (RFC 9728). For sites that require login, this tells agents where the authorization server is. Users can grant access explicitly via OAuth instead of the less-safe workaround of sharing browser login sessions with agents. Agents Week 2026 also announced full Cloudflare Access support for this flow.
Global agent readiness is “nearly zero”
Cloudflare Radar published scan results from 200,000 domains in a new “Adoption of AI agent standards” chart. The headline numbers:
| Item | Adoption |
|---|---|
| robots.txt exists | 78% |
| robots.txt declares Content Signals | 4% |
| Correctly serves Markdown on `Accept: text/markdown` | 3.9% |
| Both MCP Server Card and API Catalog (RFC 9727) | Fewer than 15 of 200,000 domains |
robots.txt is nearly universal, but most of it is aimed at traditional search crawlers. Files written with AI agents in mind are still a minority. MCP Server Card and API Catalog are even more rare: fewer than 15 out of 200,000 popular sites worldwide. The flip side: there’s huge room to stand out by adopting these early.
The chart updates weekly and is available by category on the Radar “AI Insights” page. Data Explorer and the Radar API expose it too.
isitagentready.com itself is a reference implementation
Worth noting: the scoring site itself is a reference implementation of agent-readiness.
- `https://isitagentready.com/.well-known/mcp.json` exposes a stateless MCP server over Streamable HTTP. Calling the `scan_site` tool runs a scan through MCP without the web UI.
- `https://isitagentready.com/.well-known/agent-skills/index.json` publishes an Agent Skills index. For each checked standard, it provides implementation guides describing “what passes.”
Each failing check also comes with a ready-made prompt you can paste into your coding agent to fix the issue. You don’t just see the score — you can hand the fix to your own agent right there.
Integration with Cloudflare URL Scanner
An Agent Readiness tab has been added to the existing URL Scanner.
The same check suite now runs alongside HTTP header, TLS, DNS, tech-stack, and security-signal analysis.
From the API, pass agentReadiness: true in the options.
```shell
curl -X POST https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/urlscanner/v2/scan \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{
    "url": "https://www.example.com",
    "options": {"agentReadiness": true}
  }'
```
That makes it easy to wire Lighthouse-style scoring into CI or monitoring for organizations that want to keep tabs on agent-readiness continuously.
How Cloudflare rebuilt its Docs to be the most agent-friendly on the web
Personally, this was the most valuable part of the announcement. Cloudflare thoroughly rebuilt its Developer Docs for agents and went as far as publishing benchmark numbers comparing the result against other doc sites.
URL fallback: /index.md
Of 7 agents tested as of 2026-02, only Claude Code, OpenCode, and Cursor sent Accept: text/markdown by default.
For the rest, Cloudflare built a URL-based fallback.
Specifically, appending /index.md to any page URL returns the Markdown version.
https://developers.cloudflare.com/r2/get-started/index.md
The clever part: it’s not two static files kept in sync — it’s implemented dynamically with two Cloudflare rules.
- A URL Rewrite Rule matches requests ending in `/index.md` and uses `regex_replace` to strip the `/index.md`, rewriting to the base path.
- A Request Header Transform Rule matches the pre-rewrite path (`raw.http.request.uri.path`) and injects `Accept: text/markdown` automatically.
So any request to /index.md gets Markdown back regardless of what headers the client sends.
No build steps, no double-management of content.
```mermaid
flowchart LR
    A[Agent GETs<br/>/r2/get-started/index.md] --> B[Header Transform Rule<br/>raw.uri.path ends with /index.md<br/>→ inject Accept: text/markdown]
    B --> C[URL Rewrite Rule<br/>strip /index.md via regex_replace]
    C --> D[Original page returns<br/>as Markdown]
```
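If you wanted the same trick in a Worker rather than in rules, the core of it is one pure transformation (the regex mirrors the `regex_replace` described above; this is a sketch, not Cloudflare’s implementation):

```typescript
// Emulate the two rules in one step: when the path ends in /index.md,
// strip the suffix back to the base path and inject the Markdown Accept
// header, regardless of what the client originally sent.
function agentFallback(url: string, headers: Record<string, string>) {
  const u = new URL(url);
  if (u.pathname.endsWith("/index.md")) {
    u.pathname = u.pathname.replace(/\/index\.md$/, "/");
    headers = { ...headers, accept: "text/markdown" };
  }
  return { url: u.toString(), headers };
}
```

Requests without the suffix pass through untouched, so normal browser traffic is unaffected.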
How to fold llms.txt at scale
llms.txt is a spec proposed in September 2024: a plaintext file at the site root that tells LLMs “what this site is and where the content lives.” Think of it as an LLM-facing sitemap.
But shoving 5,000+ pages into a single llms.txt blows out any model’s context window. Cloudflare split llms.txt per top-level directory, and the root llms.txt references those.
https://developers.cloudflare.com/llms.txt ← parent; index of each product
https://developers.cloudflare.com/r2/llms.txt
https://developers.cloudflare.com/workers/llms.txt
They also aggressively trimmed out directory-listing pages LLMs get little value from.
About 450 pages that are mere tables of contents, such as https://developers.cloudflare.com/workers/databases/, were excluded from llms.txt.
Child pages are already listed individually, so leaving the index pages in just forces agents to make one extra round trip to reach real content.
The entries themselves were also upgraded. Each link has a semantic name, a precise URL, and a high-signal description — so the LLM can decide in one shot which page to fetch. The Product Content Experience (PCX) team rewrote page titles, descriptions, and URL structures with agents in mind.
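Per the llms.txt spec, each file is an H1, a blockquote summary, then H2 sections of `- [name](url): description` links. A sketch of a per-product generator that also drops the index-only pages (the `Page` shape and `isIndexOnly` flag are assumptions for illustration):

```typescript
interface Page {
  title: string;
  url: string;
  description: string;
  isIndexOnly: boolean; // a mere table of contents for its children
}

// Emit one llms.txt body for a product directory. Index-only pages are
// skipped: their children are listed directly, so keeping them would
// just cost agents an extra round trip.
function renderLlmsTxt(product: string, summary: string, pages: Page[]): string {
  const entries = pages
    .filter((p) => !p.isIndexOnly)
    .map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return [`# ${product}`, ``, `> ${summary}`, ``, `## Docs`, ``, ...entries, ``].join("\n");
}
```

The root llms.txt then only needs one link per product directory, keeping every file small enough to fit in a single context window.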
Hidden directives, and how old docs are handled
Every HTML page embeds a “hidden note to LLMs.”
When an agent fetches the HTML version, the note tells it: “HTML wastes context. Either append /index.md or retry with Accept: text/markdown. All docs are also available as a single file at https://developers.cloudflare.com/llms-full.txt.”
Importantly, this directive is stripped from the Markdown version.
If the same note lived inside the Markdown, agents would recursively chase “the Markdown version mentioned inside the Markdown.”
The other thing I liked was the combination with Redirects for AI Training (also shipped 2026-04-17). Docs for legacy versions like Wrangler v1 should stay around for humans as archive, but if LLM crawlers feed on them directly, they’ll regenerate stale advice as “current.” Cloudflare redirects only traffic identified as AI training crawlers to the current docs. Humans see the archive as-is; LLMs only learn from the latest version. An asymmetric setup.
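That asymmetric routing can be sketched as a small lookup. The crawler names and paths here are hypothetical; Cloudflare identifies training crawlers through verified bot signals, not raw User-Agent strings, but UA matching shows the shape:

```typescript
// Illustrative map of legacy doc paths to their current equivalents.
const legacyRedirects = new Map<string, string>([
  ["/workers/wrangler-legacy/", "/workers/wrangler/"],
]);

// Hypothetical training-crawler markers; real detection should rely on
// verified bot identity rather than User-Agent substrings.
const trainingCrawlers = ["GPTBot", "CCBot", "ClaudeBot"];

// Humans (and non-training agents) keep the archive; training crawlers
// are redirected so stale docs never enter a training set.
function routeDocs(path: string, userAgent: string): string {
  const isTrainer = trainingCrawlers.some((m) => userAgent.includes(m));
  if (isTrainer && legacyRedirects.has(path)) {
    return legacyRedirects.get(path)!;
  }
  return path;
}
```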
Benchmark results
Using OpenCode + Kimi-k2.5, Cloudflare compared its docs against other major technical doc sites.
| Metric | Cloudflare Docs | Average other site |
|---|---|---|
| Tokens used to answer the same question | Baseline | +31% more |
| Time to correct answer | Baseline | +66% slower |
The “one product directory fits in a single context” design pays off — agents identify and fetch the right page on the first try. Cloudflare calls the underlying problem “grep loops.” If llms.txt is too large to fit in context, agents can’t read the whole file and start keyword-grepping instead. When the first grep misses, the agent burns thinking tokens, revises the query, and greps again. Slower, more expensive, less accurate.
The takeaway: document structure itself determines agent behavior and cost. Putting llms.txt on your site isn’t enough — you have to design granularity on top of it. A useful warning.
Practical impact
Making your site agent-ready is still optional as of 2026, but judging by Radar’s numbers it could transition to “if you don’t do this, you won’t get discovered” within months. For SaaS and API providers in particular, setting up MCP Server Card, API Catalog, and OAuth discovery early raises the odds of being picked by agents.
For personal or small sites, rather than standing up an MCP server, the most cost-effective first moves are supporting Accept: text/markdown and declaring Content Signals.
Markdown content negotiation is nearly free to implement and dramatically lowers token usage, which makes your site more attractive to agents.
If you’re already running coding agents, you can adopt the workflow today: take the prompt that isitagentready.com returns for a failed check and hand it straight to Claude Code or Codex. Unlike Lighthouse, the actual implementation work of raising the score can be outsourced to the agent itself — which is a fun twist.