
Cloudflare Agent Memory and the Agent Readiness Score

Ikesan

At the 2026-04-17 Agents Week, Cloudflare shipped Agent Memory and isitagentready.com almost simultaneously. The former is a managed service that runs agent memory on the platform side; the latter is a Lighthouse-style tool that scores how agent-friendly your website is. One is “what’s inside the agent” (long-term memory), the other is “what the agent reads on the outside” (the site’s readiness). In the same week, Cloudflare filled in both ends at once.

For the other Agents Week releases, see Sandboxes GA, Durable Object Facets, and the Unified CLI, Mesh and the Enterprise MCP Reference Architecture, and Project Think, the Browser Run revamp, and Workflows v2.

Agent Memory

Agent Memory launched in private beta as a managed service that extracts important information from agent conversation history and recalls it later when needed.

Existing agents either forget the past when conversations exceed the context window, or they stuff the entire history into every prompt and suffer from context rot. Chroma Context-1 and Compresr Context Gateway tried to solve this on the agent side and the proxy side respectively. Cloudflare chose a different path: provide it as a platform-managed service.

Four operations and four memory types

The API surface of Agent Memory is deliberately small.

| Operation | Role |
| --- | --- |
| ingest | Bulk-loads conversation history, extracts memories, and stores them |
| remember | Lets the model explicitly write "keep this" during execution |
| recall | Takes a question and returns a synthesized answer |
| forget / list | Deletes memories and lists them |

Stored memories fall into four types.

| Type | Character | Example |
| --- | --- | --- |
| Facts | Near-immutable facts | This project uses GraphQL |
| Events | Timestamped occurrences | Deployed to production on 2026-04-15 |
| Instructions | Procedures and conventions | Always run lint before deploying |
| Tasks | Ephemeral in-flight items | Waiting on review for PR #123 |

Facts and Instructions carry normalized topic keys. When a new memory comes in with the same key, it supersedes the old one — not a duplicate but a version chain. Events are distinguished by timestamp and accumulate over time. Tasks are short-lived and discarded once complete.

This classification matters because retrieval strategy changes by type. “Does this project use GraphQL?” is answerable by a Facts key lookup. “What did we deploy last month?” is a time-ranged Events query. Running everything through the same vector search would be strictly worse.
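To make the type-dependent retrieval concrete, here is a minimal sketch of a store that answers Facts questions by key lookup (newest version wins, matching the supersession chain) and Events questions by time range. All names are illustrative; this is not the Agent Memory API.

```typescript
// Hypothetical in-memory sketch of type-aware retrieval dispatch.
type MemoryType = "fact" | "event" | "instruction" | "task";

interface Memory {
  type: MemoryType;
  key?: string;        // normalized topic key (Facts/Instructions)
  timestamp?: number;  // epoch ms (Events)
  content: string;
}

class MemoryStore {
  private memories: Memory[] = [];

  add(m: Memory) { this.memories.push(m); }

  // Facts: direct lookup on the normalized key; the latest write supersedes.
  lookupFact(key: string): Memory | undefined {
    return [...this.memories]
      .reverse()
      .find(m => m.type === "fact" && m.key === key);
  }

  // Events: a time-ranged query, not a vector search.
  eventsBetween(from: number, to: number): Memory[] {
    return this.memories.filter(
      m => m.type === "event" && m.timestamp! >= from && m.timestamp! <= to,
    );
  }
}
```

The point of the sketch: "Does this project use GraphQL?" never touches an embedding index, and "what did we deploy last month?" is a filter on timestamps.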

Ingestion pipeline

Ingestion runs through multiple stages.

flowchart TD
    A[Conversation message] --> B[SHA-256 hash ID<br/>truncated to 128 bits]
    B --> C{Already exists?}
    C -->|Yes| Z[Skip<br/>INSERT OR IGNORE]
    C -->|No| D[Extraction paths]
    D --> E[Full path<br/>10K char chunks<br/>2 msg overlap]
    D --> F[Detail path<br/>fires at 9+ messages<br/>names, prices, versions]
    E --> G[Validation 8 checks]
    F --> G
    G --> H[Classify<br/>Facts/Events/Instructions/Tasks]
    H --> I[Durable Object write]
    I --> J[Async vectorization<br/>adds 3-5 query transforms]
    J --> K[Stored in Vectorize]

A few things stand out.

First, deterministic IDs. Content addressing with SHA-256 means re-ingesting the same utterance never double-writes. If an agent restarts and replays the whole history, ingestion stays idempotent.
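A sketch of that idempotence, assuming only what the diagram states (SHA-256, truncated to 128 bits, skip on collision). The role/content delimiter and the `Set` standing in for `INSERT OR IGNORE` are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Content-addressed message ID: SHA-256 truncated to 128 bits (32 hex chars).
function messageId(role: string, content: string): string {
  return createHash("sha256")
    .update(`${role}\x00${content}`) // delimiter choice is an assumption
    .digest("hex")
    .slice(0, 32); // keep 128 of the 256 bits
}

// Idempotent ingestion: a seen-set plays the part of INSERT OR IGNORE.
function ingestOnce(seen: Set<string>, role: string, content: string): boolean {
  const id = messageId(role, content);
  if (seen.has(id)) return false; // replayed message: skip, no double-write
  seen.add(id);
  return true;
}
```

Replaying the same history produces the same IDs, so every write after the first is a no-op.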

Second, the two-stage extraction. The Full path chunks messages every 10,000 characters and extracts summary-style memories. Up to four chunks run in parallel. The Detail path only fires on conversations with 9+ messages, aiming specifically at numbers, names, and versions. Summaries alone tend to lose concrete details like “ran on version 2.3.1,” which is why there’s a dedicated pass for them.

Third, the vectorization trick. Facts and Instructions are stored as declarative sentences, but user queries are interrogative. “Prefers dark mode” and “which theme is good?” are not necessarily close in vector space. Agent Memory adds 3–5 query transforms at storage time (rewriting the statement as questions like “which theme?”) and embeds those, bridging the declarative-vs-interrogative gap.

Retrieval: five parallel channels plus RRF

Internally, recall runs five search channels in parallel.

| Channel | What it's good at |
| --- | --- |
| Full-text search | Porter stemming for keyword precision |
| Fact-key lookup | Direct match on normalized keys |
| Raw message search | Last line of defense — never misses verbatim fragments |
| Direct vector search | Query embedding finds semantic neighbors |
| HyDE vector search | Generates a hypothetical answer document and embeds that |

HyDE (Hypothetical Document Embedding) is a classic technique that works well on abstract or multi-hop questions. For a vague query like “what are the user’s preferences?”, the LLM first drafts a hypothetical answer, and that answer is embedded for search. It exploits the fact that answers tend to be more similar to target documents than questions are.

The five results are merged via Reciprocal Rank Fusion (RRF), an old IR technique: combine the rankings from each channel with weights. Fact-key matches carry the heaviest weight, and ties break toward newer entries. The final synthesis step composes a natural-language answer, but time arithmetic is handled by regex parsing and exact computation rather than by the LLM. Relative expressions like "the thing two weeks ago" drift if you ask the LLM, so a separate path computes them exactly.
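RRF itself is compact enough to show. This is the textbook formulation (score 1/(k + rank), here with a per-channel weight); the specific weights and the damping constant k are illustrative assumptions, since the post only says fact-key matches weigh heaviest.

```typescript
// Weighted Reciprocal Rank Fusion over per-channel rankings.
function rrfMerge(
  channels: { weight: number; ranking: string[] }[],
  k = 60, // conventional RRF damping constant
): string[] {
  const scores = new Map<string, number>();
  for (const { weight, ranking } of channels) {
    ranking.forEach((id, rank) => {
      // Items ranked high in any channel get a large 1/(k + rank) boost.
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A result that appears near the top of several channels beats one that tops a single channel, which is exactly why RRF suits heterogeneous retrievers like these five.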

Model selection

Different models are used for different roles.

| Task | Model | Notes |
| --- | --- | --- |
| Extraction, validation, classification, query parsing | Llama 4 Scout | 17B MoE · 16 experts |
| Synthesis, answer generation | Nemotron 3 | 120B MoE · 12B active |

Structured tasks are handled by a lighter model; only the reasoning-heavy synthesis uses the large model. It’s a straightforward balance between inference cost and quality. Both run on Workers AI, which avoids external API latency and rate limits.

Built entirely from Cloudflare primitives

Agent Memory is assembled from existing Cloudflare primitives.

| Component | Role |
| --- | --- |
| Durable Objects | SQLite-backed storage of raw messages and classified memories |
| Vectorize | Embeddings and semantic search |
| Workers AI | LLM inference for extraction, classification, and synthesis |

Each memory profile gets its own Durable Object instance and vector index. Tenant isolation is structurally guaranteed, and writes are transactional.

Usage from a Worker is simple via binding.

const profile = await env.MEMORY.getProfile("profile-name");

await profile.ingest(messages, { sessionId });
await profile.remember({ content, sessionId });

const results = await profile.recall("query");

Agents outside Cloudflare can call it over a REST API. A session-affinity header routes requests to the same backend, making it easy to benefit from prompt caching.

Use cases and internal dogfooding

Cloudflare calls out four use cases.

  • A single agent that keeps state across sessions
  • Team-shared knowledge (architecture decisions, design patterns, tribal knowledge)
  • Tool coordination (a code-review bot passing context to a coding agent)
  • Persistent state for agents running autonomously in the background

Internally, it's already deployed in an OpenCode plugin for development-workflow integration, in agentic code review (remembering past judgments so it doesn't repeat the same feedback), and in chatbots with persistent conversation history.

Development itself is agent-driven: run benchmarks → gap analysis → propose fixes → human verification → agent-implemented changes, and loop. Evaluation uses multiple memory benchmarks in combination (LoCoMo, LongMemEval, BEAM) to avoid overfitting to a single metric.

Data ownership

All memories are exportable. Cloudflare makes a point of rejecting vendor lock-in. Letting a platform own your agent's memory is convenient, but convenience hardening into captivity is a real worry. Having an export API from day one is the answer to that concern.

Positioning as an agent memory layer

Agent memory has been attacked from several directions: inference-time context management like Chroma Context-1, proxy layers like Compresr Context Gateway, and window expansion like Claude’s 1M context. What’s distinct about Agent Memory is that it carves out a “shared store that lives outside the agent.” Reconciling declarative knowledge through Fact-key supersession and securing retrieval quality with HyDE+RRF is essentially RAG wisdom repackaged for agents.

It’s in private beta with a waitlist. If you’re running agents on the Cloudflare ecosystem, it’s worth trying before writing your own memory layer from scratch.

The Agent Readiness score and isitagentready.com

Also launched during Agents Week was isitagentready.com. Enter your site’s URL and it scores how agent-usable the site is. What Lighthouse did for web performance and accessibility, this does for agent-era web standards. At the same time, Cloudflare Radar scanned the top 200,000 domains worldwide and published adoption rates for AI agent standards. The short version: the web is almost entirely unprepared for agents.

The agent standards being measured

isitagentready.com scores four independent categories.

| Category | Checks |
| --- | --- |
| Discoverability | robots.txt, sitemap.xml, Link headers (RFC 8288) |
| Content | Markdown content negotiation |
| Bot Access Control | Content Signals, robots.txt AI bot rules, Web Bot Auth |
| Capabilities | Agent Skills, API Catalog (RFC 9727), OAuth discovery (RFC 8414/9728), MCP Server Card, WebMCP |

Separately, standards for agent-initiated payments — x402, Universal Commerce Protocol, Agentic Commerce Protocol — are checked but don’t count toward the score yet.

A quick primer for readers unfamiliar with these.

  • Markdown content negotiation. When an agent sends Accept: text/markdown, the server returns a Markdown version instead of HTML. Cloudflare measured up to 80% token reduction compared to HTML.
  • Content Signals. Writing something like Content-Signal: ai-train=no, search=yes, ai-input=yes in robots.txt lets you declare “no training, yes search indexing, yes inference-time grounding” separately — finer-grained than plain allow/deny.
  • Web Bot Auth. An IETF draft where the bot signs HTTP requests and receivers verify via public key. Public keys live at /.well-known/http-message-signatures-directory.
  • MCP Server Card. Publish a JSON at /.well-known/mcp/server-card.json describing which tools the server exposes, where to connect, and how to authenticate, and agents can discover capabilities before connecting.
  • API Catalog (RFC 9727). /.well-known/api-catalog hands agents a catalog of public APIs, so they don’t have to scrape a developer portal.
  • OAuth discovery (RFC 9728). For sites that require login, this tells agents where the authorization server is. Users can grant access explicitly via OAuth instead of the less-safe workaround of sharing browser login sessions with agents. Agents Week 2026 also announced full Cloudflare Access support for this flow.
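Of these, Markdown content negotiation is the easiest to implement yourself. A minimal sketch of the server-side check, assuming a Worker-style handler; `wantsMarkdown` and `negotiate` are hypothetical names, and q-value ordering is ignored for simplicity.

```typescript
// Does the Accept header ask for text/markdown at all?
function wantsMarkdown(acceptHeader: string | null): boolean {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .some(part => part.trim().split(";")[0] === "text/markdown");
}

// Pick the response content type based on the negotiation result.
function negotiate(acceptHeader: string | null): { contentType: string } {
  return wantsMarkdown(acceptHeader)
    ? { contentType: "text/markdown; charset=utf-8" }
    : { contentType: "text/html; charset=utf-8" };
}
```

In a real handler you would branch on `wantsMarkdown(request.headers.get("Accept"))` and serve the pre-rendered Markdown variant of the page.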

Global agent readiness is “nearly zero”

Cloudflare Radar published scan results from 200,000 domains in a new “Adoption of AI agent standards” chart. The headline numbers:

| Item | Adoption |
| --- | --- |
| robots.txt exists | 78% |
| robots.txt declares Content Signals | 4% |
| Correctly serves Markdown on Accept: text/markdown | 3.9% |
| Both MCP Server Card and API Catalog (RFC 9727) | Fewer than 15 of 200,000 domains |

robots.txt is nearly universal, but most of it is aimed at traditional search crawlers. Files written with AI agents in mind are still a minority. MCP Server Card and API Catalog are even more rare: fewer than 15 out of 200,000 popular sites worldwide. The flip side: there’s huge room to stand out by adopting these early.

The chart updates weekly and is available by category on the Radar “AI Insights” page. Data Explorer and the Radar API expose it too.

isitagentready.com itself is a reference implementation

Worth noting: the scoring site itself is a reference implementation of agent-readiness.

  • https://isitagentready.com/.well-known/mcp.json exposes a stateless MCP server over Streamable HTTP. Calling the scan_site tool runs a scan through MCP without the web UI.
  • https://isitagentready.com/.well-known/agent-skills/index.json publishes an Agent Skills index. For each checked standard, it provides implementation guides describing “what passes.”

Each failing check also comes with a ready-made prompt you can hand to your coding agent to implement the fix. You don't just see the score; you can delegate the remediation on the spot.

Integration with Cloudflare URL Scanner

An Agent Readiness tab has been added to the existing URL Scanner. The same check suite now runs alongside HTTP header, TLS, DNS, tech-stack, and security-signal analysis. From the API, pass agentReadiness: true in the options.

curl -X POST https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/urlscanner/v2/scan \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{
    "url": "https://www.example.com",
    "options": {"agentReadiness": true}
  }'

That makes it easy to wire Lighthouse-style scoring into CI or monitoring for organizations that want to keep tabs on agent-readiness continuously.

How Cloudflare rebuilt its Docs to be the most agent-friendly on the web

Personally, this was the most valuable part of the announcement. Cloudflare thoroughly rebuilt its Developer Docs for agents and went as far as publishing benchmark numbers comparing the result against other doc sites.

URL fallback: /index.md

Of 7 agents tested as of 2026-02, only Claude Code, OpenCode, and Cursor sent Accept: text/markdown by default. For the rest, Cloudflare built a URL-based fallback.

Specifically, appending /index.md to any page URL returns the Markdown version.

https://developers.cloudflare.com/r2/get-started/index.md

The clever part: it’s not two static files kept in sync — it’s implemented dynamically with two Cloudflare rules.

  1. A URL Rewrite Rule matches requests ending in /index.md and uses regex_replace to strip the /index.md, rewriting to the base path.
  2. A Request Header Transform Rule matches the pre-rewrite path (raw.http.request.uri.path) and injects Accept: text/markdown automatically.

So any request to /index.md gets Markdown back regardless of what headers the client sends. No build steps, no double-management of content.

flowchart LR
  A[Agent GETs<br/>/r2/get-started/index.md] --> B[Header Transform Rule<br/>raw.uri.path ends with /index.md<br/>→ inject Accept: text/markdown]
  B --> C[URL Rewrite Rule<br/>strip /index.md via regex_replace]
  C --> D[Original page returns<br/>as Markdown]
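The two rules can be sketched as equivalent request preprocessing. This is a hypothetical stand-in for illustration; the real setup is a URL Rewrite Rule plus a Request Header Transform Rule, with no Worker code involved.

```typescript
// Sketch of the /index.md fallback as a single preprocessing step.
function applyIndexMdFallback(
  path: string,
  headers: Map<string, string>,
): { path: string; headers: Map<string, string> } {
  if (path.endsWith("/index.md")) {
    // Header Transform Rule: match the pre-rewrite path, inject Accept.
    headers.set("Accept", "text/markdown");
    // URL Rewrite Rule: strip the /index.md suffix back to the base path.
    path = path.replace(/\/index\.md$/, "/");
  }
  return { path, headers };
}
```

The ordering matters: the header rule matches on the raw (pre-rewrite) path, which is why the real configuration uses `raw.http.request.uri.path`.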

How to split llms.txt at scale

llms.txt is a spec proposed in September 2024: a plaintext file at the site root that tells LLMs “what this site is and where the content lives.” Think of it as an LLM-facing sitemap.

But shoving 5,000+ pages into a single llms.txt blows out any model’s context window. Cloudflare split llms.txt per top-level directory, and the root llms.txt references those.

https://developers.cloudflare.com/llms.txt          ← parent; index of each product
https://developers.cloudflare.com/r2/llms.txt
https://developers.cloudflare.com/workers/llms.txt

They also aggressively trimmed directory-listing pages that offer LLMs little value. About 450 pages that are little more than localized tables of contents, such as https://developers.cloudflare.com/workers/databases/, were excluded from llms.txt. Child pages are already listed individually, so leaving the index pages in just forces agents to make one extra round trip to reach real content.

The entries themselves were also upgraded. Each link has a semantic name, a precise URL, and a high-signal description — so the LLM can decide in one shot which page to fetch. The Product Content Experience (PCX) team rewrote page titles, descriptions, and URL structures with agents in mind.
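Per the llms.txt proposal, the file itself is plain Markdown: an H1, a blockquote summary, and link lists with one-line descriptions. A hypothetical fragment of the parent index in that shape (the descriptions here are invented for illustration):

```markdown
# Cloudflare Developer Docs

> Index of per-product documentation. Each product has its own llms.txt.

## Products

- [R2](https://developers.cloudflare.com/r2/llms.txt): Object storage — getting started, S3-compatible API, pricing
- [Workers](https://developers.cloudflare.com/workers/llms.txt): Serverless platform — runtime APIs, bindings, examples
```

With names and descriptions this dense, an agent can pick the right child file, then the right page, without ever grepping.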

Hidden directives, and how old docs are handled

Every HTML page embeds a “hidden note to LLMs.” When an agent fetches the HTML version, the note tells it: “HTML wastes context. Either append /index.md or retry with Accept: text/markdown. All docs are also available as a single file at https://developers.cloudflare.com/llms-full.txt.” Importantly, this directive is stripped from the Markdown version. If the same note lived inside the Markdown, agents would recursively chase “the Markdown version mentioned inside the Markdown.”

The other thing I liked was the combination with Redirects for AI Training (also shipped 2026-04-17). Docs for legacy versions like Wrangler v1 should stay around for humans as archive, but if LLM crawlers feed on them directly, they’ll regenerate stale advice as “current.” Cloudflare redirects only traffic identified as AI training crawlers to the current docs. Humans see the archive as-is; LLMs only learn from the latest version. An asymmetric setup.

Benchmark results

Using OpenCode + Kimi-k2.5, Cloudflare compared its docs against other major technical doc sites.

| Metric | Cloudflare Docs | Other doc sites (avg) |
| --- | --- | --- |
| Tokens used to answer the same question | Baseline | +31% more |
| Time to correct answer | Baseline | +66% slower |

The “one product directory fits in a single context” design pays off — agents identify and fetch the right page on the first try. Cloudflare calls the underlying problem “grep loops.” If llms.txt is too large to fit in context, agents can’t read the whole file and start keyword-grepping instead. When the first grep misses, the agent burns thinking tokens, revises the query, and greps again. Slower, more expensive, less accurate.

The takeaway: document structure itself determines agent behavior and cost. Putting llms.txt on your site isn’t enough — you have to design granularity on top of it. A useful warning.

Practical impact

Making your site agent-ready is still optional as of 2026, but judging by Radar’s numbers it could transition to “if you don’t do this, you won’t get discovered” within months. For SaaS and API providers in particular, setting up MCP Server Card, API Catalog, and OAuth discovery early raises the odds of being picked by agents.

For personal or small sites, rather than standing up an MCP server, the most cost-effective first moves are supporting Accept: text/markdown and declaring Content Signals. Markdown content negotiation is nearly free to implement and dramatically lowers token usage, which makes your site more attractive to agents.

If you’re already running coding agents, you can adopt the workflow today: take the prompt that isitagentready.com returns for a failed check and hand it straight to Claude Code or Codex. Unlike Lighthouse, the actual implementation work of raising the score can be outsourced to the agent itself — which is a fun twist.