YourMemory Uses Biological Decay to Discard Stale AI Context
sachitrafa/YourMemory is an implementation that plugs persistent memory for AI agents into your toolchain as an MCP server.
The Hacker News candidate title said “52% recall,” but when I checked the original README and BENCHMARKS.md on 2026-04-27, LoCoMo-10 Recall@5 had been updated to 59%.
Product Hunt still shows 52%, so either the benchmark numbers or the implementation was updated in a short span.
What’s interesting is that the project leads with “forgetting” rather than “remembering.”
The longer you keep working, the more an agent’s memory accumulates old fixes, outdated assumptions, and workarounds you stopped using long ago.
Feeding all of that back into the context every time doesn’t just eat tokens—it drags the model toward outdated correctness.
I previously wrote about splitting knowledge across CLAUDE.md, skills, separate files, and MCPs in the CLAUDE.md token management guide.
YourMemory sits at the “MCP that recalls only the memories you need” end of that spectrum, rather than “documents you read every time.”
LoCoMo-10 Updated to 59%
The benchmark uses 10 conversations from Snap Research’s LoCoMo.
It feeds session_summary entries from multi-session conversations into each system in the same order and measures Recall@5 against 1,534 QA pairs—whether the correct answer appears in the top 5 search results. A heavily search-oriented metric.
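For reference, here’s a minimal sketch of the metric. How the harness actually decides a “hit” is my assumption (a substring match here), so treat it as illustrative:

```python
def recall_at_k(retrieved: list[list[str]], answers: list[str], k: int = 5) -> float:
    """Fraction of QA pairs whose gold answer appears in the top-k results.
    Incomplete runs contribute empty result lists, i.e. 0 hits."""
    hits = sum(
        1
        for top, gold in zip(retrieved, answers)
        if any(gold in mem for mem in top[:k])
    )
    return hits / len(answers)
```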
YourMemory’s BENCHMARKS.md reports 59% with BM25 + vector + graph + Ebbinghaus decay, versus Zep Cloud at 28%, Supermemory at 31%, and Mem0 at 18%.
That said, Supermemory and Mem0 didn’t complete all samples due to free-tier limits.
Incomplete runs are counted as 0 hits, so take the comparison with a grain of salt.
Still, YourMemory and Zep Cloud both completed all 10 samples, putting them at 59% vs. 28% under the same conditions.
The gap suggests that “keeping session summaries and selecting via search plus decay” beats “summarizing memories with an LLM and extracting only facts” when it comes to LoCoMo’s fine-grained names, dates, and events.
Mixing a Forgetting Curve into Search Scores
YourMemory’s retrieval isn’t simple vector search.
The README describes first pulling nearby memories via vector search, then expanding from those seeds using BFS over a graph.
The implementation also mixes in BM25—src/services/retrieve.py builds a hybrid score with BM25 weighted at 0.4 and the product of vector similarity and memory strength at 0.6.
Memory strength lives in src/services/decay.py and uses Ebbinghaus-style exponential decay.
Decay rates differ by category: failure decays in roughly 11 days, assumption in about 19 days, fact in about 24 days, and strategy in about 38 days.
Higher importance slows the decay, and each retrieval bumps the strength back up.
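Piecing the README and the file names together, the scoring would look something like this. The decay periods and weights are the published figures, but exactly how importance and recall timing feed in is my own reconstruction, not YourMemory’s actual code:

```python
import math
import time

# Per-category decay periods (days), from the README's figures.
DECAY_DAYS = {"failure": 11, "assumption": 19, "fact": 24, "strategy": 38}
PRUNE_THRESHOLD = 0.05  # below this strength, a memory drops out of results

def strength(category: str, importance: float, last_recalled_ts: float) -> float:
    """Ebbinghaus-style retention: s = exp(-t / S). Treating higher
    importance as inflating the stability S is my assumption."""
    elapsed_days = (time.time() - last_recalled_ts) / 86400
    stability = DECAY_DAYS[category] * (1.0 + importance)
    return math.exp(-elapsed_days / stability)

def hybrid_score(bm25: float, vector_sim: float, mem_strength: float) -> float:
    """BM25 weighted at 0.4; vector similarity x strength weighted at 0.6."""
    return 0.4 * bm25 + 0.6 * vector_sim * mem_strength
```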
```mermaid
flowchart TD
    A[User task] --> B[recall_memory]
    B --> C[BM25 search]
    B --> D[Vector search]
    D --> E[Multiply by<br/>memory strength]
    E --> F[Graph expansion]
    C --> G[Merge rankings]
    F --> G
    G --> H[Return only top<br/>memories to context]
    H --> I[store_memory / update_memory]
```
The design aims to keep long-term memory from becoming just a log dump.
If you make everything RAG-searchable, a workaround that was supposed to be short-lived keeps showing up as a candidate forever.
YourMemory lets unused memories naturally weaken and drops anything below 0.05 strength from search results.
Only Three MCP Tools
The MCP server exposes just three tools: recall_memory, store_memory, and update_memory.
It works with standard stdio MCP clients—Claude Code, Claude Desktop, Cline, Cursor, OpenCode, and others.
The README’s setup steps are lightweight.
Run pip install yourmemory, then yourmemory-setup, then yourmemory-path, and register yourmemory in your client’s MCP config.
The local DB defaults to ~/.yourmemory/memories.duckdb.
Dependencies include DuckDB, sentence-transformers, spaCy, NetworkX, and APScheduler.
PostgreSQL + pgvector and Neo4j are optional.
sample_CLAUDE.md describes using recall_memory at the start of a task, store_memory when you learn something new, and update_memory when a memory contradicts what you just discovered.
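In agent-loop terms, that workflow would look roughly like this. `client` stands in for whatever MCP wrapper your framework exposes, and the argument shapes are illustrative, not YourMemory’s actual tool schemas:

```python
# Hypothetical sketch of the sample_CLAUDE.md workflow.
def run_task(client, task: str) -> None:
    # Start of task: recall only the memories relevant to this task.
    memories = client.call("recall_memory", {"query": task, "limit": 5})

    # ... do the actual work with `memories` in context ...

    # Learned something new: store it under a category.
    client.call("store_memory", {
        "content": "this user prefers pnpm over npm",
        "category": "fact",
    })

    # A recalled memory contradicts what you just discovered: correct it
    # immediately rather than waiting for natural decay.
    client.call("update_memory", {
        "id": memories[0]["id"],
        "content": "the staging env no longer needs the proxy workaround",
    })
```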
This is a different role from writing progress to files as in planning-with-files.
The file-based approach suits keeping a source of truth for plans, research, and progress.
YourMemory suits fragments you want to implicitly carry into the next session—“this user prefers this config” or “that environment had this failure.”
Working Memory, Not a RAG Replacement
Viewing YourMemory as a general RAG replacement is slightly off-target.
For stuffing in company manuals or API docs at scale, document structure and precise cat / grep can matter more, as Mintlify showed when they shifted from RAG to a virtual filesystem.
YourMemory handles user-specific, project-specific memories that emerge during agent work.
Comparing with Cloudflare Agent Memory makes the directional difference clear.
Cloudflare uses Durable Objects, Vectorize, and Workers AI as a managed memory layer, classifying memories into Facts, Events, Instructions, and Tasks.
YourMemory runs as a local MCP, narrows categories to fact, assumption, failure, and strategy, and adjusts retrieval order through forgetting curves and graphs.
If you want managed, team-shared memory, the Cloudflare model fits.
If you want your local Claude Code or Cursor to “remember the preferences and pitfalls I’ve mentioned before,” YourMemory is lighter.
Note that the license is CC BY-NC 4.0, so commercial use needs separate confirmation.
How to Kill Outdated Correctness
The scary part of this kind of memory isn’t forgetting too much in aggregate; it’s weakening the one memory that should never be forgotten.
An architecture decision you only touch once every six months but that breaks things if violated, for example.
With pure time-based decay, that memory silently disappears.
YourMemory tries to guard against this with its graph.
It creates edges to similar memories at save time, and when a memory is strongly recalled, its neighbors get boosted too.
The README explains that if neighbors in the chain are strong, a decayed memory won’t be pruned immediately.
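A minimal sketch of that neighbor boost, assuming a NetworkX graph with edges created at save time; the boost values and the half-strength spread to neighbors are my assumptions:

```python
import networkx as nx

def reinforce(graph: nx.Graph, strengths: dict[str, float],
              mem_id: str, boost: float = 0.2) -> None:
    """A strong recall bumps the memory itself, and its graph neighbors
    get a smaller bump, so chained memories resist pruning together."""
    strengths[mem_id] = min(1.0, strengths[mem_id] + boost)
    for neighbor in graph.neighbors(mem_id):
        strengths[neighbor] = min(1.0, strengths[neighbor] + boost * 0.5)
```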
There are still things to watch on the operational side.
failure memories decay fast, making old environment-specific errors hard to keep around.
Conversely, design principles and prohibitions need to be stored as strategy or with high importance, or they’ll fade.
And when a change clearly breaks an old assumption, it’s better to use update_memory right away rather than waiting for natural decay.
Making memory bigger has gotten easy.
The next differentiator is deciding which memories to kill and when.
YourMemory is a small local implementation that puts “forgetting” at the center of that question, and it’s worth trying.
This applies to blogging too.
When you pull in past articles via internal links, you need to judge whether they’re still useful—or you end up circulating outdated assumptions forever.
Articles with data that shifts in months, like benchmark numbers, can’t avoid freshness decay.
Keeping past content properly pruned is the same challenge whether you’re managing AI memory or a blog.
And I’m feeling this firsthand with Kana Chat, which I’m building myself.
The current Heartbeat memory keeps today as 14 date-bucketed days, caps task_signals at 80 entries, and caps profile at 40: all mechanical numeric cutoffs.
After 14 days, gone. Over the cap, oldest gets pushed out.
Simple, but it’s truncation rather than decay, so information from 13 days ago and from yesterday sits at the same weight in context.
The real pain point is task_signals.
Something like “I want to look into auto-generating OGP images” that I muttered two weeks ago keeps showing up as a job suggestion long after I’ve lost interest.
Having to mentally skip “I don’t need this anymore” every time suggestions come up is quietly stressful.
Same with profile—“knowledgeable about security articles” still applies, but “recently interested in Rust” might have expired after a month.
Adopting YourMemory’s per-category decay rates wholesale would be overkill, but the concept is useful.
Treat task_signals as a short-lived category like failure and decay them rapidly after about a week.
Treat profile as a long-lived category like strategy—keep it for months, but gradually weaken entries that aren’t referenced.
today is fine with the current date-based cutoff.
Diary data has dates as its identity, so deletion makes more semantic sense than decay.
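Concretely, replacing the hard cutoffs with per-category half-lives might look like this. The names and figures are my own sketch of the idea, not existing Kana Chat code:

```python
from datetime import datetime, timezone

# Proposed half-lives: roughly a week for task_signals, months for profile.
HALF_LIFE_DAYS = {"task_signals": 7.0, "profile": 90.0}

def entry_weight(category: str, last_used: datetime) -> float:
    """Exponential decay instead of a hard cutoff: a signal from 13 days
    ago weighs far less than yesterday's, rather than exactly the same."""
    age_days = (datetime.now(timezone.utc) - last_used).total_seconds() / 86400
    return 0.5 ** (age_days / HALF_LIFE_DAYS[category])
```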
One more thing I want is retrieval-frequency reinforcement.
Like YourMemory’s recall_count, a boost to the strength of memories actually used during job context construction would naturally separate “knowledge that repeatedly helps” from “a thought I mentioned once.”
The current Heartbeat doesn’t have that mechanism, so even just reordering the 80 entries by usage frequency would probably improve suggestion quality.
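As a first step, that could be as small as this, where recall_count is a hypothetical field mirroring YourMemory’s, incremented whenever an entry is pulled into job context:

```python
def rank_by_usage(signals: list[dict]) -> list[dict]:
    """Sort the capped task_signals so entries that actually get used
    surface first; untracked entries default to 0."""
    return sorted(signals, key=lambda s: s.get("recall_count", 0), reverse=True)
```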