WUPHF's Karpathy-Style LLM Wiki Puts Agent Memory Back on Markdown and Git
WUPHF was trending on Hacker News.
The post was titled “Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git),” sitting at 235 points with 110 comments.
WUPHF itself is a locally-run office UI that feels like “Slack for AI employees.”
You line up Claude Code, Codex, and OpenClaw in the same channel, a broker wakes the right agent, and a web UI shows the conversation.
But what the HN thread latched onto wasn’t the chat UI—it was how to persist, search, and prevent decay in shared agent knowledge.
The Claws and Cord article touched on “persistence” as part of the orchestration layer.
WUPHF builds that out of well-worn tools: Markdown and Git.
This blog’s own CLAUDE.md follows the same idea—Markdown in a Git repo as agent memory—but a single file bloats fast.
WUPHF tries to push past that limit with wiki structure, a fact log, and indexing.
Markdown Is the Source of Truth; SQLite and Bleve Are Rebuildable
The README’s “Memory: Notebooks and the Wiki” section describes each agent having its own notebook, with the whole team sharing a wiki.
On a fresh install the markdown backend is the default, and the actual data lives in ~/.wuphf/wiki/—a local Git repository.
cat, grep, git log, and git clone just work.
The wiki’s contents are Markdown briefs and a JSONL fact log, with SQLite and Bleve layering indexes on top.
```mermaid
flowchart TD
    A[Agent notebook<br/>observations and hypotheses] --> B{Worth keeping?}
    B -->|Yes| C[Promote to wiki]
    C --> D[Markdown brief<br/>human-readable article]
    C --> E[JSONL fact log<br/>append-only fact stream]
    D --> F[("Git repo<br/>~/.wuphf/wiki")]
    E --> F
    F --> G[SQLite<br/>structured index]
    F --> H[Bleve<br/>BM25 search]
    G --> I["/lookup<br/>cited answer"]
    H --> I
    I --> J[Other agents reference it]
```
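The README doesn't spell out the fact log's record format, but the append-only JSONL framing is easy to picture. A hedged sketch (every field name here is an assumption, not WUPHF's actual schema):

```python
import json

# Hypothetical fact record. Field names are illustrative assumptions,
# not WUPHF's actual JSONL schema.
fact = {
    "id": "fact-0042",                # stable ID other briefs can cite
    "subject": "acme-corp",           # wiki slug the fact attaches to
    "field": "billing_contact",
    "value": "ops@acme.example",
    "source": "notebooks/claude/note.md",  # where it was promoted from
    "valid_from": "2026-02-10",
    "valid_to": None,                 # open-ended until superseded
}

# Append-only: each promotion appends one JSON line; history is never rewritten.
line = json.dumps(fact)
restored = json.loads(line)
```

One line per fact means `git log -p` on the log file doubles as an audit trail of what entered shared memory, and when.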
SQLite and Bleve are not the source of truth.
docs/specs/WIKI-SCHEMA.md states that running rm -rf .wuphf/index/ and restarting should rebuild SQLite and Bleve from Markdown, producing logically identical results.
The search index is a cache. The source of truth is the Git-managed Markdown and JSONL fact log.
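The rebuild contract is simple to sketch: treat the Markdown as canonical and regenerate the index from scratch on every rebuild. A minimal stand-in, with SQLite's FTS5 playing the role of the real SQLite-plus-Bleve pair (directory layout and schema are assumptions):

```python
import sqlite3
from pathlib import Path

def rebuild_index(wiki_dir: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Drop and rebuild a full-text index from Markdown briefs.

    Illustrative only: WUPHF pairs SQLite with Bleve; here a single
    FTS5 table stands in for both so the rebuild contract is visible.
    """
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS briefs")  # the index is disposable
    con.execute("CREATE VIRTUAL TABLE briefs USING fts5(slug, body)")
    for md in sorted(Path(wiki_dir).glob("**/*.md")):
        con.execute("INSERT INTO briefs VALUES (?, ?)",
                    (md.stem, md.read_text()))
    con.commit()
    return con
```

Because the loop reads only files, running it twice over the same wiki produces logically identical indexes, which is exactly the property WIKI-SCHEMA.md promises for `rm -rf .wuphf/index/`.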
This trade-off is quietly significant.
Agent memory systems tend to make “which vector DB holds it” or “which session ID it’s tied to” the actual substance.
In WUPHF, if it breaks, you throw away the index and rebuild.
Want history? git log.
Want to fork? Take the whole Git repo.
As the CLAUDE.md splitting problem showed, scattering Markdown across more files makes agents lose track of the information they need.
WUPHF has SQLite and Bleve indexes to handle that.
Since the index can always be rebuilt from Markdown, file dispersion and index aggregation coexist.
Cloudflare Agent Memory built a managed memory layer with Durable Objects, Vectorize, and Workers AI.
Cloudflare itself also announced Artifacts on the last day of Agents Week—a Git-compatible storage for agents.
WUPHF goes the opposite direction, placing the ultimate trust boundary on files and Git.
Artifacts is cloud-managed on Durable Objects; WUPHF is a plain Git repository in ~/.wuphf/wiki/.
The difference is whether you entrust memory to a platform, or audit the memory agents write as an ordinary repository.
Promotion From Notebook to Wiki
The best part of the README is that the promotion flow is not automatic.
Agents first write working context, observations, and tentative conclusions in a notebook.
Only durable information—reusable playbooks, verified entity facts, confirmed preferences—gets promoted to the wiki.
The README states explicitly: “Nothing is promoted automatically.”
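The README doesn't show what the promotion decision looks like in code. As a sketch under stated assumptions (the kind taxonomy and the `verified` flag are invented here, not WUPHF's criteria), a gate might be as simple as:

```python
# Hypothetical promotion gate. WUPHF exposes promotion through MCP tools;
# the policy below -- a durable-kind allowlist plus a verification flag --
# is an assumption, not WUPHF's actual criteria.
DURABLE_KINDS = {"playbook", "entity_fact", "preference"}

def should_promote(note: dict) -> bool:
    """Return True only for durable, verified notebook entries."""
    return note.get("kind") in DURABLE_KINDS and bool(note.get("verified"))
```

Working observations and tentative conclusions fail the gate and stay in the notebook; only what passes enters team-shared memory.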
The question of “what to keep and what to discard” arises at every scale.
Kana Chat v2’s Heartbeat memory extracts profile, daily activity, and task signals from conversations and passes them as job context.
That was small-scale memory between one user and one agent, with caps on how much accumulates.
WUPHF scales that to a team-wide shared wiki, so promotion accuracy matters a lot more.
The HN comments hit on this too.
“Garbage facts in, garbage briefs out”: the worry that unvetted facts, once promoted, produce unreliable briefs.
Some suggested splitting capture and promotion layers, requiring human review or multi-agent convergence before granting trusted status.
The concern is valid.
Agent-written notes look plausible at first.
But six months later, when another agent cites a stale memory and that citation gets folded into a new brief, the entire knowledge base quietly degrades.
WUPHF has a lint design that detects contradictions, orphans, stale claims, and broken cross-references—but “correct at the time, now outdated” and “plausible but weakly supported” slip through.
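A contradiction check over the fact log is cheap to sketch (reusing the assumed fact fields from earlier, not WUPHF's lint implementation), and the sketch also makes the blind spot concrete:

```python
from collections import defaultdict

def lint_contradictions(facts: list[dict]) -> list[str]:
    """Flag subjects whose live facts disagree on the same field.

    Sketch only: WUPHF's lint also covers orphans, stale claims, and
    broken cross-references. "Plausible but weakly supported" facts
    pass this check untouched -- lint sees structure, not truth.
    """
    warnings = []
    grouped = defaultdict(set)
    for f in facts:
        if f.get("valid_to") is None:        # only currently-live facts
            grouped[(f["subject"], f["field"])].add(f["value"])
    for (subject, field), values in grouped.items():
        if len(values) > 1:
            warnings.append(f"contradiction: {subject}.{field} = {sorted(values)}")
    return warnings
```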
The article on AI agent memory injection attacks covered adversaries poisoning memory.
WUPHF’s problem is more mundane: even without an attacker, sloppy promotions by agents themselves pollute the knowledge base.
The failure mode runs both ways.
AI coding tool auto-compaction automatically compresses context, sometimes discarding important memories along the way.
AGENTS.md validation research showed that cramming more information into context files actually reduces task success rates.
Too little or too much—both break things.
WUPHF’s promotion flow is a bet on letting agents themselves decide what to keep and what to drop.
BM25 First Is Pragmatic
The HN thread also reacted to the BM25-first design.
“Most teams jump to a vector DB before measuring anything,” as one commenter put it, and the complaint is understandable.
LLM wiki search isn’t a matter of shoving everything into vector search.
Short lookups like “Acme’s billing contact,” “pricing policy,” or “deployment checklist” are fast and strong with BM25 when the vocabulary matches.
Conversely, questions like “given last month’s discussion, what risky assumptions exist for this client” need more than keyword matching—they need cited synthesis.
WUPHF’s README describes /lookup with cited-answer retrieval, and the schema documents a rebuild contract for SQLite and Bleve.
Rather than collapsing short searches and cited answers into a single “search,” it separates a light path and a heavy path.
The idea is close to Compresr Context Gateway’s tool discovery.
Handing agents every tool and all context inflates both tokens and decision complexity.
Short queries return via a short path; only long synthesis goes through the expensive one.
BM25-first looks old-fashioned, but for an agent’s working memory it’s a sensible default.
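To see why, BM25 itself fits in a few lines. A minimal reimplementation (WUPHF relies on Bleve's BM25; this sketch just shows why vocabulary-matching lookups need no embedding step):

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    """Score documents against a short lookup query with BM25.

    Minimal sketch with naive whitespace tokenization; Bleve's real
    analyzer does stemming and normalization on top of this.
    """
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores
```

For a lookup like "billing contact" against well-titled briefs, the right document wins on exact term overlap alone; no model weights, no index drift, nothing to re-embed on every edit.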
PageIndex also built a document tree index using only LLM reasoning, no vector DB.
Mintlify ditched RAG for a virtual filesystem called ChromaFs, layering a UNIX-command-style interface on top of ChromaDB so you pull documents with grep and cat.
The pushback against “vectorize everything” is showing up all over.
That said, WUPHF gets away with BM25 partly because the wiki’s article structure is well-defined.
The conditions differ from dumping large volumes of unstructured documents.
Wikipedia-Style UI Makes Memory Auditable
DESIGN-WIKI.md is opinionated.
It designs the WUPHF wiki’s look as “Wikipedia but for my company,” adopting serif body text, hatnotes, infoboxes, TOC, Sources, Categories, and page footers—all following Wikipedia’s information architecture.
The intent isn’t retro nostalgia.
It’s about getting people to read what agents wrote as “reference material” rather than “chat logs.”
Agent memory tends to get its UI neglected: it’s in a DB, retrievable via API, injectable into a prompt, and that’s it.
But memory that humans can’t read can’t be audited.
Memory that can’t be audited eventually makes it impossible to trace why a decision was made.
NeuroValkey Agents had a nice design exposing Valkey’s keyspace in a dashboard.
WUPHF takes that a step more human-friendly with Git history and a wiki UI.
If NeuroValkey is developer-facing internal state visualization, WUPHF aims for readability as team-shared knowledge.
Similar to Local RAG, but Different in Purpose
Markdown + Git + search running locally sounds like the same lineage as Obsidian vaults, TiddlyWiki, open-notebook, or NotebookLM clones.
HN had an “Obsidian vault with a plugin, no?” comment too.
Close, but WUPHF’s center of gravity isn’t a human note-taking app—it’s a shared workspace for multiple agents.
It’s not RAG where a human feeds in documents and asks questions.
Agents place facts they discover during work into notebooks, promote durable ones to the wiki, and other agents reference them.
In the article on running open-notebook as local RAG on M1 Max, the setup was feeding existing articles as sources and having qwen3.6:35b answer questions.
That was a “read existing material” system.
WUPHF wiki is a “grow the material while working” system—the authoring entity is different.
The problem here goes beyond search accuracy.
Who wrote it, what input it was extracted from, when it was true, how to invalidate stale facts, how to handle contradictions.
That’s why WUPHF’s schema goes deep into fact IDs, temporal validity, contradiction lint, redirects, and slug immutability.
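In the append-only framing, "invalidate stale facts" means closing a validity window rather than deleting a record. A sketch using the same assumed fields as earlier (this is not WUPHF's API):

```python
def supersede(log: list[dict], fact_id: str, new_id: str,
              new_value: str, today: str) -> list[dict]:
    """Close a live fact's validity window and append its replacement.

    Append-only discipline (an assumption mirroring the JSONL log):
    the old record stays with valid_to set, so "when was this true"
    remains answerable from the log alone.
    """
    out, old = [], None
    for f in log:
        if f["id"] == fact_id and f.get("valid_to") is None:
            f = {**f, "valid_to": today}     # close, don't erase
            old = f
        out.append(f)
    if old is None:
        raise KeyError(f"no live fact with id {fact_id!r}")
    out.append({**old, "id": new_id, "value": new_value,
                "valid_from": today, "valid_to": None})
    return out
```

The contradiction lint then only has to look at facts whose `valid_to` is still open, which is what makes temporal validity and contradiction detection compose.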
Open Questions
There are boundaries worth examining before committing.
| Aspect | Where to look | Why |
|---|---|---|
| Wiki source-of-truth guarantee | docs/specs/WIKI-SCHEMA.md and internal/team/wiki_* | Verify that Markdown and JSONL truly rebuild identically |
| Promotion boundary | MCP tools that promote from notebook to wiki | Which information enters shared memory determines quality |
| Search paths | Bleve/SQLite index rebuild and /lookup | How short lookups and cited answers split determines real-world speed |
The promotion boundary in particular is hard to judge from demos alone.
The README says “Nothing is promoted automatically,” but if agents make their own promotion decisions, quality ultimately depends on those decision criteria.
Is human review mandatory? Are only low-risk playbooks auto-promoted? Do you wait for multi-agent consensus?
The answers vary by team risk tolerance.
Since promotion is implemented via MCP tools, the tools’ own safety matters too.
A security scan of 50 open-source MCP servers found input validation gaps in 61% of them.
If the promotion tools have weak validation, unintended data can enter the wiki regardless of decision-criteria quality.
Another concern is running without GitHub or any cloud.
The README emphasizes that the wiki is a portable local Git repo, but teams will want a sync target.
At that point the question becomes where to put a wiki containing customer names, internal procedures, sales notes, and auth fragments.
An HN comment noted, “Putting sensitive business documents on GitHub is scary.”
WUPHF is pre-1.0, and the README itself says to pin to release tags because main moves daily.
This isn’t something to dump your entire company’s memory into right now.
It’s at the stage of building a small team wiki locally and seeing what granularity agents use to record facts.
But the direction is promising.
Trying to solve agent memory with only “long context,” “vector DB,” and “conversation summary” keeps adding layers humans can’t read back later.
WUPHF brings it back to well-worn parts: Markdown, Git, BM25, SQLite, lint, and a Wikipedia-style UI.
More interesting than the “Karpathy-style” label is that quiet return to basics.