NeuroValkey Agents: A Three-Agent AI Swarm Built on Valkey as the Nervous System
Harish Kotra’s Building a Multi-Agent AI Swarm with Valkey as the Nervous System on DEV Community caught my eye, so I dug into the design.
As the title says, it’s a three-agent experiment that treats Valkey as the nervous system of the swarm, and the full code is up on GitHub as neurovalkey-agents.
A bit of framing first.
Over the past few months the “agent memory” problem has been approached from multiple angles — Cloudflare Agent Memory going managed, Chroma Context-1’s search agent, Compresr Context Gateway’s proxy approach.
NeuroValkey Agents is the self-hosted counterpart: “don’t rely on a platform — let Valkey handle memory, coordination, and state transitions by itself.”
Where Cloudflare Agent Memory offers an externally hosted shared store as a managed service, this one stands up Valkey on your own Docker Compose and hands it everything.
What is Valkey
First, where Valkey sits.
Valkey is a BSD-licensed key-value store forked under the Linux Foundation in March 2024, right after Redis announced it was moving to the dual RSALv2/SSPLv1 license starting with Redis 7.4.
The command set and protocol are Redis-compatible, so existing Redis clients keep working unchanged.
Two things matter for this article.
First, the official Docker image bundles modules like Valkey Search (the Redis Stack-style FT.* commands) and ValkeyJSON (JSON.* commands).
Second, Pub/Sub, Streams, Hash, and vector search all live in a single process, so you don’t have to stack separate pieces of middleware.
In short, the things you used to build on Redis Stack now run under a permissive BSD license on Valkey — that’s where Valkey sits in 2026.
NeuroValkey Agents is an example of leaning hard on that “batteries included” stack as a single runtime substrate.
Not a cache — a runtime substrate
What Harish Kotra (the author) emphasizes is that Valkey is used not as a cache for LLM responses, but as the runtime substrate for the agent swarm.
LLMs are reasoning engines, but Valkey is the operational substrate that turns them into coordinated systems.
Agent state isn’t kept in process memory — it’s all written into the Valkey keyspace.
That has some nice side effects.
- You can peek at live state from the outside (inspect the keyspace with KEYS/JSON.GET)
- Sessions survive restarts
- You can bolt on a new agent later — different process, different language — and it reads the same store
- While debugging, you can trace what happened from both the event side and the store side
The design pattern itself isn’t new — it’s a rehash of long-standing ideas from the actor model and event sourcing.
What’s novel is staging this on a multi-agent LLM system and picking Valkey specifically as the substrate.
The three-agent pipeline
NeuroValkey Agents chains three agents in series.
| Order | Agent | Role |
|---|---|---|
| 1 | Researcher | Generates facts about the given topic and stores them in Valkey together with their embeddings |
| 2 | Writer | Pulls accumulated facts out of Valkey with a KNN search and drafts the article |
| 3 | Editor | Scores the draft, refines it, and writes out the final version |
The flow is Researcher → Writer → Editor, but the wiring isn’t direct function calls — events travel through Valkey Pub/Sub channels.
When Researcher finishes, it fires an event on a channel; Writer is subscribed to that channel and starts.
Same pattern for Editor.
```mermaid
flowchart LR
  U[User input<br/>topic] --> R[Researcher Agent]
  R -->|store in fact:uuid hash<br/>text / topic / agent / embedding| V[(Valkey)]
  R -->|Pub/Sub: researcher_done| W[Writer Agent]
  W -->|FT.SEARCH KNN @embedding| V
  W -->|JSON.SET draft| V
  W -->|Pub/Sub: writer_done| E[Editor Agent]
  E -->|JSON.GET draft / SET final| V
  E -->|Pub/Sub: editor_done| D[Dashboard<br/>/api/state /api/logs]
```
Valkey is wearing four hats here at the same time.
| Role | Where it’s used |
|---|---|
| Pub/Sub | Event-driven orchestration between agents |
| Hash | Fact store with embeddings (fact:<uuid>) |
| JSON | The whole workflow’s manifest, draft, and final output |
| Search Index | Semantic lookup over facts |
You could piece the same thing together from separate middleware, but the point is that it all fits in one process.
How Researcher writes facts in
Researcher’s job is simple: take a topic, ask the LLM to “list facts about this topic,” turn each returned piece of text into an embedding via the OpenAI API, and write it into Valkey.
Embeddings are converted to a Float32 binary Buffer before HSET.
Keys look like fact:<uuid>, with four fields: text, topic, agent, and embedding.
Only embedding is binary; the rest are ordinary string tags.
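The binary field is just the embedding’s raw little-endian Float32 bytes. A minimal sketch of that encoding — the helper names are mine, not the repo’s:

```javascript
// Encode an embedding (array of numbers) into the raw Float32 buffer
// stored in the hash's `embedding` field.
function encodeEmbedding(vec) {
  const buf = Buffer.alloc(vec.length * 4); // 4 bytes per Float32
  vec.forEach((v, i) => buf.writeFloatLE(v, i * 4));
  return buf;
}

// Decode it back, e.g. when inspecting a fact from the dashboard side.
function decodeEmbedding(buf) {
  const out = [];
  for (let i = 0; i < buf.length; i += 4) out.push(buf.readFloatLE(i));
  return out;
}

// The HSET payload for one fact then looks like:
const fact = {
  text: 'Valkey was forked in March 2024.',
  topic: 'valkey',
  agent: 'researcher',
  embedding: encodeEmbedding([0.1, -0.2, 0.3]), // binary field
};
```

Note that the byte layout has to match what the index schema declares (`TYPE FLOAT32`), or the KNN distances come out as garbage.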
This is where the Valkey Search (FT.*) index definition earns its keep.
```
FT.CREATE facts_idx ON HASH
  PREFIX 1 "fact:"
  SCHEMA
    topic TAG
    agent TAG
    embedding VECTOR FLAT 6
      TYPE FLOAT32
      DIM <embeddingDim>
      DISTANCE_METRIC COSINE
```
PREFIX 1 "fact:" declares “auto-index any hash whose key starts with fact:.”
After that, an HSET is enough to put the row into the index too.
The 6 in VECTOR FLAT 6 is just the count of attribute arguments that follow (three name/value pairs: TYPE, DIM, DISTANCE_METRIC); the algorithm itself is FLAT (brute force).
This is a sample, so accuracy wins over speed. Once a real collection grows, the standard move is to switch to HNSW.
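For reference, the HNSW variant only swaps the algorithm block — a sketch, not taken from the repo (the argument count changes if you pass extra HNSW tuning options):

```
FT.CREATE facts_idx ON HASH
  PREFIX 1 "fact:"
  SCHEMA
    topic TAG
    agent TAG
    embedding VECTOR HNSW 6
      TYPE FLOAT32
      DIM <embeddingDim>
      DISTANCE_METRIC COSINE
```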
topic and agent are declared as TAG fields to enable filters like “KNN search scoped to a specific topic only.”
Writer uses this later as a prefilter.
How Writer drafts with a KNN search
Writer builds a query string from the prompt, embeds it, and sends a KNN query to Valkey Search.
```
FT.SEARCH facts_idx
  "*=>[KNN 8 @embedding $vec AS score]"
  PARAMS 2 vec <float32_bytes>
  DIALECT 2
```
The odd-looking *=>[KNN 8 ...] syntax means “out of everything (*), return the 8 nearest neighbors against @embedding.”
Swap * with something like @topic:{ai-agents} and you get a TAG prefilter before KNN kicks in.
DIALECT 2 is the incantation that enables the newer query syntax.
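Building that query string is mechanical enough to sketch in a few lines (helper name mine, not from the repo):

```javascript
// Build the FT.SEARCH query string: '*' means no prefilter, otherwise
// a TAG clause like '@topic:{ai-agents}' narrows the candidate set
// before the KNN stage runs.
function buildKnnQuery(k, topic) {
  const prefilter = topic ? `@topic:{${topic}}` : '*';
  return `${prefilter}=>[KNN ${k} @embedding $vec AS score]`;
}

buildKnnQuery(8);              // '*=>[KNN 8 @embedding $vec AS score]'
buildKnnQuery(8, 'ai-agents'); // '@topic:{ai-agents}=>[KNN 8 @embedding $vec AS score]'
```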
KNN mostly returns scores and key names, so the follow-up is HMGET fact:<id> text topic agent to pull the actual text — a two-step dance.
This is the textbook Valkey Search / RediSearch pattern: vector search on the index, real data from the original hash.
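The raw reply comes back as a flat array: a total count, then alternating key names and field/value arrays. A parsing sketch, assuming that wire shape (the helper name is mine):

```javascript
// Parse a raw FT.SEARCH reply of the form:
//   [total, key1, [field, value, ...], key2, [field, value, ...], ...]
// into { total, hits } with one object per matching key.
function parseSearchReply(reply) {
  const [total, ...rest] = reply;
  const hits = [];
  for (let i = 0; i < rest.length; i += 2) {
    const entry = { key: rest[i] };
    const fields = rest[i + 1];
    for (let j = 0; j < fields.length; j += 2) {
      entry[fields[j]] = fields[j + 1];
    }
    hits.push(entry);
  }
  return { total, hits };
}

// Second step of the dance: for each hit.key, issue
// HMGET fact:<id> text topic agent to fetch the actual fact text.
```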
The retrieved facts are spliced into a prompt, the LLM writes a summary, and the resulting draft is written with JSON.SET swarm:<runId>:draft $ "<json>".
Editor can then just JSON.GET it.
The author notes that for vector search, he deliberately skipped the high-level SDK helpers and issues raw FT.* commands via commandClient.call().
That’s a compatibility-first choice — the thing keeps working even when the surface wrapper drifts.
How Editor scores and finalizes
Editor reads the draft with JSON.GET swarm:<runId>:draft and feeds it through its own scoring prompt.
The response comes back as a score plus refinement suggestions, and Editor may rewrite if needed.
Finally it writes the output with JSON.SET swarm:<runId>:final and fires editor_done on the channel to close out the workflow.
The manifest itself lives under a separate JSON key (swarm:<runId>:manifest), so you can trace in one place which topic was used, which facts were pulled, what the scores were, and what came out.
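A plausible shape for that manifest, to make the idea concrete — the field names here are illustrative, not lifted from the repo:

```
{
  "runId": "…",
  "topic": "ai-agents",
  "factIds": ["fact:…"],
  "draftScore": 7,
  "status": "finished"
}
```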
The Dashboard just polls this manifest — no special relay server in the middle.
The keyspace itself is what the dashboard shows
NeuroValkey Agents ships a dashboard at http://localhost:3055.
What’s interesting is that the implementation basically just polls /api/state and /api/logs on a timer.
- Lists the keyspace map (swarm:*, fact:*)
- Inspects the raw JSON of each key
- Shows embedding bytes for vector facts as-is
- Exposes index telemetry via FT.INFO
- Correlates the event timeline and the logs side by side
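The keyspace-map view boils down to grouping key names by prefix. A sketch against a hypothetical /api/state payload (the function and payload shape are my illustration, not the repo’s API):

```javascript
// Group a flat list of Valkey key names (as a hypothetical /api/state
// endpoint might return them) into the buckets the dashboard renders.
function groupKeyspace(keys) {
  const groups = { facts: [], swarm: [], other: [] };
  for (const key of keys) {
    if (key.startsWith('fact:')) groups.facts.push(key);
    else if (key.startsWith('swarm:')) groups.swarm.push(key);
    else groups.other.push(key);
  }
  return groups;
}

const view = groupKeyspace([
  'fact:1a2b', 'fact:3c4d',
  'swarm:run-1:draft', 'swarm:run-1:final',
  'some:other:key',
]);
// view.facts has 2 entries, view.swarm has 2, view.other has 1
```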
Going with HTTP polling rather than SSE or WebSockets is an intentional choice — the author wanted local-demo reliability.
Latency goes up, but the infrastructure requirements drop to zero.
It’s a small trade-off, but the stance “being observable itself matters most” comes through.
This is where the “don’t keep state in the process” design pays off most visibly.
You can literally show what each agent read, wrote, and emitted — straight as keyspace shapes.
LLM apps love turning into black boxes, so “pipe the raw data to the UI” directly translates into debuggability.
Design trade-offs
Boiling the implementation down to a table:
| Dimension | Choice | What you get | What you pay |
|---|---|---|---|
| Update mechanism | HTTP polling | Zero-infra, reproducible | Higher update latency |
| Data model | Hash + JSON together | Optimized per access pattern | Two keys to keep in sync |
| Vector search | Raw FT.* commands | Explicit, compatibility-safe | More boilerplate |
| Vector algorithm | FLAT (brute force) | Accurate, simple | Slows down as facts grow |
| Agent coordination | Pub/Sub | Explicit and easy to trace | Retry/DLQ needs extra work |
Once you commit to “Valkey for everything,” the only remaining design choice is the data structure.
If you already know Redis Stack, the learning curve is close to flat.
Compared to the managed approach
Lining it up against the Cloudflare Agent Memory mentioned earlier makes the positioning clear.
| Dimension | Cloudflare Agent Memory | NeuroValkey Agents |
|---|---|---|
| Hosting | Managed (Durable Objects + Vectorize + Workers AI) | Self-hosted (one Valkey process on Docker) |
| License | Tied to Cloudflare’s platform | Valkey (BSD) + your own app code |
| Memory model | Classified into Facts / Events / Instructions / Tasks | A single fact hash type |
| Search | Five-channel parallel + RRF (with HyDE) | Single KNN + TAG filter |
| State management | Durable Objects transactions | JSON key snapshots |
| Intended use | Production agent memory layer | A reference implementation of the pattern |
NeuroValkey Agents is a reference implementation for “how far can you go without a managed memory layer” — it isn’t aiming at the polished feature set of Cloudflare Agent Memory.
The flip side is that you can read it as a miniature version of what happens inside Agent Memory.
Optimizations from Cloudflare’s pipeline, like “idempotent dedupe via SHA-256” or “bridging the embedding gap between declarative statements and questions via query transformation,” aren’t there yet in NeuroValkey.
Whether or not you write those yourself is the watershed when you go the self-hosted route.
Rough edges
A few things stood out while reading the post and the repo.
| Concern | Details |
|---|---|
| No benchmark numbers | Throughput, latency, and KNN accuracy have no published benchmarks. The author lists “throughput/latency benchmarking mode” as future work |
| Single Valkey assumed | Doesn’t touch clustering or replica failover — a single node is a SPOF |
| No retries / DLQ | Nothing picks back up when an agent fails after a Pub/Sub event. Switching to Streams with consumer groups would handle it, but it’s a plain channel today |
| Coarse fact classification | No equivalent to Cloudflare’s Facts / Events / Instructions / Tasks split — everything lives in the same fact:* hash |
| Embedding model fixed | Hard-coded to OpenAI API. Swapping in a local model (bge-m3, etc.) is on you |
The author is upfront about all of this in the README, sticking to the framing of “a reference implementation that prioritizes observability and simplicity.”