
NeuroValkey Agents: A Three-Agent AI Swarm Built on Valkey as the Nervous System

Ikesan

Harish Kotra’s Building a Multi-Agent AI Swarm with Valkey as the Nervous System on DEV Community caught my eye, so I dug into the design.
As the title says, it’s a three-agent experiment that treats Valkey as the nervous system of the swarm, and the full code is up on GitHub as neurovalkey-agents.

A bit of framing first.
Over the past few months the “agent memory” problem has been approached from multiple angles — Cloudflare Agent Memory going managed, Chroma Context-1’s search agent, Compresr Context Gateway’s proxy approach.
NeuroValkey Agents is the self-hosted counterpart: “don’t rely on a platform — let Valkey handle memory, coordination, and state transitions by itself.”
Where Cloudflare Agent Memory offers an externally hosted shared store as a managed service, this one stands up Valkey on your own Docker Compose and hands it everything.

What is Valkey

First, where Valkey sits.
Valkey is a BSD-licensed key-value store that was forked under the Linux Foundation in March 2024, right after Redis Inc. announced it was moving Redis (from 7.4 onward) to the dual RSALv2/SSPLv1 license.
The command set and protocol are Redis-compatible, so existing Redis clients keep working unchanged.

Two things matter for this article.
First, the official bundle Docker image (valkey/valkey-bundle) ships with modules like Valkey Search (the Redis Stack-style FT.* commands) and ValkeyJSON (JSON.* commands).
Second, Pub/Sub, Streams, Hash, and vector search all live in a single process, so you don’t have to stack separate pieces of middleware.

In short, the things you used to write on Redis Stack now run under a permissive BSD license on Valkey — that’s where Valkey sits in 2026.
NeuroValkey Agents is an example of leaning hard on that “batteries included” stack as a single runtime substrate.

Not a cache — a runtime substrate

What Harish Kotra (the author) emphasizes is that Valkey is used not as a cache for LLM responses, but as the operational substrate that turns the agents into a coordinated swarm.

LLMs are reasoning engines, but Valkey is the operational substrate that turns them into coordinated systems.

Agent state isn’t kept in process memory — it’s all written into the Valkey keyspace.
That has some nice side effects.

  • You can peek at live state from the outside (inspect the keyspace with KEYS / JSON.GET)
  • Sessions survive restarts
  • You can bolt on a new agent later — different process, different language — and it reads the same store
  • While debugging, you can trace what happened from both the event side and the store side
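Concretely, “peeking from the outside” is just running ordinary commands against the same keyspace (the key names here are illustrative):

```
KEYS swarm:*
JSON.GET swarm:<runId>:manifest
HGETALL fact:<uuid>
```

KEYS is fine at demo scale; on anything larger, SCAN is the safer habit. Note that HGETALL returns the embedding field as raw Float32 bytes.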

The design pattern itself isn’t new — it’s a rehash of long-standing ideas from the actor model and event sourcing.
What’s novel is staging this on a multi-agent LLM system and picking Valkey specifically as the substrate.

The three-agent pipeline

NeuroValkey Agents chains three agents in series.

| Order | Agent | Role |
| --- | --- | --- |
| 1 | Researcher | Generates facts about the given topic and stores them in Valkey together with their embeddings |
| 2 | Writer | Pulls accumulated facts out of Valkey with a KNN search and drafts the article |
| 3 | Editor | Scores the draft, refines it, and writes out the final version |

The flow is Researcher → Writer → Editor, but the wiring isn’t direct function calls — events travel through Valkey Pub/Sub channels.
When Researcher finishes, it fires an event on a channel; Writer is subscribed to that channel and starts.
Same pattern for Editor.
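The handoff logic boils down to a tiny lookup. The channel names follow the article; the dispatcher shape below is my sketch, not the repo’s actual code:

```javascript
// Map each completion event to the agent that should wake up next.
// null marks the end of the pipeline.
const PIPELINE = {
  researcher_done: "writer",
  writer_done: "editor",
  editor_done: null,
};

// Given a Pub/Sub channel name, decide which agent (if any) runs next.
function nextAgent(channel) {
  return PIPELINE[channel] ?? null;
}
```

In practice each agent subscribes to exactly one channel, so this table is spread across three processes rather than centralized — but the chain is the same.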

flowchart LR
    U[User input<br/>topic] --> R[Researcher Agent]
    R -->|store in fact:uuid hash<br/>text / topic / agent / embedding| V[(Valkey)]
    R -->|Pub/Sub: researcher_done| W[Writer Agent]
    W -->|FT.SEARCH KNN @embedding| V
    W -->|JSON.SET draft| V
    W -->|Pub/Sub: writer_done| E[Editor Agent]
    E -->|JSON.GET draft / SET final| V
    E -->|Pub/Sub: editor_done| D[Dashboard<br/>/api/state /api/logs]

Valkey is wearing four hats here at the same time.

| Role | Where it’s used |
| --- | --- |
| Pub/Sub | Event-driven orchestration between agents |
| Hash | Fact store with embeddings (fact:<uuid>) |
| JSON | The whole workflow’s manifest, draft, and final output |
| Search Index | Semantic lookup over facts |

You could piece the same thing together from separate middleware, but the point is that it all fits in one process.

How Researcher writes facts in

Researcher’s job is simple: take a topic, ask the LLM to “list facts about this topic,” turn each returned piece of text into an embedding via the OpenAI API, and write it into Valkey.

Embeddings are converted to a Float32 binary Buffer before HSET.
Keys look like fact:<uuid>, with four fields: text, topic, agent, and embedding.
Only embedding is binary; the rest are ordinary string tags.
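The packing step is plain Node Buffer work. A minimal sketch (the helper name is mine; the field layout follows the article):

```javascript
// Pack a number[] embedding into the little-endian Float32 Buffer
// stored in the embedding field of a fact:<uuid> hash.
function packEmbedding(vector) {
  const buf = Buffer.alloc(vector.length * 4); // 4 bytes per float32
  vector.forEach((v, i) => buf.writeFloatLE(v, i * 4));
  return buf;
}

// Shape of one fact hash: three string fields plus the binary vector.
const fact = {
  text: "Valkey was forked under the Linux Foundation in March 2024.",
  topic: "ai-agents",
  agent: "researcher",
  embedding: packEmbedding([0.1, 0.2, 0.3]),
};
// Written with: HSET fact:<uuid> text ... topic ... agent ... embedding <bytes>
```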

This is where the Valkey Search (FT.*) index definition earns its keep.

FT.CREATE facts_idx ON HASH
  PREFIX 1 "fact:"
  SCHEMA
    topic TAG
    agent TAG
    embedding VECTOR FLAT 6
      TYPE FLOAT32
      DIM <embeddingDim>
      DISTANCE_METRIC COSINE

PREFIX 1 "fact:" declares “auto-index any hash whose key starts with fact:.”
After that, an HSET is enough to put the row into the index too.
The 6 in VECTOR FLAT 6 is just the count of option arguments that follow (three name/value pairs: TYPE, DIM, DISTANCE_METRIC); the algorithm itself is FLAT (brute force).
This is a sample, so accuracy wins over speed. Once a real collection grows, the standard move is to switch to HNSW.
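That switch is a one-line change in the schema — same required options, different algorithm (extra HNSW knobs are left at their defaults here):

```
embedding VECTOR HNSW 6
  TYPE FLOAT32
  DIM <embeddingDim>
  DISTANCE_METRIC COSINE
```

HNSW trades exact results for sublinear lookups; parameters like M and EF_CONSTRUCTION are available when recall needs tuning.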

topic and agent are declared as TAG fields to enable filters like “KNN search scoped to a specific topic only.”
Writer uses this later as a prefilter.

Writer builds a query string from the prompt, embeds it, and sends a KNN query to Valkey Search.

FT.SEARCH facts_idx
  "*=>[KNN 8 @embedding $vec AS score]"
  PARAMS 2 vec <float32_bytes>
  DIALECT 2

The odd-looking *=>[KNN 8 ...] syntax means “out of everything (*), return the 8 nearest neighbors against @embedding.”
Swap * with something like @topic:{ai-agents} and you get a TAG prefilter before KNN kicks in.
DIALECT 2 is the incantation that enables the newer query syntax.
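The query string itself is easy to build programmatically (the helper name is mine):

```javascript
// Build the KNN query, optionally prefiltered by a topic TAG.
function knnQuery(k, topic) {
  const filter = topic ? `@topic:{${topic}}` : "*";
  return `${filter}=>[KNN ${k} @embedding $vec AS score]`;
}

// knnQuery(8)              → "*=>[KNN 8 @embedding $vec AS score]"
// knnQuery(8, "ai-agents") → "@topic:{ai-agents}=>[KNN 8 @embedding $vec AS score]"
```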

KNN mostly returns scores and key names, so the follow-up is HMGET fact:<id> text topic agent to pull the actual text — a two-step dance.
This is the textbook Valkey Search / RediSearch pattern: vector search on the index, real data from the original hash.

The retrieved facts are spliced into a prompt, the LLM writes a summary, and the resulting draft is written with JSON.SET swarm:<runId>:draft $ "<json>".
Editor can then just JSON.GET it.

The author notes that for vector search, he deliberately skipped the high-level SDK helpers and issues raw FT.* commands via commandClient.call().
That’s a compatibility-first choice — the thing keeps working even when the surface wrapper drifts.
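With a generic command method, the whole call reduces to assembling an argv array. The builder name below is mine; ioredis’s call() and node-redis’s sendCommand() both accept roughly this shape:

```javascript
// Assemble the raw FT.SEARCH arguments; vecBuffer is the query embedding
// as Float32 bytes. Everything stays explicit — no SDK wrapper involved.
function buildKnnSearchArgs(index, k, vecBuffer) {
  return [
    "FT.SEARCH", index,
    `*=>[KNN ${k} @embedding $vec AS score]`,
    "PARAMS", "2", "vec", vecBuffer,
    "DIALECT", "2",
  ];
}

// e.g. client.call(...buildKnnSearchArgs("facts_idx", 8, vec)) with ioredis
```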

How Editor scores and finalizes

Editor reads the draft with JSON.GET swarm:<runId>:draft and feeds it through its own scoring prompt.
The response comes back as a score plus refinement suggestions, and Editor may rewrite if needed.
Finally it writes the output with JSON.SET swarm:<runId>:final and fires editor_done on the channel to close out the workflow.

The manifest itself lives under a separate JSON key (swarm:<runId>:manifest), so you can trace in one place which topic was used, which facts were pulled, what the scores were, and what came out.
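Key naming, plus a plausible manifest shape — the post lists what the manifest tracks, not its exact schema, so the fields below are illustrative:

```javascript
// swarm:<runId>:manifest / :draft / :final all hang off one run id.
const swarmKey = (runId, part) => `swarm:${runId}:${part}`;

// Illustrative manifest contents; one JSON.GET reconstructs the whole run.
const manifest = {
  topic: "ai-agents",
  facts: ["fact:1a2b", "fact:3c4d"], // which facts Writer pulled
  score: 8,                          // Editor's rating of the draft
  status: "final",
};
// JSON.SET swarm:<runId>:manifest $ '<json>'
```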
The Dashboard just polls this manifest — no special relay server in the middle.

The keyspace itself is what the dashboard shows

NeuroValkey Agents ships a dashboard at http://localhost:3055.
What’s interesting is that the implementation basically just polls /api/state and /api/logs on a timer.

  • Lists the keyspace map (swarm:*, fact:*)
  • Inspects the raw JSON of each key
  • Shows embedding bytes for vector facts as-is
  • Exposes index telemetry via FT.INFO
  • Correlates the event timeline and the logs side by side

Going with HTTP polling rather than SSE or WebSockets is an intentional choice — the author wanted local-demo reliability.
Latency goes up, but the infrastructure requirements drop to zero.
It’s a small trade-off, but the stance “being observable itself matters most” comes through.

This is where the “don’t keep state in the process” design pays off most visibly.
You can literally show what each agent read, wrote, and emitted — straight as keyspace shapes.
LLM apps love turning into black boxes, so “pipe the raw data to the UI” directly translates into debuggability.

Design trade-offs

Boiling the implementation down to a table:

| Dimension | Choice | What you get | What you pay |
| --- | --- | --- | --- |
| Update mechanism | HTTP polling | Zero-infra, reproducible | Higher update latency |
| Data model | Hash + JSON together | Optimized per access pattern | Two keys to keep in sync |
| Vector search | Raw FT.* commands | Explicit, compatibility-safe | More boilerplate |
| Vector algorithm | FLAT (brute force) | Accurate, simple | Slows down as facts grow |
| Agent coordination | Pub/Sub | Explicit and easy to trace | Retry/DLQ needs extra work |

Once you commit to “Valkey for everything,” the only remaining design choice is the data structure.
If you already know Redis Stack, the learning curve is close to flat.

Compared to the managed approach

Lining it up against the Cloudflare Agent Memory mentioned earlier makes the positioning clear.

| Dimension | Cloudflare Agent Memory | NeuroValkey Agents |
| --- | --- | --- |
| Hosting | Managed (Durable Objects + Vectorize + Workers AI) | Self-hosted (one Valkey process on Docker) |
| License | Tied to Cloudflare’s platform | Valkey (BSD) + your own app code |
| Memory model | Classified into Facts / Events / Instructions / Tasks | A single fact hash type |
| Search | Five-channel parallel + RRF (with HyDE) | Single KNN + TAG filter |
| State management | Durable Objects transactions | JSON key snapshots |
| Intended use | Production agent memory layer | A reference implementation of the pattern |

NeuroValkey Agents is a reference implementation for “how far can you go without a managed memory layer” — it isn’t aiming at the polished feature set of Cloudflare Agent Memory.
The flip side is that you can read it as a miniature version of what happens inside Agent Memory.
Optimizations from Cloudflare’s pipeline, like “idempotent dedupe via SHA-256” or “bridging the embedding gap between declarative statements and questions via query transformation,” aren’t there yet in NeuroValkey.
Whether or not you write those yourself is the watershed when you go the self-hosted route.

Rough edges

A few things stood out while reading the post and the repo.

| Concern | Details |
| --- | --- |
| No benchmark numbers | Throughput, latency, and KNN accuracy have no published benchmarks. The author lists a “throughput/latency benchmarking mode” as future work |
| Single Valkey assumed | Doesn’t touch clustering or replica failover — a single node is a SPOF |
| No retries / DLQ | Nothing picks back up when an agent fails after a Pub/Sub event. Switching to Streams with consumer groups would handle it, but it’s a plain channel today |
| Coarse fact classification | No equivalent to Cloudflare’s Facts / Events / Instructions / Tasks split — everything lives in the same fact:* hash |
| Embedding model fixed | Hard-coded to the OpenAI API. Swapping in a local model (bge-m3, etc.) is on you |
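The Streams upgrade alluded to above would look roughly like this (the stream and group names are made up):

```
XADD swarm:events * type researcher_done runId abc123
XGROUP CREATE swarm:events writers $ MKSTREAM
XREADGROUP GROUP writers writer-1 COUNT 1 BLOCK 0 STREAMS swarm:events >
XACK swarm:events writers <message-id>
```

Unlike a plain channel, an unacknowledged message stays pending, so a crashed Writer can reclaim it with XAUTOCLAIM on restart — that is the retry story in a handful of commands.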

The author is upfront about all of this in the README, sticking to the framing of “a reference implementation that prioritizes observability and simplicity.”