Mintlify ditched RAG and switched to a virtual file system
RAG has become a standard approach. You split documents into chunks, store them in a vector database, and pull chunks similar to the user’s query into the LLM context. In practice, people keep finding ways to improve precision, such as Chroma’s search agent Context-1 and PageIndex’s tree RAG.
Mintlify, however, abandoned the RAG paradigm itself. Instead, it built ChromaFs, a virtual file system. You give an AI agent a bash shell and let it explore documentation with grep and cat. Every file-system access is translated into a ChromaDB query.
This article revisits the basics of RAG and digs into the problem ChromaFs actually solves.
What RAG Actually Is
RAG, or Retrieval-Augmented Generation, is a method for supplementing an LLM’s knowledge with external data in real time. In 2020, Meta, then Facebook AI Research, proposed it in the paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.”
An LLM only knows what it saw during training. If you ask it about internal documents or the latest product manual, it can hallucinate plausible-sounding nonsense. RAG solves that problem with a simple idea: retrieve documents related to the question and attach them to the prompt.
graph TD
A["User question"] --> B["Vectorize with an embedding model"]
B --> C["Similarity search in the vector DB"]
C --> D["Fetch top-k chunks"]
D --> E["Send question + chunks to the LLM"]
E --> F["Generate answer"]
Broken down, the flow looks like this:
- Split documents into chunks of a few hundred tokens each
- Vectorize each chunk with an embedding model such as OpenAI’s text-embedding-3-small
- Store the vectors in a vector DB such as Chroma, Pinecone, or Weaviate
- Vectorize the user’s question and retrieve nearby chunks with cosine similarity
- Include the retrieved chunks in the prompt and send them to the LLM
The reason this “search and pass along” mechanism was so powerful is that it could reflect the latest information without fine-tuning the LLM. When documentation changes, you only need to update the vector DB.
Embeddings and Cosine Similarity
The core of RAG is the embedding. It converts text into a numeric vector with hundreds or thousands of dimensions, and texts with similar meanings end up close together.
For example, “How to sort a list in Python” and “How do I reorder an array in Python?” are different strings, but their embedding vectors are very close. By contrast, “How to sort a list in Python” and “Pythons are snakes native to Africa and Asia” may look similar as strings, but their vectors are far apart.
Cosine similarity measures how close two vectors are. It calculates the cosine of the angle between them. The closer to 1, the more similar; the closer to 0, the less related; and the closer to -1, the more opposite the meaning.
similarity = cos(θ) = (A · B) / (|A| × |B|)
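The formula above can be sketched as a small helper. The two-dimensional vectors in the usage lines are toy stand-ins for real embeddings, which have hundreds or thousands of dimensions:

```typescript
// Cosine similarity between two vectors:
// similarity = (A · B) / (|A| × |B|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same direction → 1, orthogonal → 0, opposite → -1.
cosineSimilarity([1, 0], [2, 0]); // 1
cosineSimilarity([1, 0], [0, 1]); // 0
cosineSimilarity([1, 0], [-1, 0]); // -1
```

Note that magnitude cancels out: `[1, 0]` and `[2, 0]` point the same way, so their similarity is exactly 1.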
RAG relies on the assumption that “the chunk closest to the question vector probably contains the information needed to answer the question.” That assumption works well in many cases, but not always. That is the structural limit discussed later.
Vector DBs, KVSs, and RDBMSs
Vector DBs show up naturally in RAG discussions, but why not use an existing database? Vector DBs, KVSs, and RDBMSs are fundamentally optimized for different kinds of queries.
| Property | RDBMS | KVS | Vector DB |
|---|---|---|---|
| Data model | Tables (rows and columns) | Key-value pairs | Vectors + metadata |
| Main query style | SQL (filters, joins) | Exact key lookup | Nearest-neighbor similarity search |
| Query example | WHERE category = 'API' | GET doc:12345 | “chunks similar to ‘authentication setup’” |
| Indexes | B-tree, hash | Hash table | ANN structures such as HNSW and IVF |
| Scalability | Vertical-first | Horizontal-friendly | Horizontally scalable |
| Typical implementations | PostgreSQL, MySQL | Redis, DynamoDB | Chroma, Pinecone, Weaviate |
| Best fit | Structured data and complex queries | Sessions and cache | Semantic similarity search |
Why RDBMSs Alone Are Not Enough for RAG
RDBMSs can do full-text search too, for example with LIKE '%authentication%' or tsvector. But that is keyword matching, not semantic similarity. Searching for “authentication setup” will not necessarily surface a page titled “initial login procedure.”
PostgreSQL has pgvector, an extension that adds vector columns and cosine-similarity indexing. That makes vector search possible inside an RDBMS, but it can still lag dedicated vector DBs in index efficiency and scalability. For small to medium projects, adding pgvector to an existing PostgreSQL instance is often the simplest choice.
Vector DB Index Structures: HNSW
Vector DBs can do fast approximate nearest-neighbor search because they use specialized index structures. The most widely adopted one is HNSW (Hierarchical Navigable Small World), which Chroma also uses by default.
graph TD
A["Search query vector"] --> B["Layer 2 (sparse graph)<br/>Locate the rough area with only a few nodes"]
B --> C["Layer 1 (middle graph)<br/>Narrow the range"]
C --> D["Layer 0 (dense graph)<br/>Search nearby nodes precisely"]
D --> E["Return the top-k nearest neighbors"]
HNSW is a multilayer graph. Upper layers have fewer nodes and long-range edges; lower layers are denser and have shorter edges. At query time, the search starts at the top layer, follows nodes close to the query vector, and descends layer by layer. The closest nodes in the bottom layer become the result set.
That structure avoids computing distances to every vector, so approximate nearest neighbors can be found quickly. Even with a million vectors, search can complete in milliseconds. The tradeoff is that it is approximate, so there is no guarantee that the exact nearest neighbor will be returned.
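The core move at each layer is a greedy walk: hop to whichever neighbor is closest to the query, and stop at a local minimum. The sketch below shows that walk on a single toy graph; real HNSW repeats it from the sparse top layer down to layer 0, and the graph and vectors here are made up for illustration:

```typescript
// node id -> neighbor ids (one layer of an HNSW-style graph)
type Graph = Map<number, number[]>;

// Euclidean distance between two vectors.
function dist(a: number[], b: number[]): number {
  return Math.hypot(...a.map((x, i) => x - b[i]));
}

// Greedy walk: keep moving to the neighbor closest to the query
// until no neighbor improves on the current node.
function greedySearch(
  graph: Graph,
  vectors: number[][],
  query: number[],
  entry: number
): number {
  let current = entry;
  for (;;) {
    let best = current;
    for (const n of graph.get(current) ?? []) {
      if (dist(vectors[n], query) < dist(vectors[best], query)) best = n;
    }
    if (best === current) return current; // local minimum: stop
    current = best;
  }
}

// A chain 0-1-2-3 along the x-axis; a query near node 3 walks there.
const vectors = [[0, 0], [1, 0], [2, 0], [3, 0]];
const layer: Graph = new Map([
  [0, [1]],
  [1, [0, 2]],
  [2, [1, 3]],
  [3, [2]],
]);
greedySearch(layer, vectors, [2.9, 0], 0); // 3
```

The "approximate" caveat is visible here: the walk stops at the first local minimum, so on an unlucky graph it can miss the true nearest neighbor.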
What KVS Is Used For
Redis also appears in ChromaFs’ architecture. The KVS is good at fast key-based reads and writes, and at caching. After the vector DB narrows down the candidate files, the actual chunk data is fetched from Redis. In other words, the vector DB handles search and the KVS handles retrieval.
The Structural Limits of RAG
Now we get to Mintlify’s problem. Mintlify is a documentation platform used by many companies, including Discord, Vercel, and Cursor. It powers an AI assistant that handles 850,000 conversations per month, but conventional RAG has structural limits.
The Weakness of Single-Pass Search
The standard RAG pipeline is single-pass: retrieve relevant chunks once and stop. That breaks in a few common cases.
| Case | Problem |
|---|---|
| The answer spans multiple pages | Vector search returns chunks similar to a single query. For comparative questions like “What is the difference between A and B?”, you may need both A and B, but only get one side |
| Exact values matter | Exact API signatures and configuration parameters are hard to capture with vector similarity. A query like “default value of the third argument of createUser” might return a semantically similar but wrong API description |
| Surrounding context matters | Chunking can destroy page context. If a chunk only makes sense after reading the previous section, the LLM may still misread it |
These are long-known structural weaknesses of vector-search RAG. Many improvements, such as multi-hop retrieval and reranking, have been proposed, but they add pipeline complexity.
Mintlify’s Sandbox Approach
Mintlify’s earlier approach used a sandbox: it launched a container for each conversation, mounted the documents, and let the agent explore freely with bash. Accuracy was high, but the P90 boot time was about 46 seconds and the cost per conversation was $0.0137. At 850,000 conversations per month, that works out to more than $70,000 in annual infrastructure cost.
graph TD
A["Conversation request"] --> B["Start container<br/>(P90: about 46 seconds)"]
B --> C["Mount documents into the file system"]
C --> D["Let the agent explore freely with bash"]
D --> E["Generate answer"]
E --> F["Destroy the container"]
The accuracy was genuinely good. The agent could inspect directory structure, read files, and use grep to search across docs, just like a human. The problem was cost and latency.
ChromaFs Architecture
ChromaFs is designed to balance sandbox-level accuracy with RAG-level cost efficiency. It removes the need to start containers while still giving the agent the ability to explore with bash.
ChromaFs is built on just-bash, developed by Vercel Labs. just-bash is a TypeScript reimplementation of bash. It rewrites UNIX commands like grep, cat, ls, find, and cd from scratch so they work in a browser environment. It is not a wrapper around a real shell. The parser and command execution are both written in TypeScript.
just-bash provides a pluggable file system interface called IFileSystem. ChromaFs implements that interface and translates file-system operations into ChromaDB queries.
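To make the plug-in point concrete, here is a deliberately simplified, synchronous sketch of what such a file-system interface could look like, with an in-memory implementation standing in for ChromaFs. The method names and shapes are assumptions for illustration, not just-bash's actual IFileSystem API:

```typescript
// Hypothetical minimal file-system interface in the spirit of
// just-bash's pluggable IFileSystem (real API differs).
interface IFileSystem {
  readFile(path: string): string;
  readDir(dir: string): string[];
  exists(path: string): boolean;
  writeFile(path: string, data: string): void;
}

// In-memory backend; ChromaFs would answer the same calls with
// ChromaDB metadata queries and a Redis chunk cache instead.
class InMemoryFs implements IFileSystem {
  constructor(private files: Map<string, string>) {}

  readFile(path: string): string {
    const data = this.files.get(path);
    if (data === undefined) throw new Error("ENOENT: " + path);
    return data;
  }

  // List the immediate children of a directory, deduplicated.
  readDir(dir: string): string[] {
    const prefix = dir.endsWith("/") ? dir : dir + "/";
    return [...this.files.keys()]
      .filter((p) => p.startsWith(prefix))
      .map((p) => p.slice(prefix.length).split("/")[0])
      .filter((v, i, arr) => arr.indexOf(v) === i);
  }

  exists(path: string): boolean {
    return this.files.has(path);
  }

  // Mirrors ChromaFs: all writes fail with EROFS.
  writeFile(_path: string, _data: string): void {
    throw new Error("EROFS: read-only file system");
  }
}
```

Swapping the backend is then just a matter of handing the shell a different IFileSystem implementation; the agent's `ls`, `cat`, and `grep` calls never change.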
graph TD
A["Agent"] --> B["bash commands<br/>ls, cat, grep, etc."]
B --> C["just-bash<br/>TypeScript bash emulator"]
C --> D["IFileSystem interface"]
D --> E["ChromaFs<br/>Virtual file system"]
E --> F["Chroma DB"]
E --> G["Redis<br/>Chunk cache"]
The agent thinks it is issuing normal bash commands, but underneath, it is really querying a vector database.
Initializing the Directory Tree
The file structure for the entire documentation set is stored in the Chroma collection as gzip-compressed JSON (__path_tree__). At startup, ChromaFs expands it into two in-memory structures.
| Data structure | Contents | Purpose |
|---|---|---|
| Set<string> | Set of all file paths | Check whether a path exists (O(1)) |
| Map<string, string[]> | Mapping from directories to child entries | Respond to ls |
On the second and later sessions, the cached tree is used, so there are zero network calls. Because gzip compression keeps the tree small, even a documentation set with thousands of pages can be expanded in a few milliseconds.
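Expanding a flat path list into those two structures is straightforward. The sketch below uses a toy path list standing in for the decompressed __path_tree__ payload:

```typescript
// Build the two in-memory structures from a flat list of file paths:
// a Set for O(1) existence checks and a Map from directory to children.
function buildTree(paths: string[]) {
  const fileSet = new Set<string>(paths);
  const dirMap = new Map<string, string[]>();
  for (const path of paths) {
    const parts = path.split("/").filter(Boolean);
    let dir = "/";
    for (const part of parts) {
      const children = dirMap.get(dir) ?? [];
      if (!children.includes(part)) children.push(part);
      dirMap.set(dir, children);
      dir = dir === "/" ? "/" + part : dir + "/" + part;
    }
  }
  return { fileSet, dirMap };
}

const { fileSet, dirMap } = buildTree([
  "/docs/auth/setup.md",
  "/docs/auth/tokens.md",
  "/docs/intro.md",
]);
fileSet.has("/docs/intro.md"); // true  → backs the exists check
dirMap.get("/docs/auth"); // ["setup.md", "tokens.md"] → backs ls
```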
Reconstructing Pages
Documents are split into chunks when stored in Chroma. When cat reads a page, ChromaFs fetches all chunks matching the page slug and concatenates them in chunk_index order.
The key point is that the vector DB acts as the backend of a file system. In ordinary RAG, chunks are retrieved by vector similarity. In ChromaFs’ cat, chunks are fetched by a metadata query (WHERE slug = 'target-page'). No vectors are used at all. The goal is to reconstruct the file contents exactly, so similarity search is unnecessary.
Repeated access to the same page is absorbed by the cache. By keeping prefetched chunks in Redis, the second and later cat commands can answer without touching the DB.
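The reconstruction step itself is a filter, a sort, and a join. The chunk shape below is an assumption modeled on the article's description (a slug plus a chunk_index), not Chroma's actual record schema:

```typescript
// A stored document chunk, as described in the article.
interface Chunk {
  slug: string;
  chunk_index: number;
  text: string;
}

// Rebuild a page exactly: select by metadata (no vectors involved),
// order by chunk_index, and concatenate.
function catPage(chunks: Chunk[], slug: string): string {
  return chunks
    .filter((c) => c.slug === slug) // WHERE slug = '...'
    .sort((a, b) => a.chunk_index - b.chunk_index)
    .map((c) => c.text)
    .join("");
}

const store: Chunk[] = [
  { slug: "auth", chunk_index: 1, text: "world" },
  { slug: "auth", chunk_index: 0, text: "hello " },
  { slug: "other", chunk_index: 0, text: "nope" },
];
catPage(store, "auth"); // "hello world"
```

In the real system the filtered chunks would come from Redis when cached and from Chroma otherwise; the assembly logic is the same either way.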
Two-Stage grep
Recursive search with grep -r would otherwise scan the entire documentation set. ChromaFs handles it in two stages.
graph TD
A["grep -r 'pattern' /docs"] --> B["Stage 1: Coarse filter<br/>Use a Chroma metadata query to narrow the candidate files"]
B --> C["Candidate files"]
C --> D["Stage 2: Fine filter<br/>Run regex matching on chunks in Redis"]
D --> E["Search results<br/>Line number + matching line"]
Stage 1 uses a Chroma metadata query. Rather than vector similarity, it checks whether the chunk contents contain the search pattern, and that greatly reduces the candidate file set.
Stage 2 performs in-memory regex matching on the chunks already prefetched into Redis. After reconstructing the actual file contents, it returns search results with line numbers.
By splitting the work this way, recursive search across large docs sets, even thousands of pages, can still finish in milliseconds. Unlike a linear file-system scan, the index-based narrowing in stage 1 does the heavy lifting.
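The two stages can be sketched against an in-memory map of reconstructed files. The cheap `includes` check below stands in for the Chroma metadata query of stage 1; only the survivors get the full regex pass with line numbers:

```typescript
interface Match {
  file: string;
  line: number;
  text: string;
}

// Two-stage grep: coarse containment filter, then exact regex
// matching with line numbers on the candidates.
function grepDocs(
  files: Map<string, string>, // path -> reconstructed contents
  pattern: RegExp,
  coarse: string // literal used for the stage-1 filter
): Match[] {
  // Stage 1: narrow candidate files with a cheap containment check
  // (standing in for the Chroma metadata query).
  const candidates = [...files].filter(([, body]) => body.includes(coarse));

  // Stage 2: regex matching on the candidates, with line numbers.
  const matches: Match[] = [];
  for (const [file, body] of candidates) {
    body.split("\n").forEach((text, i) => {
      if (pattern.test(text)) matches.push({ file, line: i + 1, text });
    });
  }
  return matches;
}

const docs = new Map([
  ["/docs/auth.md", "first line\nauthentication setup\nlast line"],
  ["/docs/intro.md", "nothing relevant here"],
]);
grepDocs(docs, /authentication/, "authentication");
// → one match: /docs/auth.md, line 2
```

Stage 1 is what keeps this fast: if only a handful of files survive the coarse filter, the expensive per-line regex work never touches the rest of the corpus.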
Write Operations Are Forbidden
All write operations return an EROFS (Read-Only File System) error. This is intentional.
- State does not leak across sessions, so one agent cannot break another session
- The source of truth for the documentation is the data inside Chroma DB, and changing it through the virtual file system could break consistency
- Read-only behavior makes cache invalidation much simpler
Performance Comparison
The difference from the sandbox approach is dramatic.
| Metric | Sandbox | ChromaFs |
|---|---|---|
| P90 boot time | About 46 seconds | About 100 milliseconds |
| Marginal cost per conversation | $0.0137 | $0 (reuse existing DB) |
| Search method | Linear disk scan | DB metadata query |
| Annual infrastructure cost at 850,000 conversations/month | More than $70,000 | Effectively $0 |
Boot time became 460 times faster, and marginal cost fell to zero. That matters at a scale of nearly 30,000 conversations per day.
The reason the cost becomes zero is that ChromaFs reuses an existing Chroma DB. The sandbox approach created a new container and file system for each conversation, but ChromaFs has every session query the same Chroma collection. There is no additional infrastructure cost.
Access Control
Documentation platforms need different users to see different pages. In ChromaFs, each path-tree entry carries isPublic and groups fields.
graph TD
A["Session starts"] --> B["Read user permissions from the session token"]
B --> C["Exclude unauthorized paths while building the tree"]
C --> D["File system visible to the agent"]
D --> E["ls: only authorized files"]
D --> F["grep: only authorized chunks"]
D --> G["cat: unauthorized files return ENOENT"]
When building the tree, ChromaFs uses the session token to prune paths the user cannot access, and then applies the same filter to later Chroma queries.
An unauthorized file is treated as “does not exist” rather than “access denied.” It does not show up in ls, and grep does not find it. In the sandbox approach, the file system had to be mounted again for each conversation, but ChromaFs filters at the query layer, so container startup is unnecessary.
This “it does not exist” approach also makes security sense. Returning access denied would reveal that the file exists. Hiding existence itself minimizes the risk of information leakage.
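The tree-pruning step can be sketched as a filter over path-tree entries. The isPublic and groups field names follow the article; the rest of the entry shape is assumed for illustration:

```typescript
// One entry in the path tree, carrying its access-control metadata.
interface TreeEntry {
  path: string;
  isPublic: boolean;
  groups: string[]; // groups allowed to see this path
}

// Keep only the paths this session's groups may see. Everything else
// is dropped before the tree is built, so to the agent those files
// simply do not exist (ENOENT, invisible to ls and grep).
function visiblePaths(entries: TreeEntry[], userGroups: string[]): string[] {
  return entries
    .filter((e) => e.isPublic || e.groups.some((g) => userGroups.includes(g)))
    .map((e) => e.path);
}

const entries: TreeEntry[] = [
  { path: "/docs/getting-started.md", isPublic: true, groups: [] },
  { path: "/docs/enterprise/sso.md", isPublic: false, groups: ["enterprise"] },
];
visiblePaths(entries, ["free-tier"]); // ["/docs/getting-started.md"]
```

Because the same filter is applied to subsequent Chroma queries, a path pruned here can never resurface through grep either.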
just-bash Design Principles
Vercel Labs’ just-bash is the foundation of ChromaFs, and it is a collection of interesting design decisions on its own.
Unlike a normal shell such as bash or zsh, just-bash has the following properties:
| Property | Normal bash | just-bash |
|---|---|---|
| Runtime | Process on the OS | TypeScript runtime |
| Shell state | Preserved across commands | Reset on every exec() |
| File system | Provided by the OS | Pluggable (IFileSystem) |
| Network | Allowed by default | Disabled by default |
| Execution model | fork + exec | Function calls |
Resetting state on every exec() is clearly designed with AI agents in mind. Agent tool calls are independent, and it is safer if they do not rely on side effects from previous commands such as environment changes or cd.
Disabling network access by default is also a crucial sandbox property. Only explicitly whitelisted endpoints with URL prefixes can be accessed. That reduces the risk of prompt injection causing the agent to make unintended external calls.
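A URL-prefix allowlist of this kind boils down to a prefix check before any request goes out. The allowed prefix below is a made-up example, not a just-bash default:

```typescript
// Hypothetical outbound allowlist: only explicitly whitelisted URL
// prefixes may be fetched; everything else is blocked by default.
const allowedPrefixes = ["https://api.example.com/"];

function isAllowed(url: string): boolean {
  return allowedPrefixes.some((prefix) => url.startsWith(prefix));
}

isAllowed("https://api.example.com/v1/search"); // true
isAllowed("https://attacker.example.net/exfil"); // false
```

Deny-by-default matters here: even if a prompt injection convinces the agent to call out, the request dies at this check unless the operator explicitly whitelisted the destination.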
The biggest advantage of just-bash for Mintlify is its pluggable file system. By swapping the IFileSystem implementation, Mintlify can point the file system the agent sees at Chroma DB. Because just-bash still exposes the @vercel/sandbox-compatible API, the migration cost from the existing sandbox-based code is low.
The Difference Between RAG and Tool Use
The essence of ChromaFs is the difference between “RAG” and “tool use.”
graph LR
subgraph "RAG paradigm"
R1["Query"] --> R2["Search"]
R2 --> R3["Retrieve chunks"]
R3 --> R4["Pass to LLM"]
R4 --> R5["Answer"]
end
subgraph "Tool-use paradigm"
T1["Question"] --> T2["Use ls to understand structure"]
T2 --> T3["Use cat to read"]
T3 --> T4["Use grep across files"]
T4 --> T5["Additional cat"]
T5 --> T6["Answer"]
end
In RAG, you have to decide what to search for when you issue the query. It is a single pass, so if the result is insufficient, there is no real chance to recover. There are agentic multi-hop RAG systems too, but they mostly just loop through search, retrieval, and search again. The freedom to explore is still limited.
In the tool-use paradigm, the agent builds its own exploration strategy. It uses ls to inspect the directory tree, cat to read files and deepen its understanding, and then uses that evidence to decide the next grep. That is exactly how a human would look for documentation.
You can also phrase the difference as “search” versus “exploration.”
| Item | RAG (search) | ChromaFs (exploration) |
|---|---|---|
| Strategy flexibility | Fixed at query time | Can change mid-flight |
| Referencing multiple pages | Depends on chunk limit | Freely read multiple files |
| Structural awareness | None, just flat chunks | Directory structure is visible |
| Exact value retrieval | Depends on vector similarity | Exact-match search with grep |
| Number of LLM calls | One | Multiple, depending on tool calls |
| Latency | Low, one pass | Slightly higher, multiple turns |