A hands-on log of building the PDF RAG pipeline from a DEV article on an M1 Max (64 GB), extending it to images with CLIP, and getting it working in Japanese with bge-m3 and Qwen3.6 35B. It documents the modality gap, the crash caused by running two inference servers at once, and how LLM-jp 4-8B's empty chat template silently drops the system role.
Notes on a DEV Community article that wires up FastAPI as an OpenAI-compatible RAG API layer on top of llama.cpp, Chroma, and Open WebUI, plus where this architecture fits and what to watch out for.
A walkthrough from the basics of RAG and vector databases to Mintlify's design and implementation of ChromaFs, a virtual file system that translates UNIX commands into ChromaDB queries.
A 20B-parameter self-editing search agent released by Chroma. It performs multi-hop search while dynamically pruning its context, and matches or exceeds the accuracy of frontier models at one-tenth the cost and with up to 10x lower latency. The weights are released under the Apache 2.0 license.