#RAG

19 articles

Tech · May 11, 2026 · 10 min

Gemini API multimodal File Search as game NPC memory: metadata filters, store tiers, and a cost estimate

Gemini API File Search now indexes images alongside text in the same store. Metadata filters can isolate NPC memories by chapter and character, and a single-character prototype costs under $1/month on Flash-Lite. Notes on tier limits, a pricing breakdown, and what to test first.

AI, Gemini, RAG, API, Game

Tech · May 8, 2026 · 7 min

How CivicSurvival kept 158K lines of AI-written C# honest with CivicRAG and 300 Roslyn analyzers

158K lines of AI-generated C# for a Cities: Skylines II total conversion mod: CivicRAG for codebase indexing, 300+ custom Roslyn analyzers as compile-time design rules, and manual visual debugging for rendering bugs the AI couldn't see.

AI, AI Agents, Claude Code, MCP, RAG, Game

Tech · May 8, 2026 · 13 min

Vektor Memory supersession chains: BM25 threshold trap and a minimum schema

Vektor Memory v1.5.4's supersession chains, compared with YourMemory's decay, Cloudflare's key overwrite, and CTX, plus the BM25-vs-cosine threshold trap and a five-field minimum schema for agent memory.

AI, AI Agents, RAG, MCP, Token Management, Node.js

Tech · May 7, 2026 · 11 min

Agent memory is just lookup: reading arXiv:2604.27707 with CTX and OCR-Memory in mind

The paper argues that RAG, vector stores, and scratchpads are retrieval, not learning. Reading it alongside CTX and OCR-Memory makes the gap between "better search" and "weight-level learning" concrete.

AI, AI Agents, RAG, Token Management, AI Safety, Papers

Tech · May 3, 2026 · 13 min

Adding Working Memory to Claude Code with CTX

A read of CTX, which auto-injects context into Claude Code via the UserPromptSubmit hook, compared with auto-memory, YourMemory, WUPHF, and Cloudflare Agent Memory on persistence and storage. Also covers why a 1M-token context window still isn't enough and how each agent architecture uses its window differently.

Claude Code, AI Agents, Token Management, RAG, OSS

Tech · May 2, 2026 · 23 min

Wiring Up a Multimodal Japanese Local RAG with FastAPI, Chroma, Open WebUI, and Ollama on M1 Max

A hands-on log of building the DEV article's PDF RAG on an M1 Max with 64GB, extending it to images via CLIP, and getting it to work in Japanese with bge-m3 + Qwen3.6 35B. Documents the modality gap, the dual-inference-server crash, and LLM-jp 4-8B's empty chat template silently dropping the system role.

AI, LLM, RAG, Local LLM, FastAPI, llama.cpp, Chroma, Python, Apple Silicon, Ollama, Japanese LLM, Experiments

Tech · May 2, 2026 (updated) · 12 min

Reading an Article on Building a Local PDF RAG with FastAPI, llama.cpp, Chroma, and Open WebUI

Notes on a DEV Community article that wires up FastAPI as an OpenAI-compatible RAG API layer with llama.cpp, Chroma, and Open WebUI, plus where the architecture fits and what to watch for.

AI, LLM, RAG, Local LLM, FastAPI, llama.cpp, Chroma, Python, Docker

Tech · May 2, 2026 · 14 min

OCR-Memory Lets Agents Recall History as Images

A read of OCR-Memory (arXiv:2604.26622). It renders agent execution history into images, uses Set-of-Mark prompting to let a VLM pick the relevant segments, then retrieves the verbatim text from the original logs.

AI, AI Agents, OCR, VLM, RAG, Token Management, Papers

Tech · Apr 30, 2026 · 7 min

VecLite brings Rust/WASM vector search to the browser, making in-browser RAG plausible

VecLite is a Rust/WASM+SIMD library that accelerates vector search inside the browser. How far can you get with Transformers.js for embeddings, IndexedDB for storage, and no server at all?

Rust, WebAssembly, RAG, Embedding, AI Coding

Tech · Apr 27, 2026 · 8 min

YourMemory Uses Biological Decay to Discard Stale AI Context

A look at sachitrafa/YourMemory, a local MCP memory server combining Ebbinghaus forgetting curves, BM25, vector search, and graph expansion. LoCoMo-10 Recall@5 currently sits at 59%.

AI, AI Agents, MCP, RAG, Claude Code, Token Management

Tech · Apr 24, 2026 · 9 min

Japan's Digital Agency open-sources its government AI "Gennai" with RAG, self-hosted LLM, and legal-AI templates under commercial-friendly licenses

Japan's Digital Agency released parts of Gennai, the generative AI platform it runs for central-government staff, on GitHub under MIT / CC BY 4.0. The web app and cloud-specific AI templates for AWS, Azure, and Google Cloud are bundled together so local governments and private companies can redeploy the same stack.

AI, LLM, RAG, Open Source, National Strategy, AWS, Azure, Google Cloud

Tech · Apr 23, 2026 · 21 min

Running open-notebook on M1 Max Without Docker or Cloud APIs, and Letting qwen3.6:35b Read Its Own Article

The NotebookLM clone open-notebook assumes Docker and cloud APIs by default. I installed SurrealDB natively, ran four processes in tmux, and wired everything through Ollama's qwen3.6:35b and bge-m3. I fed it the Qwen3.6 benchmark article I wrote this morning, and it answered with the correct numbers.

AI, LLM, Local LLM, Ollama, Qwen, Apple Silicon, RAG, OSS, Experiments