
Semble code search on a 1595-md blog: hybrid loses symbols, bm25 wins

Ikesan

I tested MinishLab/semble on this blog repo (an Astro site with 1595 markdown articles and 152 source files) on an M1 Mac.
Warm bm25 queries came back in 0.84 seconds, with the definition of featuredArticles at the top and the definition of getPublishedArticles in second place behind one of its callers.
Hybrid mode, on the other hand, lost short symbols like seasonalBanner and surfaced CLIP articles and a C# Union types post instead.
On a repo with a heavy article corpus, code search and document search behave very differently.

How Semble returns chunks instead of grep results

Semble chunks code along syntactic boundaries and searches with BM25 plus a static embedding.
The embedding is minishlab/potion-code-16M.
All processing happens on CPU — no large Transformer model runs per query. Results from the static embedding and BM25 are merged with RRF, and then a code-aware reranker is applied.
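The merge step can be sketched with the usual Reciprocal Rank Fusion formula, where each chunk scores the sum of 1 / (k + rank) over the lists it appears in. This is a minimal sketch, not Semble's code: k=60 is the conventional default and an assumption here, and the file names are only illustrative.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion.

    k=60 is the commonly used constant; Semble's actual value is an assumption.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

# Illustrative rankings only, not real Semble output.
bm25_hits = ["seasonal-banner.ts", "index.astro", "clip-article.md"]
embedding_hits = ["clip-article.md", "seasonal-banner.ts", "csharp-union.md"]
merged = rrf_merge([bm25_hits, embedding_hits])
print(merged)
```

A chunk that ranks well in both lists beats one that tops only a single list, which is why a weak showing on the embedding side can pull a code file down even when BM25 puts it first.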

The reranker weights natural-language queries and symbol-like queries differently.
For queries like getUserById or _private it leans on exact string matches; for queries like How is authentication handled? it balances semantic search.
There are also adjustments for ranking definitions above references, lifting files with multiple hits as a unit, and downranking test, legacy, and .d.ts files.
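As a rough illustration of the definition and path adjustments, the sketch below multiplies a base score by hypothetical factors. The multipliers, the regexes, and the function name adjust_score are all assumptions made for this example; the article does not give Semble's real weights or rules.

```python
import re

# Hypothetical multipliers; Semble's real weights are not published in this article.
DEFINITION_BOOST = 1.5
LOW_PRIORITY_PENALTY = 0.5

# Paths treated as low priority: test or legacy directories and .d.ts files.
LOW_PRIORITY_PATH = re.compile(r"(^|/)(tests?|legacy)/|\.d\.ts$")

# A very rough "this line defines the symbol" pattern for a few languages.
DEFINITION = r"\b(?:def|class|function|const|let|var|export)\s+(?:\w+\s+)*{name}\b"

def adjust_score(score, path, text, query):
    """Boost chunks that define the queried symbol, downrank low-priority paths."""
    if re.search(DEFINITION.format(name=re.escape(query)), text):
        score *= DEFINITION_BOOST
    if LOW_PRIORITY_PATH.search(path):
        score *= LOW_PRIORITY_PENALTY
    return score

definition = adjust_score(10.0, "src/lib/seasonal-banner.ts",
                          "export const seasonalBanner = {", "seasonalBanner")
reference = adjust_score(10.0, "src/pages/index.astro",
                         "import { seasonalBanner } from '../lib/seasonal-banner'",
                         "seasonalBanner")
typings = adjust_score(10.0, "types/banner.d.ts",
                       "export const seasonalBanner: Banner;", "seasonalBanner")
print(definition, reference, typings)
```

With equal base scores, the definition ends up above the reference, and the .d.ts declaration drops below both.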

This is different from the search agent in the Chroma Context-1 article.
Context-1 has the model itself loop through searching, reading, and pruning.
Semble doesn’t run that kind of agent loop — it just shrinks the initial result set so less code reaches the model.

The README benchmark is brutal to grep

The README publishes a benchmark over 19 languages, 63 repositories, and about 1,250 queries.
Queries fall into three categories: semantic, architecture, and symbol.
Overall NDCG@10 is 0.854 for Semble and 0.862 for CodeRankEmbed Hybrid.
The README claims 99% of the quality of the 137M-parameter CodeRankEmbed Hybrid, while being 218× faster to index and 11× faster to query.
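For readers who don't work with NDCG@10 daily, here is a minimal sketch of the metric and of where the 99% figure comes from. It uses the common linear-gain form of DCG; which variant the benchmark uses is an assumption on my part.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the ideal ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A relevant result at rank 1 scores 1.0; the same result at rank 2 scores less.
print(ndcg_at_k([1, 0, 0]), ndcg_at_k([0, 1, 0]))

# The "99% of the quality" claim is the ratio of the two overall scores.
quality_ratio = 0.854 / 0.862
print(f"{quality_ratio:.1%}")
```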

The token-efficiency table is where Semble’s design intent shows up most clearly.
With “tokens consumed until the first relevant hit, or 32k if not found” averaged across queries, ripgrep + read file uses 45,692 tokens and Semble uses 566.
That’s the basis of the 98% reduction claim.

| Method | Expected tokens/query | Notes |
| --- | --- | --- |
| ripgrep + read file | 45,692 | baseline |
| Semble | 566 | 98% fewer |
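The reduction figure follows directly from the two numbers in the table. A quick check:

```python
baseline_tokens = 45_692  # ripgrep + read file, from the README table
semble_tokens = 566       # Semble, from the README table

reduction = 1 - semble_tokens / baseline_tokens
ratio = baseline_tokens / semble_tokens
print(f"{reduction:.1%} fewer tokens, about {ratio:.0f}x smaller")
```

The exact value sits a little above 98%, so the README's figure looks rounded down rather than up.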

The catch is that this comparison models an agent behavior of “grep, then read the whole matching file.”
It’s not the same as a human running rg -n, sed -n, and rg -C 3 to read only the necessary slices.
What Semble actually replaces isn’t grep itself — it’s the habit of an agent jumping straight from grep results to reading the full file.

Keep in mind that the benchmark's ground truth is generated and verified by Claude Sonnet 4.6, so the labels are LLM-as-judge labels.
codanna is excluded because it doesn’t support 6 of the 19 languages, and claude-context is excluded because it requires a paid OpenAI API key and a vector DB.
Once you constrain yourself to “local CPU only, no API keys,” the field of candidates shrinks fast. Semble is one of the few that fits.

Hands-on test on a 1595-md Astro blog

I ran it with uvx --from "semble[mcp]" semble.
The target was this repo as-is: 1595 markdown articles, 109 .astro files, 43 .ts, 89 .js.
The articles directory alone is 9 MB, which is an unusual setup for a code search tool: the document corpus dwarfs the code.

The first query took 33.73s real (15.42s user) including index construction.
From the second query on, warm timings looked like this.

| Query | Mode | Expected hit | Result | Warm time |
| --- | --- | --- | --- | --- |
| seasonalBanner | hybrid | src/lib/seasonal-banner.ts | Missed (CLIP article / C# Union article) | 4.99s |
| seasonalBanner | bm25 | same | Top hit (score 13.8) | 0.87s |
| getPublishedArticles | hybrid | src/lib/articles.ts | Top hit | |
| getPublishedArticles | bm25 | same | 2nd (1st was a caller, rss.xml.ts) | 0.84s |
| featuredArticles | bm25 | src/lib/featured.ts | Top hit (score 11.8) | 0.84s |
| “kindle book hero image rule” | hybrid | rule inside CLAUDE.md | Kindle articles show up, CLAUDE.md doesn’t | 4.80s |
| “タグ スラッシュ ビルドエラー” | hybrid | the matching section in CLAUDE.md | Missed entirely | 4.94s |
| “specialbanner” | bm25 | CLAUDE.md + seasonal-banner.ts | Both hit | |

The most surprising symbol-search behavior showed up here.
seasonalBanner is defined in src/lib/seasonal-banner.ts:10 as export const seasonalBanner = {, and it’s referenced from src/pages/index.astro.
Hybrid mode still pushed CLIP/bge-m3 articles and a C# 15 Union types explainer above it.
It looks like the semantic side picked up large amounts of prose that’s near the words “seasonal” and “banner” and pushed the code file down.

Switching to bm25 mode returned src/lib/seasonal-banner.ts:1-16 at the top with a score of 13.8 in 0.87 seconds.
On the other hand, identifiers with high uniqueness like getPublishedArticles placed the definition at the top even in hybrid mode.
Semble’s “promote definitions over references” reranker was working here.

Pulling CLAUDE.md out with a natural-language query was harder than I expected.
The rule behind the query “タグ スラッシュ ビルドエラー” (tag / slash / build error) is literally written in CLAUDE.md as “スラッシュ(/)禁止。ビルドエラーになる” (no slashes; they cause build errors), and yet the top results were Gradio and Astro 6 build-error articles.
CLAUDE.md itself is indexed (it shows up cleanly for specialbanner), so this is a ranking problem, not a retrieval problem.
Against the density of 1595 markdown articles, a single file’s scattered guardrails get buried easily.

The semble savings log showed about 128.8k tokens saved over 11 queries, or 92%.
Note that this is Semble’s own estimate, measured against how many tokens reading the full file for the same hit would have cost, so it tends to be more optimistic than savings under real agent operation.
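The two logged numbers imply the rest of the picture, assuming the 92% means saved tokens divided by the full-file baseline. That reading is my assumption about how the log defines the percentage, not something the log states.

```python
saved_tokens = 128_800  # "about 128.8k tokens saved" from the savings log
saved_fraction = 0.92   # assumed to be saved / full-file baseline
queries = 11

baseline_tokens = saved_tokens / saved_fraction  # what full-file reads would have cost
used_tokens = baseline_tokens - saved_tokens     # what the Semble results actually cost

print(f"baseline ~{baseline_tokens:,.0f} tokens, used ~{used_tokens:,.0f} tokens")
print(f"~{used_tokens / queries:,.0f} tokens per query")
```

Under that assumption the 11 queries cost roughly 11k tokens in total, or about 1k per query, against an estimated 140k for reading whole files.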

MCP alone doesn’t reach sub-agents

Semble can run as an MCP server.
For Claude Code, register it with claude mcp add; for Codex, add uvx --from "semble[mcp]" semble to ~/.codex/config.toml.
The MCP exposes two tools, search and find_related, and accepts both local paths and git URLs.

The README spends a lot of space on bash integration.
In Claude Code and Codex CLI, the MCP tool schema is lazy-loaded only by the top-level agent, so sub-agents can’t call MCP tools directly.
To work around this, there’s a documented pattern of writing the semble search and semble find-related usage into AGENTS.md or CLAUDE.md and invoking them as bash tools.

Laying this next to the article about CTX adding live memory to Claude Code makes the role difference clearer.
CTX is on the UserPromptSubmit hook side — it injects git log, code, Markdown, and past chats right before the prompt.
Semble is a search command the agent calls during exploration, narrowing how many files it goes back to read.

Where Semble sits

Token-reduction tooling comes in several layers.
Compresr Context Gateway sits in front of the LLM API as a proxy and compresses conversation history and tool output.
YourMemory and OCR-Memory handle how long-term history is pulled back in.
Semble sits earlier than any of those, narrowing the entry point right before the agent runs grep and reads files.

On my own repo, I’ll start by defaulting to -m bm25 and routing all symbol searches through it, and only call hybrid explicitly when I’m looking for an article or a spec in natural language.
As long as 1595 markdown files are sitting in the index, making hybrid the default for symbol search just breaks it.
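That routing rule is simple enough to write down. The sketch below is a hypothetical helper, not part of Semble: choose_mode and its regex are my own, and it only returns the mode name to pass to -m.

```python
import re

# A query that is a single identifier-like token (camelCase, snake_case, _private, ...).
IDENTIFIER = re.compile(r"^[A-Za-z_$][\w$]*$")

def choose_mode(query):
    """Return "bm25" for symbol-like queries and "hybrid" for natural language."""
    if IDENTIFIER.match(query.strip()):
        return "bm25"
    return "hybrid"

for q in ["seasonalBanner", "getPublishedArticles", "_private",
          "kindle book hero image rule", "タグ スラッシュ ビルドエラー"]:
    print(f"{q!r} -> {choose_mode(q)}")
```

Any query containing whitespace falls through to hybrid, so multi-word natural-language searches still get the semantic side, while single identifiers never touch it.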
