Embedding any private repo for semantic + BM25 code search with claude-context

A laptop-local hybrid search stack that indexes any private repo and lets you query by intent, not just keywords. There is no hosted search service; the only cloud dependency is Vertex AI for embeddings. You get semantic similarity + BM25 ranking + exact locations (file:line-line), and your code is sent nowhere except Google Cloud's embedding API.

What this is

claude-context is a Claude Code MCP skill that combines:

Milvus (vector database) for storage and hybrid search
LiteLLM (proxy) to route embedding requests
Vertex AI gemini-embedding-001 (3072-dim) for semantic vectors
@zilliz/claude-context-mcp (MCP server) to wire it all into Claude Code or any MCP-aware agent

Ask "where is webhook signature verification?" and get back the actual code chunks with line numbers, even if the word "webhook" never appears in the source.
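"Semantic + BM25" means two ranked lists (vector similarity and keyword score) have to be merged into one. Milvus offers rerankers for this, including Reciprocal Rank Fusion (RRF). Whether claude-context uses RRF, and with which constant, is an assumption here; the chunk IDs below are made up. A minimal sketch of RRF:

```python
# Reciprocal Rank Fusion: merge several ranked lists of chunk IDs.
# k=60 is the constant from the original RRF paper; the value used by
# claude-context/Milvus in this stack is an assumption, not confirmed.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); chunks that rank well
            # in both the dense and the BM25 list accumulate the most.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))

dense = ["auth/hmac.go:10-42", "api/handler.go:5-30", "util/log.go:1-20"]
bm25 = ["api/handler.go:5-30", "docs/webhooks.md:1-15", "auth/hmac.go:10-42"]
print(rrf_fuse([dense, bm25]))
```

A chunk only the embedding model finds (say, one where "webhook" never appears) still makes the fused list, which is why intent queries work even when keywords miss.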

The stack (architecture)

agent/CLI ── MCP/stdio ──► @zilliz/claude-context-mcp
                                  │
                                  ├─► OPENAI_BASE_URL=http://127.0.0.1:4001/v1
                                  │       │
                                  │       ▼
                                  │   LiteLLM proxy ──► Vertex AI gemini-embedding-001
                                  │                     (<your-gcp-project>, gcloud ADC)
                                  │                     3072-dim vectors
                                  ▼
                             Milvus standalone (Docker)
                             gRPC: 127.0.0.1:19530
                             REST: 127.0.0.1:9091
                             one collection per repo

Two deliberate choices: Vertex AI rather than AI Studio (no API key needed; it uses your gcloud application-default credentials), and Milvus standalone rather than Milvus Lite (the MCP server needs a gRPC endpoint).
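To make the OPENAI_BASE_URL hop concrete, here is a minimal sketch of the OpenAI-compatible embeddings call the LiteLLM proxy on port 4001 receives. The model alias `gemini-embedding-001` and the placeholder bearer token are assumptions; use whatever `model_name` your litellm.config.yaml actually defines.

```python
import json
import urllib.request

# ASSUMPTION: alias that litellm.config.yaml maps to Vertex gemini-embedding-001.
MODEL_ALIAS = "gemini-embedding-001"
BASE_URL = "http://127.0.0.1:4001/v1"  # the OPENAI_BASE_URL from the diagram

def build_embedding_request(texts: list[str], base_url: str = BASE_URL,
                            model: str = MODEL_ALIAS) -> urllib.request.Request:
    """Build an OpenAI-style POST /embeddings request for the LiteLLM proxy."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Placeholder: only matters if the proxy is configured with a key.
            "Authorization": "Bearer sk-placeholder",
        },
        method="POST",
    )

def embed(texts: list[str]) -> list[list[float]]:
    """Send the request and return one vector per input text."""
    with urllib.request.urlopen(build_embedding_request(texts), timeout=30) as resp:
        payload = json.load(resp)
    # OpenAI-compatible responses carry vectors under data[i].embedding.
    return [item["embedding"] for item in payload["data"]]
```

With the stack running, `len(embed(["verify webhook signature"])[0])` should be 3072 if the alias routes to gemini-embedding-001.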

Prerequisites

Docker (for Milvus + etcd + minio)
Node 20+ (for the MCP server)
Python 3.11+ (for LiteLLM)

gcloud, authenticated with Application Default Credentials (ADC) for a project that has the Vertex AI API enabled:

gcloud auth application-default login
gcloud config set project <your-gcp-project>
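Before installing, a quick preflight can catch a missing tool or absent credentials. This is a sketch under stated assumptions: it checks only that the binaries are on PATH (not their versions) and looks for ADC at the default Linux/macOS location or at GOOGLE_APPLICATION_CREDENTIALS.

```python
import os
import shutil
import sys
from collections.abc import Mapping
from pathlib import Path

# Default ADC file written by `gcloud auth application-default login` on
# Linux/macOS (Windows keeps it under %APPDATA%\gcloud instead).
ADC_PATH = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"

def preflight(adc_path: Path = ADC_PATH,
              env: Mapping[str, str] = os.environ) -> dict:
    """Report which prerequisites look present. A heuristic, not a guarantee."""
    # GOOGLE_APPLICATION_CREDENTIALS, if set, takes precedence over the default file.
    adc_file = Path(env.get("GOOGLE_APPLICATION_CREDENTIALS", adc_path))
    return {
        "docker": shutil.which("docker") is not None,
        "node": shutil.which("node") is not None,  # presence only, not 20+
        "gcloud": shutil.which("gcloud") is not None,
        "python>=3.11": sys.version_info >= (3, 11),
        "adc": adc_file.is_file(),
    }

for name, ok in preflight().items():
    print(f"{'ok     ' if ok else 'MISSING'} {name}")
```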

Install

Clone the portable bundle (public repo):

git clone https://github.com/Timtech4u/Mercurie.git ~/code-search-bundle
cd ~/code-search-bundle/tools/local-code-search

Install the skill and local-stack scripts:

# 1. Install skill into global Claude Code skills dir
mkdir -p ~/.claude/skills
cp -R skill ~/.claude/skills/claude-context

# 2. Copy local-stack (Milvus compose, LiteLLM config, wrappers)
mkdir -p ~/local-code-search
cp local-stack/* ~/local-code-search/

# 3. Symlink the wrapper commands into ~/bin (it must be on your $PATH)
mkdir -p ~/bin
ln -sf ~/.claude/skills/claude-context/bin/cc-* ~/bin/
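The symlink step only helps if ~/bin is actually on PATH, and the links break if the skill directory moves. A small sketch that checks both (the function names are illustrative, not part of the bundle):

```python
import os
from pathlib import Path
from typing import Optional

def dir_on_path(directory: str, path_env: Optional[str] = None) -> bool:
    """True if `directory` (with ~ expanded) is one of the $PATH entries."""
    path_env = os.environ.get("PATH", "") if path_env is None else path_env
    target = Path(directory).expanduser().resolve()
    # PATH entries are compared literally (no ~ expansion), as the shell stores them.
    return any(Path(e).resolve() == target for e in path_env.split(os.pathsep) if e)

def broken_links(bin_dir: str = "~/bin", pattern: str = "cc-*") -> list[str]:
    """List cc-* symlinks whose target no longer exists (e.g. the skill moved)."""
    return sorted(
        p.name for p in Path(bin_dir).expanduser().glob(pattern)
        # exists() follows the link, so a dangling symlink reports False.
        if p.is_symlink() and not p.exists()
    )

if not dir_on_path("~/bin"):
    print('Add ~/bin to PATH, e.g. export PATH="$HOME/bin:$PATH"')
print("broken cc-* links:", broken_links() or "none")
```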

First boot

Bring up the stack (Milvus + LiteLLM):

cc-start

Healthy output ends with == ready == after a successful 3072-dim embedding probe. The command is idempotent, so run it any time something looks off.
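The internals of cc-start are not shown here, so the following is only a guess at what such a probe checks: that the proxy returns an OpenAI-style embeddings response whose vectors have the dimension the Milvus collection expects.

```python
import json

EXPECTED_DIM = 3072  # gemini-embedding-001's default output size

def check_probe(response_body: str, expected_dim: int = EXPECTED_DIM) -> str:
    """Validate an OpenAI-style /embeddings response from the LiteLLM proxy."""
    data = json.loads(response_body)["data"]
    if not data:
        raise ValueError("no embeddings returned")
    dim = len(data[0]["embedding"])
    if dim != expected_dim:
        # A mismatch usually means the proxy routed to a different model, and
        # the vectors would not fit an existing 3072-dim Milvus collection.
        raise ValueError(f"expected {expected_dim}-dim vectors, got {dim}")
    return f"probe ok: {dim}-dim"
```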

The first run creates:

~/.context/.env (MCP environment: provider, base URL, Milvus address)
~/.context/.contextignore (global ignore: vendor, *.pb.go, lockfiles, node_modules)
Docker volumes for Milvus data (at ~/local-code-search/volumes/)
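To see how ignore rules like these decide what gets indexed, here is a simplified gitignore-style matcher. The pattern list is hypothetical (it mirrors the categories above, not the exact file cc-start writes), and real .contextignore parsing also supports features this sketch skips, such as negation and anchored paths.

```python
from collections.abc import Iterable
from fnmatch import fnmatch
from pathlib import PurePosixPath

# HYPOTHETICAL defaults echoing the categories listed above.
DEFAULT_IGNORES = ("vendor/", "node_modules/", "*.pb.go",
                   "go.sum", "package-lock.json", "yarn.lock", "pnpm-lock.yaml")

def is_ignored(rel_path: str, patterns: Iterable[str] = DEFAULT_IGNORES) -> bool:
    """Simplified check: 'name/' patterns match any parent directory,
    other patterns match the file's basename. No '!' or anchoring."""
    parts = PurePosixPath(rel_path).parts
    for pat in patterns:
        if pat.endswith("/"):
            # Directory pattern: skip the file if any parent dir matches.
            if any(fnmatch(d, pat.rstrip("/")) for d in parts[:-1]):
                return True
        elif fnmatch(parts[-1], pat):
            return True
    return False
```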

Index a repo

Point cc-index at any repo:

cc-index ~/code/your-repo

This walks the repo, chunks each file (respecting .contextignore), embeds the chunks with Vertex AI, and writes them to Milvus. Expect roughly 5 minutes for ~1500 chunks; progress is printed live.
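The file:line-line locations in search results exist because every chunk carries its line range. The sketch below is a deliberately simple fixed-window line chunker with overlap, not the tool's actual splitter; the window and overlap sizes are arbitrary assumptions.

```python
def chunk_lines(text: str, max_lines: int = 40, overlap: int = 5):
    """Yield (start_line, end_line, chunk_text), 1-based and inclusive.

    Illustrative fixed-window splitter; the real indexer's splitter and
    chunk sizes are not reproduced here.
    """
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be in [0, max_lines)")
    lines = text.splitlines()
    step = max_lines - overlap  # consecutive windows share `overlap` lines
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        yield start + 1, end, "\n".join(lines[start:end])
        if end == len(lines):
            break
        start += step
```

For a 100-line file with the defaults, this yields lines 1-40, 36-75 and 71-100; the overlap keeps a function that straddles a boundary intact in at least one chunk.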

Check status:

cc-status ~/code/your-repo

When it says ✅ ... fully indexed, you're ready to search.

Subsequent cc-index runs are incremental (only changed files are re-embedded) thanks to a Merkle DAG snapshot stored at ~/.context/merkle/<hash>.json.
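The idea behind incremental indexing can be sketched with content hashes: snapshot each file's hash, compare a single root hash to detect "anything changed?", then diff the snapshots to find which files to re-embed. This is a flattened illustration; the on-disk format of the merkle JSON is not reproduced, and a real Merkle DAG hashes per directory so unchanged subtrees can be skipped.

```python
import hashlib
from pathlib import Path

def snapshot(root: str) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*")) if p.is_file()
    }

def root_hash(snap: dict[str, str]) -> str:
    """Fold sorted (path, hash) pairs into one digest; equal roots mean no changes."""
    h = hashlib.sha256()
    for path in sorted(snap):
        h.update(path.encode() + b"\0" + snap[path].encode() + b"\n")
    return h.hexdigest()

def diff(old: dict[str, str], new: dict[str, str]):
    """Return (added, modified, removed) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, removed
```

Only `added + modified` need new embeddings; chunks belonging to `removed` (and the old versions of `modified`) are deleted from the collection.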

Force a full re-index:

cc-index ~/code/your-repo --force

Search

Query by intent:

cc-search ~/code/your-repo "where is webhook signature verified" --limit 3

The output is JSON; each hit includes Location: <file>:<startLine>-<endLine>, the language, its rank, and the literal code chunk. To get plain file:line results:

cc-search ~/code/your-repo "<query>" --limit 5 \
  | python3 -c "import json,sys,re; print('\n'.join(re.findall(r'Location: (\S+)', json.load(sys.stdin)['content'][0]['text'])))"
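If you post-process results often, the one-liner above can grow into a small helper that returns structured (file, start, end) tuples. It assumes the same result shape the one-liner relies on (`content[*].text` containing `Location:` lines); the module and function names are hypothetical.

```python
import json
import re
import sys

# Matches "Location: path/to/file.go:12-34" inside the result text.
LOCATION_RE = re.compile(r"Location: (\S+?):(\d+)-(\d+)")

def parse_locations(result: dict) -> list[tuple[str, int, int]]:
    """Extract (file, start_line, end_line) from a cc-search JSON result.

    ASSUMPTION: MCP tool-result shape {"content": [{"type": "text", "text": ...}]}.
    """
    text = "\n".join(part.get("text", "") for part in result.get("content", []))
    return [(f, int(a), int(b)) for f, a, b in LOCATION_RE.findall(text)]

def main() -> None:
    """Read a cc-search result from stdin and print one file:start-end per line."""
    for path, start, end in parse_locations(json.load(sys.stdin)):
        print(f"{path}:{start}-{end}")
```

Saved as, say, `cc_locations.py`, it can replace the one-liner: `cc-search ~/code/your-repo "<query>" | python3 -c "import cc_locations; cc_locations.main()"`.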

Examples that work:

"rate limiting middleware" → finds the middleware even if it's not named RateLimiter
"Spanner transaction retry logic" → surfaces the retry loop by concept
"error handling for customer import" → chunks from the error package + handler

Use it inside Claude Code or Codex

The MCP server is auto-registered when you install the skill. In a Claude Code session, just ask:

"Use claude-context to find where Stripe webhooks are verified."

The agent calls search_code over MCP, reads the returned chunks, and answers from real code with file:line citations.
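Over stdio, that tool call is a single JSON-RPC 2.0 message per line. A minimal sketch of what the agent writes to the server's stdin is below; the argument names (path, query, limit) are assumed from cc-search's usage, and a real client must first complete MCP's initialize handshake and can check the authoritative schema via tools/list.

```python
import json

def tools_call(request_id: int, repo_path: str, query: str, limit: int = 5) -> str:
    """Serialize one JSON-RPC 2.0 'tools/call' request for search_code.

    MCP's stdio transport is newline-delimited, so the message must not
    contain embedded newlines (json.dumps without indent guarantees that).
    ASSUMPTION: argument names mirror cc-search's <repo> "<query>" --limit N.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_code",
            "arguments": {"path": repo_path, "query": query, "limit": limit},
        },
    }
    return json.dumps(message) + "\n"

print(tools_call(1, "/home/me/code/your-repo", "where are Stripe webhooks verified"), end="")
```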

First time on a new repo: ask the agent to index it first, or run cc-index <repo> yourself before the session.

Keep it fresh

The vector store gets stale as code lands. Re-index regularly:

cc-index ~/code/your-repo

Only changed files re-embed (incremental). A daily cron or launchd job is recommended.
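A scheduled job can simply run cc-index over a list of repos. This sketch assumes a hypothetical `REPOS` list and runs the re-indexes sequentially so one broken repo does not stop the others.

```python
import subprocess
from pathlib import Path

# HYPOTHETICAL list of repos to keep fresh; edit to taste.
REPOS = ["~/code/your-repo"]

def reindex_commands(repos: list[str], force: bool = False) -> list[list[str]]:
    """Build one cc-index invocation per repo that exists on disk."""
    cmds = []
    for repo in repos:
        path = Path(repo).expanduser()
        if path.is_dir():
            cmds.append(["cc-index", str(path)] + (["--force"] if force else []))
    return cmds

def run_all(repos: list[str] = REPOS) -> int:
    """Run incremental re-indexes one after another; return the failure count."""
    failures = 0
    for cmd in reindex_commands(repos):
        # check=False: a failing repo is counted, not raised, so the batch continues.
        if subprocess.run(cmd, check=False).returncode != 0:
            failures += 1
    return failures
```

Saved as, say, `reindex.py`, a job can call `python3 -c "import reindex; raise SystemExit(reindex.run_all())"`. Note that cron runs with a minimal PATH, so either set PATH in the crontab to include ~/bin or call cc-index by absolute path.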

Hard rules (from real use)

Vendor and lockfiles must be ignored. Without .contextignore, a Go repo with vendor/ produces an index roughly 10× larger, and real code drowns in noise. cc-start writes sensible defaults if the file is missing.
Embedding model is gemini-embedding-001, not -2. The -001 model is GA on Vertex AI; -2 may not be available in all projects. Both are 3072-dim.
Read-only triage when reviewing issues. If you use this for backlog review, the output is a report a human acts on: never run gh issue close or post comments automatically.
Conservative bias to KEEP. When search results are ambiguous, assume the issue is still valid. Closing real work by mistake costs more than keeping stale issues open.
Index before review. Searching an empty collection produces false negatives. Wait for ✅ ... fully indexed.

Troubleshooting

If something breaks:

cc-start   # diagnose + restart Milvus + LiteLLM

To see what's indexed:

curl -sS http://127.0.0.1:9091/api/v1/collections | jq

To drop a broken collection and re-index:

# Find the collection hash from the list above
curl -sS -X DELETE http://127.0.0.1:9091/api/v1/collection \
  -H 'Content-Type: application/json' \
  -d '{"collection_name":"hybrid_code_chunks_<hash>"}'

# Clear the Merkle snapshot so the next index rebuilds from scratch
rm ~/.context/merkle/<hash>.json
cc-index ~/code/your-repo --force
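Matching a repo to its `hybrid_code_chunks_<hash>` collection by eye can be tedious. The sketch below guesses the name from the repo path, ASSUMING the scheme recent claude-context versions appear to use (prefix plus the first 8 hex characters of the MD5 of the absolute path). Treat the result as a hint only and always confirm it against the collection list before deleting anything.

```python
import hashlib
import os

def guess_collection_name(repo_path: str, hybrid: bool = True) -> str:
    """Guess the Milvus collection name for a repo.

    ASSUMPTION: prefix + first 8 hex chars of MD5(absolute repo path).
    Verify against the collection list before any DELETE.
    """
    # Like Node's path.resolve: absolute and normalized, symlinks not resolved.
    abs_path = os.path.abspath(os.path.expanduser(repo_path))
    digest = hashlib.md5(abs_path.encode("utf-8")).hexdigest()
    prefix = "hybrid_code_chunks" if hybrid else "code_chunks"
    return f"{prefix}_{digest[:8]}"
```

If the guessed name is not in the list, fall back to identifying the collection manually.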

Where things live

| Path | Purpose |
|---|---|
| `~/.claude/skills/claude-context/bin/` | The `cc-*` wrapper scripts |
| `~/bin/cc-*` | Symlinks so they're on `$PATH` |
| `~/.context/.env` | MCP environment (provider, base URL, Milvus address) |
| `~/.context/.contextignore` | Global ignore (vendor, *.pb.go, lockfiles) |
| `~/.context/merkle/<hash>.json` | Per-repo content snapshot (incremental index) |
| `~/local-code-search/litellm.config.yaml` | LiteLLM → Vertex routing |
| `~/local-code-search/milvus-standalone-docker-compose.yml` | Milvus + etcd + minio |
| `~/local-code-search/volumes/` | Milvus data (DO NOT delete) |

Real-world result

A production backlog triage (28 open issues, ~1500 code chunks indexed in 5 min) ran end-to-end in 7 min with 4 parallel agents. Every verdict cited file:line evidence. Zero false closes.

License

Public bundle: Apache-2.0. See the upstream repo for details.

Questions? File an issue at https://github.com/Timtech4u/Mercurie/issues
