| query | indexer pipeline docs |
|---|---|
| target | arra-oracle-v3 + indexer-pro |
| mode | documentation |
| timestamp | 2026-05-03 00:30 |
```
.md files (vault/ψ/memory/)
    ↓ scan
SQLite + FTS5 (20,672 docs)
    ↓ embed
LanceDB vectors (per model)
    ↓ query
arra-cli search / indexer-pro compare
```
Three layers: filesystem → SQLite/FTS5 → LanceDB vectors
The indexer scans .md files and stores them in SQLite with FTS5 full-text search.
```bash
# Ensure the server is running
curl -s http://localhost:47778/api/health

# Run the filesystem indexer (scans ψ/memory/ → SQLite)
cd /path/to/arra-oracle-v3
bun src/indexer/cli.ts
```

The indexer:

- Collects: scans `ψ/memory/` for `.md` files (learnings, retros, principles)
- Parses: splits each file by `##` headers into granular documents
- Deduplicates: uses a content hash to skip duplicates
- Stores: inserts into the `oracle_documents` table and the `oracle_fts` FTS5 virtual table
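The Parses and Deduplicates steps can be sketched in a few lines of TypeScript. This is a minimal illustration, not the code in `src/indexer/cli.ts`: the `Chunk` shape, the `(preamble)` title for text before the first header, and SHA-256 as the content hash are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one granular document (not the real oracle_documents schema).
interface Chunk {
  sourceFile: string;
  title: string;
  content: string;
  hash: string;
}

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Split a markdown file into one chunk per "## " section.
// Text before the first "## " header becomes a "(preamble)" chunk.
function splitByH2(sourceFile: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let title = "(preamble)";
  let lines: string[] = [];

  // Emit the section collected so far, skipping empty ones.
  const flush = () => {
    const content = lines.join("\n").trim();
    if (content.length > 0) {
      chunks.push({ sourceFile, title, content, hash: sha256(content) });
    }
  };

  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      flush();
      title = line.slice(3).trim();
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}

// Drop chunks whose content hash was already seen (pass one set across all files).
function dedupe(chunks: Chunk[], seen: Set<string> = new Set()): Chunk[] {
  return chunks.filter((c) => {
    if (seen.has(c.hash)) return false;
    seen.add(c.hash);
    return true;
  });
}
```

Only lines starting with `## ` open a new chunk, so `###` subsections stay inside their parent section. Sharing one `seen` set across files makes deduplication global rather than per file.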
```bash
bun ~/.bun/bin/arra-cli stats
# → total: 20,672 docs
bun ~/.bun/bin/arra-cli list --limit 5
# → shows recent docs with types
```

| Type | Count | Source |
|---|---|---|
| learning | 8,453 | ψ/memory/learnings/ |
| retro | 9,964 | ψ/memory/retrospectives/ |
| principle | 2,255 | ψ/memory/resonance/ |
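The table implies that the folder under `ψ/memory/` decides each document's type, and that the three counts add up to the `stats` total. A hypothetical sketch of both: `typeFromPath` and the folder-to-type map are illustrations read off the table, not the indexer's actual classification logic.

```typescript
type DocType = "learning" | "retro" | "principle";

// Hypothetical mapping inferred from the table: folder under ψ/memory/ → type.
const TYPE_BY_DIR = new Map<string, DocType>([
  ["learnings", "learning"],
  ["retrospectives", "retro"],
  ["resonance", "principle"],
]);

// Find the "ψ/memory/<dir>" segment anywhere in the path and map <dir> to a type.
function typeFromPath(path: string): DocType | undefined {
  const parts = path.split("/");
  for (let i = 0; i + 2 < parts.length; i++) {
    if (parts[i] === "ψ" && parts[i + 1] === "memory") {
      return TYPE_BY_DIR.get(parts[i + 2]);
    }
  }
  return undefined;
}

// Per-type counts from the table; their sum should match `arra-cli stats`.
const COUNTS: Record<DocType, number> = { learning: 8453, retro: 9964, principle: 2255 };
const total = Object.values(COUNTS).reduce((sum, n) => sum + n, 0);
console.log(total); // 20672
```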
The vector indexer reads docs from SQLite and embeds them into LanceDB collections using Ollama.
```bash
# Check what's installed
indexer-pro models
# OR
curl -s http://localhost:11434/api/tags | jq '.models[].name'
```

| Model | Dims | Speed | Collection |
|---|---|---|---|
| nomic-embed-text | 768 | ~100 doc/s | oracle_knowledge |
| bge-m3 | 1024 | ~50 doc/s | oracle_knowledge_bge_m3 |
| qwen3-embedding | 4096 | ~30 doc/s | oracle_knowledge_qwen3 |
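Each model produces vectors of a fixed length, and a LanceDB vector column holds a single fixed dimension, which is one plausible reason each model gets its own collection. A hypothetical registry mirroring the table, with a guard against writing a vector into the wrong collection (`MODELS`, the `qwen3` key, and `checkVector` are illustrative names, not arra-oracle-v3's config):

```typescript
interface EmbeddingModel {
  ollamaName: string; // model name as Ollama knows it
  dims: number;       // length of every vector the model returns
  collection: string; // LanceDB collection holding this model's vectors
}

// Hypothetical registry keyed by the short names used on the command line.
const MODELS = new Map<string, EmbeddingModel>([
  ["nomic", { ollamaName: "nomic-embed-text", dims: 768, collection: "oracle_knowledge" }],
  ["bge-m3", { ollamaName: "bge-m3", dims: 1024, collection: "oracle_knowledge_bge_m3" }],
  ["qwen3", { ollamaName: "qwen3-embedding", dims: 4096, collection: "oracle_knowledge_qwen3" }],
]);

// Reject unknown models and vectors whose length does not match the model's dims.
function checkVector(modelKey: string, vector: number[]): EmbeddingModel {
  const model = MODELS.get(modelKey);
  if (!model) throw new Error(`unknown model key: ${modelKey}`);
  if (vector.length !== model.dims) {
    throw new Error(`${modelKey}: expected ${model.dims} dims, got ${vector.length}`);
  }
  return model;
}
```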
```bash
# Index ALL docs with nomic (fastest)
cd /path/to/arra-oracle-v3
bun src/scripts/index-model.ts nomic

# Index ALL docs with bge-m3 (better quality)
bun src/scripts/index-model.ts bge-m3
```

The script:
- Reads ALL docs from SQLite (FTS5 join)
- Batches them (nomic: 100/batch, bge-m3: 50/batch)
- Embeds via Ollama
- Stores in LanceDB collection
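The read → batch → embed → store loop can be sketched as follows, assuming embedding and storage are passed in as functions so the loop runs without Ollama or LanceDB. `indexAll`, `toBatches`, and `ollamaEmbed` are hypothetical names; `ollamaEmbed` assumes Ollama's `POST /api/embed` endpoint (array `input`, `embeddings` in the response), which may not be what `index-model.ts` actually calls.

```typescript
interface Doc { id: string; content: string }
interface VectorRow { id: string; vector: number[] }

// Split items into consecutive batches of at most `size`.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Embed and store every doc, batch by batch; returns the number of rows written.
async function indexAll(
  docs: Doc[],
  batchSize: number,
  embedBatch: (texts: string[]) => Promise<number[][]>,
  store: (rows: VectorRow[]) => Promise<void>,
): Promise<number> {
  let written = 0;
  for (const batch of toBatches(docs, batchSize)) {
    const vectors = await embedBatch(batch.map((d) => d.content));
    if (vectors.length !== batch.length) throw new Error("embedding count mismatch");
    await store(batch.map((d, i) => ({ id: d.id, vector: vectors[i] })));
    written += batch.length;
  }
  return written;
}

// Assumption: Ollama's /api/embed endpoint; not confirmed to match index-model.ts.
async function ollamaEmbed(model: string, texts: string[]): Promise<number[][]> {
  const res = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: texts }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = (await res.json()) as { embeddings: number[][] };
  return data.embeddings;
}
```

The loop has no cap: it runs until every doc is embedded. With 20,672 docs and a batch size of 100 that is 207 batches, the last one holding 72 docs.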
There is NO limit in the code: it indexes every doc in SQLite. The 1,000 vectors seen on studio.buildwithoracle.com came from a test run that indexed only 1,000 learnings.
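To put the test run in perspective, coverage per model is simply vectors ÷ SQLite docs. A hypothetical helper over the `{ key, count }` entries that `arra-cli stats` prints under `vectors` (`coverage` is an illustrative name, not part of arra-cli):

```typescript
// Shape of one entry in the `vectors` array printed by `arra-cli stats`.
interface VectorCount { key: string; count: number }

// Hypothetical helper: percentage of SQLite docs that have a vector, per model.
function coverage(totalDocs: number, vectors: VectorCount[]): Map<string, string> {
  const out = new Map<string, string>();
  for (const { key, count } of vectors) {
    out.set(key, `${((count / totalDocs) * 100).toFixed(1)}%`);
  }
  return out;
}

const report = coverage(20672, [
  { key: "nomic", count: 1000 },
  { key: "bge-m3", count: 1 },
]);
console.log(report.get("nomic")); // 4.8%
```

A full index should report 100.0% for every model.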
```bash
bun ~/.bun/bin/arra-cli stats
# → vectors: [{ key: "nomic", count: 1000 }, { key: "bge-m3", count: 1 }]

# Or via indexer-pro
indexer-pro collections
indexer-pro status
```

Keyword search (SQLite FTS5):

```bash
bun ~/.bun/bin/arra-cli search "oracle principles" --limit 5
# → uses SQLite FTS5, fast, keyword-based
```

Vector search:

```bash
# Via API
curl "http://localhost:47778/api/search?q=oracle+principles&mode=vector&model=nomic&limit=5"

# Via indexer-pro
indexer-pro search "oracle principles" --model nomic --limit 5
```
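Conceptually, a vector search embeds the query with the same model, scores it against every stored vector, and returns the best `limit` matches. The brute-force sketch below assumes cosine similarity; LanceDB performs this search itself (optionally over an index), and the distance metric arra-oracle configures is not stated here. `cosine` and `topK` are illustrative names.

```typescript
// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every row against the query and keep the k highest-scoring ids.
function topK(
  query: number[],
  rows: { id: string; vector: number[] }[],
  k: number,
): { id: string; score: number }[] {
  return rows
    .map((r) => ({ id: r.id, score: cosine(query, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

It also shows why the test run mattered: vector mode can only return docs that have vectors, so with 1,000 vectors the remaining 19,672 docs never appear in results.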
Compare all models side by side:

```bash
indexer-pro compare-all "oracle principles" --limit 5
```

Hybrid search:

```bash
curl "http://localhost:47778/api/search?q=oracle+principles&mode=hybrid&model=nomic&limit=5"
```

Check the current state:

```bash
bun ~/.bun/bin/arra-cli stats
indexer-pro status
indexer-pro doctor
```

Make sure the embedding models are installed:

```bash
ollama list | grep embed
# If a model is missing:
ollama pull nomic-embed-text
ollama pull bge-m3
```

Re-index all docs:

```bash
cd /path/to/arra-oracle-v3
# Full index with nomic (all 20,672 docs, ~3-4 min at 100 doc/s)
bun src/scripts/index-model.ts nomic
# Full index with bge-m3 (all 20,672 docs, ~7 min at 50 doc/s)
bun src/scripts/index-model.ts bge-m3
```

Verify:

```bash
bun ~/.bun/bin/arra-cli stats
# → nomic count should be 20,672
# → bge-m3 count should be 20,672
indexer-pro compare-all "test query" --limit 3
# → both models should return results
```

Then open https://studio.buildwithoracle.com/map: it should show all 20,672 docs, not 1,000.
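The re-index durations in the comments follow from docs ÷ throughput. A back-of-envelope sketch using the approximate speeds from the model table (real throughput depends on hardware and batch size; `estimateMinutes` is an illustrative name):

```typescript
const TOTAL_DOCS = 20672;

// Approximate throughput (docs/second) from the model table.
const THROUGHPUT_DOCS_PER_SEC: Record<string, number> = {
  "nomic-embed-text": 100,
  "bge-m3": 50,
  "qwen3-embedding": 30,
};

// Estimated wall-clock minutes to embed `docs` documents at `docsPerSec`.
function estimateMinutes(docs: number, docsPerSec: number): number {
  return docs / docsPerSec / 60;
}

for (const [model, rate] of Object.entries(THROUGHPUT_DOCS_PER_SEC)) {
  console.log(`${model}: ~${estimateMinutes(TOTAL_DOCS, rate).toFixed(1)} min`);
}
// nomic-embed-text: ~3.4 min
// bge-m3: ~6.9 min
// qwen3-embedding: ~11.5 min
```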
indexer-pro is a standalone interactive CLI tool for managing the indexer pipeline.
Repo: https://github.com/Soul-Brews-Studio/indexer-pro
```bash
indexer-pro # Interactive wizard
indexer-pro status # DB stats, vector counts, Ollama health
indexer-pro models # Embedding models + install status
indexer-pro scan <path> # Scan .md files
indexer-pro search <query> # Quick vector search
indexer-pro compare-all <q> # Compare ALL models side by side
indexer-pro doctor # Diagnose issues
indexer-pro collections # List LanceDB collections
indexer-pro top # Live dashboard (like htop)
```

Why only 1,000 docs showed up: yesterday we ran `scripts/index-learnings.ts`, which filtered to learnings only, and we stopped it at 1,000 as a test batch. The actual `index-model.ts` script has NO limit: it indexes everything in SQLite.
Fix: just run `bun src/scripts/index-model.ts nomic` to index all 20,672 docs.