Adopt FAISS as a GPU-only dense retrieval backend for benchmark runs, while preserving current Qdrant behavior as a fallback and avoiding faiss-cpu installation in shared environments.
- In scope:
- GPU-only FAISS dependency path.
- Backend selection via config/env.
- Dense retrieval migration (coarse + rerank path).
- Benchmark parity checks (latency + retrieval quality).
- Out of scope:
- Removing Qdrant immediately.
- ANN index tuning (IVF/HNSW/PQ) in phase 1.
- Dense indexing and retrieval are Qdrant-based in storage/indexer.py and retrieval/retriever.py.
- BM25 stays in storage/indexer.py and retrieval/retriever.py.
- Cluster routing currently uses MiniBatchKMeans in storage/indexer.py.
- GPU image path is Dockerfile.gpu.
- Bare-metal GPU setup path is Makefile.
- Do not add FAISS to requirements.txt.
- Install FAISS only in GPU-specific setup paths:
- Add a `VECTOR_BACKEND` switch: `qdrant` (default) or `faiss` (GPU benchmark path).
- Keep BM25 unchanged for hybrid retrieval.
- Start with exact FAISS indexes:
- `IndexFlatIP(128)` for the coarse stage.
- `IndexFlatIP(384)` for full-stage rerank/scoring.
- Reassess cluster routing after FAISS benchmark data; keep initially for parity.
- Add backend config in config.py: `VECTOR_BACKEND = os.getenv("VECTOR_BACKEND", "qdrant")`
- Add FAISS install in Dockerfile.gpu only.
- Add FAISS install in the Makefile `setup` target only.
- Add GPU benchmark targets in the Makefile: `bench-faiss` and `e2e-faiss`.
- Export `VECTOR_BACKEND=faiss` for these targets.
- Introduce FAISS index build functions in storage/indexer.py:
- build coarse and full indexes from normalized vectors.
- Persist artifacts under `data/`:
- `coarse.faiss`
- `full.faiss` (optional if rerank is numpy-only over candidate vectors)
- metadata map (row_id -> chunk metadata)
- Keep existing BM25 and chunk cache unchanged.
- In retrieval/retriever.py, route `_coarse_search` by backend:
- Qdrant path (existing)
- FAISS path (new)
- In retrieval/retriever.py, route `_rerank` by backend:
- Qdrant path (existing)
- FAISS/numpy candidate scoring path (new)
- Keep query embedding and BM25 flow unchanged in retrieval/retriever.py.
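The numpy candidate-scoring path mentioned above could look like this sketch (the function name and signature are assumptions, and vectors are assumed L2-normalized so a dot product gives cosine similarity):

```python
import numpy as np

def rerank_candidates(query_full: np.ndarray,
                      candidate_ids: list,
                      full_vecs: np.ndarray,
                      top_k: int = 10):
    """Exact inner-product rerank of coarse candidates; no second FAISS index needed."""
    cand = full_vecs[candidate_ids]   # (n_candidates, 384)
    scores = cand @ query_full        # cosine similarity for unit vectors
    order = np.argsort(-scores)[:top_k]
    return [(candidate_ids[i], float(scores[i])) for i in order]
```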
- Compare `qdrant` vs `faiss` on the same corpus/questions:
- ingestion index build time
- query latency for all 15 questions
- top-k overlap and final answer quality metrics
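For the top-k overlap check, a simple helper such as the following could be used (purely illustrative; the name is not from the codebase):

```python
def topk_overlap(ids_a, ids_b, k: int = 10) -> float:
    """Fraction of top-k result ids shared between two backends' result lists."""
    return len(set(ids_a[:k]) & set(ids_b[:k])) / k
```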
- Acceptance criteria:
- FAISS backend installs only in GPU paths.
- No `faiss-cpu` anywhere.
- FAISS benchmark latency is equal to or better than Qdrant's.
- No quality regression beyond agreed threshold.
- If FAISS wins, make `faiss` the default only in GPU benchmark workflows.
- FAISS wheel/CUDA compatibility:
- Pin versions and validate in both Docker and bare-metal setup.
- Divergence between Docker and Makefile setups:
- Use same install source/version in both files.
- Metadata mapping bugs:
- Add integrity checks for row_id/chunk_id consistency.
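One possible integrity check for the metadata map (a hypothetical helper, not existing code): every FAISS row id must map to exactly one metadata entry, with ids dense from 0 to ntotal - 1.

```python
def check_row_map(ntotal: int, row_map: dict) -> None:
    """Raise if the row_id -> metadata map does not cover FAISS rows 0..ntotal-1."""
    ids = sorted(int(k) for k in row_map)
    if ids != list(range(ntotal)):
        raise ValueError(
            f"row map has {len(ids)} entries for an index with {ntotal} rows"
        )
```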
- Cluster routing complexity:
- Keep behind flag, disable if no measurable gain.
- M1 (0.5 day): dependency plumbing + backend flag.
- M2 (1 day): FAISS indexing + persistence.
- M3 (1 day): retrieval integration + parity tests.
- M4 (0.5 day): benchmark report + decision.
- GPU-only FAISS path operational in Dockerfile.gpu and Makefile.
- `VECTOR_BACKEND=faiss` runs ingestion and query successfully.
- Benchmarks documented with a Qdrant comparison.
- Default non-GPU/local paths remain unaffected.