Firn 0.7.1 — SciFact `nprobes` sweep

Follow-up to the baseline reproduction at https://gist.github.com/gordonmurray/97c10e02081fd2acf50283d5c53347ec.

Same dataset (BEIR SciFact), same model (lightonai/LateOn), same namespace, same IVF_PQ index. The only thing being varied is the query-time nprobes value.

Files

firn-beir-scifact-nprobes-sweep.md — the writeup. Results table for nprobes ∈ {100, 20, 8, 1}, plan analysis explaining why nprobes has no effect at this scale, and what this means for the tuning page.

SciFact on Firn 0.7.1 — `nprobes` sweep

Follow-on to the baseline reproduction at https://gist.github.com/gordonmurray/97c10e02081fd2acf50283d5c53347ec. Question being answered: does dropping nprobes from the configured 100 hurt recall, and how does latency move with it?

Headline: nprobes has no measurable effect on this workload. Quality is identical at every value. Latency is identical at every value. The IVF partition probe is not the bottleneck here — the multivector MaxSim scoring stage is.

Setup

Same encoded SciFact embeddings as the baseline run.
Same namespace, same IVF_PQ index (num_partitions default sqrt(rows) ≈ 72, num_sub_vectors=64).
Firnflow was restarted once before the sweep, so the first measured value (nprobes=100) ran against an empty foyer cache and an empty handle pool. Later values reuse the Lance state warmed by that run. The latency numbers below show this does not materially change anything.
Search-only re-runs (no re-encode, no re-upsert, no re-index).
300 SciFact test queries, QUERY_CONCURRENCY=32, k=100.
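One sweep point can be sketched as a small harness. This is a minimal sketch: `search(query, k, nprobes)` is a hypothetical callable standing in for whatever client the bench script uses against Firnflow (neither the script nor Firnflow's API is shown here), and the nearest-rank percentile method is an assumption that may differ from the bench's. It runs every query once through a 32-worker pool, mirroring QUERY_CONCURRENCY=32, and reports QPS plus p50/p95/p99.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical signature: search(query, k=..., nprobes=...) -> ranked doc ids.
SearchFn = Callable[..., List[Any]]


def percentile(sorted_vals: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an already-sorted sequence (p in 0..100)."""
    if not sorted_vals:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def run_sweep_point(search: SearchFn, queries: Sequence[Any], nprobes: int,
                    k: int = 100, concurrency: int = 32) -> Dict[str, Any]:
    """Run every query once at a fixed nprobes; return hits, QPS and latency percentiles."""

    def timed(query: Any):
        start = time.perf_counter()
        hits = search(query, k=k, nprobes=nprobes)
        return hits, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(timed, queries))  # results come back in query order
    wall_s = time.perf_counter() - wall_start

    latencies_ms = sorted(lat * 1000 for _, lat in outcomes)
    return {
        "nprobes": nprobes,
        "hits": [hits for hits, _ in outcomes],
        "qps": len(queries) / wall_s,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
    }


if __name__ == "__main__":
    # Stand-in search that ignores nprobes, only to exercise the harness.
    def fake_search(query, k, nprobes):
        time.sleep(0.01)
        return list(range(k))

    queries = [f"q{i}" for i in range(300)]
    for nprobes in (100, 20, 8, 1):
        row = run_sweep_point(fake_search, queries, nprobes)
        print(nprobes, round(row["qps"], 1),
              round(row["p50_ms"]), round(row["p95_ms"]), round(row["p99_ms"]))
```

Keeping the hits per query makes it possible to compare result lists across nprobes values directly, not only the aggregate metrics.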

Results

nprobes	nDCG@10	nDCG@100	recall@10	recall@100	QPS	p50 (ms)	p95 (ms)	p99 (ms)
100	0.7575	0.7742	0.9036	0.9767	3.1	9 887	13 772	15 640
20	0.7575	0.7742	0.9036	0.9767	3.1	9 928	13 099	14 845
8	0.7575	0.7742	0.9036	0.9767	3.1	9 772	15 025	16 665
1	0.7575	0.7742	0.9036	0.9767	3.1	10 096	13 255	14 618

nDCG@10, nDCG@100, MAP, recall@10 and recall@100 are bit-for-bit identical across all four values. The latency differences between rows are well within run-to-run noise on the same host.
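The QPS column can be cross-checked against the latency columns with Little's law: in a closed-loop benchmark with a fixed number of queries in flight, throughput is roughly concurrency divided by mean latency. The table reports no mean, so this sketch uses p50 as a rough stand-in; that substitution is an assumption.

```python
# Little's law for a closed-loop benchmark: with a fixed number of queries
# in flight, throughput ~= concurrency / mean latency.
CONCURRENCY = 32      # QUERY_CONCURRENCY from the setup
MEASURED_QPS = 3.1    # identical for every row of the table

# p50 latencies (seconds) from the results table. The table has no mean,
# so p50 is used here as a rough stand-in for it.
P50_S = {100: 9.887, 20: 9.928, 8: 9.772, 1: 10.096}


def predicted_qps(concurrency: int, mean_latency_s: float) -> float:
    return concurrency / mean_latency_s


def implied_mean_latency_s(concurrency: int, qps: float) -> float:
    return concurrency / qps


for nprobes, p50 in P50_S.items():
    print(f"nprobes={nprobes:>3}  predicted QPS ~ {predicted_qps(CONCURRENCY, p50):.2f}")

print(f"implied mean latency ~ {implied_mean_latency_s(CONCURRENCY, MEASURED_QPS):.1f} s")
```

Every row predicts roughly 3.2 QPS, and the measured 3.1 QPS implies a mean latency of about 10.3 s, between p50 and p95, which is what a right-skewed latency distribution would give. At fixed concurrency the QPS column simply tracks latency, so it carries no separate signal about nprobes.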

Why nprobes does nothing here

Firnflow logs the underlying Lance execution plan for each query. Sample for one query (whitespace inserted for readability):

Projection(Take(CoalesceBatches(SortExec(TopK)(
  MultivectorScoring(
    SortExec(TopK)(ANNSubIndex(ANNIVFPartition)),
    ... × 12-22 sub-vectors ...
  )
))))
output_rows=100
iops=0  requests=0  bytes_read=0  indices_loaded=0  parts_loaded=0
index_comparisons=19_065_888    (range across queries: 19M-22M)

What this says:

The IVF index is being used. ANNIVFPartition nodes appear once per query sub-vector, fan into a single MultivectorScoring node, then top-K and projection on top.
No IO is happening. iops=0, requests=0, bytes_read=0, indices_loaded=0. The IVF index and the document fragments are already resident in memory from the warmup; no S3, no page-cache miss.
Cost is in scoring, not in probing. index_comparisons is ~19-22 million per query, or roughly 6 billion across the 300-query run. With 32 queries in flight at once, the MaxSim stage saturates the 16-core CPU regardless of how aggressively the partition level prunes.
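The work done by the MultivectorScoring node can be sketched as ColBERT-style MaxSim: for each query vector, take its best dot product against the document's vectors, then sum over query vectors. This is a minimal pure-Python sketch over raw float vectors, assuming an exhaustive pass over a given candidate set; Lance's actual implementation (vectorised, possibly scoring PQ codes) is not shown and may count comparisons differently. It illustrates that every scored candidate costs query vectors × document vectors comparisons, which appears to be the kind of quantity index_comparisons reports.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> Tuple[float, int]:
    """Late-interaction score: sum over query vectors of the best dot product
    against any document vector. Also returns how many dot products were computed."""
    score = 0.0
    for q in query_vecs:
        score += max(dot(q, d) for d in doc_vecs)
    return score, len(query_vecs) * len(doc_vecs)


def rank_candidates(query_vecs: Sequence[Vector],
                    candidates: Dict[str, Sequence[Vector]],
                    k: int = 100) -> Tuple[List[Tuple[str, float]], int]:
    """Score every candidate exhaustively; return the top-k and total comparisons."""
    scored: List[Tuple[str, float]] = []
    total_comparisons = 0
    for doc_id, doc_vecs in candidates.items():
        score, comparisons = maxsim(query_vecs, doc_vecs)
        scored.append((doc_id, score))
        total_comparisons += comparisons
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k], total_comparisons
```

Nothing in the scoring loop depends on how the candidates were found, so once the candidate set stops changing, the probe setting has no way to reduce this cost.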

So nprobes only decides how many IVF partitions each query sub-vector probes before scoring, and at this scale the answer hardly matters: look at all of them if you want, the scoring step dominates either way.
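What nprobes controls in a generic IVF index can be sketched as follows: each query vector picks its nprobes nearest centroids, and a multivector query as a whole touches the union of those partitions. This is a minimal pure-Python sketch, not Lance's code; it assumes the centroids are already trained and uses squared L2 distance, which may not be the metric Firnflow configures.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def nearest_partitions(vec: Vector, centroids: Sequence[Vector], nprobes: int) -> List[int]:
    """Indices of the `nprobes` centroids closest to `vec` (squared L2, ties by index)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, c)), idx)
        for idx, c in enumerate(centroids)
    )
    return [idx for _, idx in dists[:nprobes]]


def probed_partitions(query_vecs: Sequence[Vector], centroids: Sequence[Vector],
                      nprobes: int) -> Set[int]:
    """Union of the partitions probed across all vectors of one multivector query."""
    probed: Set[int] = set()
    for q in query_vecs:
        probed.update(nearest_partitions(q, centroids, nprobes))
    return probed
```

The union can never exceed min(num_partitions, query vectors × nprobes), so the number of partitions a multivector query touches grows with both the nprobes setting and the number of query vectors.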

What this means for the tuning page

For SciFact-scale multivector workloads on Firn 0.7.1:

nprobes is not a latency knob. Setting it high (the reference config's 100) doesn't cost anything. Setting it low (PLAID's 8, or even 1) doesn't help. Leave it at whatever default feels sensible; the script default of 20 is fine.
nprobes is not a recall knob either, at this scale. With ~72 IVF partitions and 12-22 query sub-vectors, even nprobes=1 per sub-vector reaches enough partitions in aggregate that the top-100 results score exactly as they do at nprobes=100.
The real latency knob is the scoring stage. That's controlled by num_sub_vectors (PQ codebook coarseness — fewer sub-vectors = cheaper but lower-fidelity MaxSim) and by corpus size. A follow-up sweep on num_sub_vectors is the natural next step.
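Why num_sub_vectors should trade cost against fidelity can be sketched with product quantization and asymmetric distance computation (ADC): each stored vector is split into num_sub_vectors pieces, each piece is replaced by the index of its nearest codeword, and a query is scored with one table lookup per piece. This is a minimal pure-Python sketch assuming the codebooks are already trained (k-means in practice) and a dot-product score; how Lance's MaxSim stage actually uses PQ codes is not shown here.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebooks = Sequence[Sequence[Vector]]  # one list of codewords per sub-space


def split(vec: Vector, m: int) -> List[Vector]:
    """Split a vector into m equal-length sub-vectors."""
    if len(vec) % m:
        raise ValueError("dimension must be divisible by num_sub_vectors")
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def encode(vec: Vector, codebooks: Codebooks) -> List[int]:
    """PQ-encode: for each sub-vector, keep the index of its nearest codeword."""
    codes = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        dists = [sum((a - b) ** 2 for a, b in zip(sub, cw)) for cw in book]
        codes.append(dists.index(min(dists)))
    return codes


def decode(codes: Sequence[int], codebooks: Codebooks) -> List[float]:
    """Reconstruct the approximate vector from its codes."""
    out: List[float] = []
    for c, book in zip(codes, codebooks):
        out.extend(book[c])
    return out


def adc_tables(query: Vector, codebooks: Codebooks) -> List[List[float]]:
    """Per-sub-space dot products between the query piece and every codeword."""
    return [
        [sum(a * b for a, b in zip(sub, cw)) for cw in book]
        for sub, book in zip(split(query, len(codebooks)), codebooks)
    ]


def adc_score(tables: Sequence[Sequence[float]], codes: Sequence[int]) -> float:
    """Approximate query·vector: one table lookup per sub-vector."""
    return sum(table[c] for table, c in zip(tables, codes))
```

Each compressed comparison costs num_sub_vectors lookups, so fewer sub-vectors means less work per comparison, while each sub-vector then covers more dimensions with the same number of codewords, which makes the approximation coarser.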

This is a SciFact-only finding. On larger corpora such as FiQA and Quora, the IVF probe stage is likely to take a bigger share of latency, so it is worth re-running the sweep there to find where nprobes actually starts to matter.

Host

12th Gen Intel i9-12900KF, 16 cores / 24 threads
30 GB RAM
MinIO + firnflow + bench all colocated, loopback networking
Firn 0.7.1 (ghcr.io/gordonmurray/firnflow:0.7.1)
