Membership Filter data sketches

Membership filters (also called Approximate Membership Query or AMQ structures) are used to quickly check if an element is in a set. They are designed to never return a false negative—if the filter says "No," it is definitely not there. If it says "Yes," there's a small chance it might be a false positive.

Here is the table for Membership Filter data sketches:

Algorithm	Memory (vs Bloom)	CPU / Op	Accuracy (vs Bloom)	Delete	Merge	Notes
Bloom Filter	$1.44 \log_2(1/\epsilon)$ bits/item	Fast	Baseline	No	Yes	The industry standard; simple bit-array.
Counting Bloom	3x – 4x Larger	Medium	Same	Yes	Yes	Adds counters to allow for item removal.
Cuckoo Filter	~20% Smaller	Fast	Better at low $\epsilon$	Yes	No	Fingerprint-based; better space efficiency for low error rates.
XOR Filter	~23% Smaller	Very Fast	Same	No	No	Static only; requires knowing all keys at construction.
Binary Fuse	~35% Smaller	Extremely Fast	Same	No	No	Static only; modern successor to XOR with better cache locality.
Quotient Filter	Comparable	Medium	Same	Yes	Yes	High performance for SSD/Disk-resident filters.
Ribbon Filter	~40% Smaller	Fast	Better efficiency	No	Yes	Static only; highly space-optimized for LSM-tree indexes.
GCS Filter	~45% Smaller	Slow	Best efficiency	No	No	Static only; uses Golomb coding for maximum compression.

Key Comparisons

Dynamic vs. Static: If you need to add data over time, use Bloom or Cuckoo. If your data is fixed (e.g., an index for a static file), use Binary Fuse or Ribbon to save massive amounts of RAM.
The "Deletion" Factor: Only Cuckoo and Quotient filters natively support removing an item without rebuilding the entire structure.
Cache Efficiency: Quotient and Cuckoo filters are generally better for modern CPUs because they access contiguous memory slots rather than jumping to random bits like a standard Bloom Filter.

Would you like me to explain how "peeling" works in XOR and Binary Fuse filters?

Algorithm	Memory (vs Bloom)	CPU / Op	Accuracy (vs Bloom)	Delete	Merge	Notes
Bloom Filter	Baseline	Fast	Baseline	No	Yes	The industry standard; simple bit-array.
Counting Bloom	3x – 4x Larger	Medium	Same	Yes	Yes	Adds counters to allow for item removal.
Cuckoo Filter	~20% Smaller	Fast	Better at low $\epsilon$	Yes	No	Fingerprint-based; better space efficiency for low error rates.
XOR Filter	~23% Smaller	Very Fast	Same	No	No	Static only; requires knowing all keys at construction.
Binary Fuse	~35% Smaller	Extremely Fast	Same	No	No	Static only; modern successor to XOR with better cache locality.
Quotient Filter	Comparable	Medium	Same	Yes	Yes	High performance for SSD/Disk-resident filters.
Ribbon Filter	~40% Smaller	Fast	Better efficiency	No	Yes	Static only; highly space-optimized for LSM-tree indexes.
GCS Filter	~45% Smaller	Slow	Best efficiency	No	No	Static only; uses Golomb coding for maximum compression.

Algorithm

Memory (vs Bloom)

CPU / Op

Accuracy (vs Bloom)

Delete

Merge

Notes

Bloom Filter

Baseline

Fast

Baseline

Yes

The industry standard; simple bit-array.

Counting Bloom

3x – 4x Larger

Medium

Same

Yes

Adds counters to allow for item removal.

Cuckoo Filter

~20% Smaller

Fast

Better at low $\epsilon$

Yes

Fingerprint-based; better space efficiency for low error rates.

XOR Filter

~23% Smaller

Very Fast

Same

Static only; requires knowing all keys at construction.

Binary Fuse

~35% Smaller

Extremely Fast

Same

Static only; modern successor to XOR with better cache locality.

Quotient Filter

Comparable

Medium

Same

Yes

High performance for SSD/Disk-resident filters.

Ribbon Filter

~40% Smaller

Fast

Better efficiency

Yes

Static only; highly space-optimized for LSM-tree indexes.

GCS Filter

~45% Smaller

Slow

Best efficiency

Static only; uses Golomb coding for maximum compression.

thanos/membership_filters.md

Select an option

No results found

Select an option

No results found

Key Comparisons

thanos commented Mar 15, 2026

Uh oh!