Skip to content

Instantly share code, notes, and snippets.

@MilosSimic
Forked from brandt/bloom-filter-calculator.rb
Created January 30, 2019 23:44
Show Gist options
  • Save MilosSimic/d7144c709a9ef5565c84769ff6122113 to your computer and use it in GitHub Desktop.
Save MilosSimic/d7144c709a9ef5565c84769ff6122113 to your computer and use it in GitHub Desktop.
Calculate the required bloom filter size and optimal number of hashes from the expected number of items in the collection and acceptable false-positive rate
# Optimal bloom filter size and number of hashes
# Tips:
# 1. One byte per item in the input set gives about a 2% false positive rate.
# 2. The optimal number of hash functions is ~0.7x the number of bits per item.
# 3. The number of hashes dominates performance.
# Expected number of items in the collection
# n = (m * ln(2))/k;
n = 300_000
# Acceptable false-positive rate (0.01 = 1%)
# p = e^(-(m/n) * (ln(2)^2));
fpr = 0.01
# Optimal size (number of elements in the bit array)
# m = -((n*ln(p))/(ln(2)^2));
m = (n * Math.log(fpr).abs) / (Math.log(2) ** 2)
# Optimal number of hash functions
# k = (m/n) * ln(2);
k = (m / n) * Math.log(2)
puts "Optimal bloom filter size: #{m.ceil} bits"
puts "Optimal number of hash functions: #{k.ceil}"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment