@tremby
Created September 6, 2018 23:26
Estimate, by simulation, the expected number of duplicates when choosing n items from a pool of m, p times
import random
from collections import Counter
from itertools import chain
POOL_SIZE = 100
PAGE_SIZE = 9
NUM_PAGES = 2
ITERATIONS = 10000
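# random.sample() needs a sequence (sets are rejected on Python 3.11+)
pool = list(range(POOL_SIZE))
dupe_counts = []
for i in range(ITERATIONS):
    # Draw each page independently, without replacement within a page
    pages = [random.sample(pool, PAGE_SIZE) for p in range(NUM_PAGES)]
    # An item is a duplicate if it appears on more than one page
    duplicates = [item for (item, count) in Counter(chain(*pages)).items() if count > 1]
    dupe_counts.append(len(duplicates))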
print("""
Over {iterations} iterations
getting {num_pages} pages, each of size {page_size},
from a pool of {pool_size} entries,
there were an average of {average} duplicates.
""".format(
    iterations=ITERATIONS,
    num_pages=NUM_PAGES,
    page_size=PAGE_SIZE,
    pool_size=POOL_SIZE,
    average=sum(dupe_counts) / float(len(dupe_counts))
))
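
The simulated average can be checked against a closed form. This is a minimal sketch, assuming each page is an independent, uniform sample without replacement, as in the simulation above. Any given item lands on a page with probability PAGE_SIZE / POOL_SIZE, so the number of pages containing it is binomially distributed, and by linearity of expectation the expected number of duplicates is POOL_SIZE times the probability that an item appears on at least two pages. The function name `expected_duplicates` is hypothetical and not part of the original gist.

```python
def expected_duplicates(pool_size, page_size, num_pages):
    """Expected number of items that appear on more than one page.

    Assumes num_pages >= 1 and that every page is an independent uniform
    sample of page_size distinct items from a pool of pool_size items.
    """
    # Probability that a given item appears on any one page
    q = page_size / float(pool_size)
    # Probability that the item appears on zero pages or on exactly one page
    p_at_most_once = (1 - q) ** num_pages + num_pages * q * (1 - q) ** (num_pages - 1)
    # Linearity of expectation: sum the per-item duplicate probability over the pool
    return pool_size * (1 - p_at_most_once)


if __name__ == "__main__":
    # Same parameters as the simulation: pool of 100, pages of 9, 2 pages.
    # For two pages the formula reduces to page_size**2 / pool_size.
    print(expected_duplicates(100, 9, 2))
```

With the default parameters the result is approximately 0.81, which the simulated average should approach as ITERATIONS grows.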