Last active
April 29, 2018 20:05
-
-
Save anrieff/ba35e24cac84c365cbd334b2a48ceaed to your computer and use it in GitHub Desktop.
Extended birthday problem: probability of M people sharing a birthday out of a group of N.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/python | |
""" Compute the probability of M people sharing a birthday out of a group of | |
N people. | |
The parameter `exact' determines whether we want to ignore clusterings of | |
more than M people with the same birthday (thus the largest group of | |
coincident birthdays is exactly M). If it's false, we allow "M or more". | |
How it works: we define a recursive counting function | |
f(days, people, fulfilled) which tells us how many configurations of the | |
desired clustering of birthdays can be generated if we have | |
days (0..365) : number of days left of the year to play with, | |
people (0..N) : number of persons we haven't assigned a birthday yet, | |
fulfilled (0/1): tells us whether the predicate (M coincident birthdays) | |
has been already fulfilled. | |
If we're given with a particular function invocation with days > 0 and | |
people > 0, we can assign any number of people to have a birthday on the | |
next following day of the year. After that we'd be left with one day less, | |
and potentially less people. I.e., if we were to have p persons | |
(0 <= p <= people) sharing a birthday on that day, then we have | |
C(people, p) * f(days - 1, people - p, fulfilled | (p are enough)) | |
successful combinations in that scenario. | |
C(n, k) is the binomial coefficient (n over k). | |
(p are enough) is a predicate that depends on whether we want to be exact. | |
If exact is true, (p is enough) ::= (p == M) and we need to ensure p <= M. | |
if it's false, (p is enough) ::= (p >= M). | |
The resultant probability is computed as f(365, N, 0) / 365**N. | |
Please note that the function is recursive and needs to be memoized to run | |
efficiently. Also the number of combinations may be huge (e.g., the number | |
of 4+ coincident birthdays over a group of 60 people is has 152 digits). | |
Back-of-the-envelope time complexity of the algorithm is O(N^3 logN) for | |
the "M+" case. The "M" case is slightly better at O(N^2 * M * logN). | |
""" | |
N = 60 # num people | |
M = 4 # num coincident birthdays | |
exact = False # False: compute for "M+" coincident birthdays. | |
# True: "M" (and no more) coincident birthdays | |
# Compute the Pascal's triangle. | |
C = [] | |
for i in xrange(N + 1): | |
row = [1] | |
for j in xrange(1, i): | |
row.append(C[i - 1][j - 1] + C[i - 1][j]) | |
row.append(1) | |
C.append(row) | |
# Memoization cache | |
cache = {} | |
# Counting function, as described. | |
def f(days, people, fulfilled): | |
if days == 0 and people > 0: | |
return 0 | |
if people == 0: | |
return int(fulfilled) | |
# memoization: | |
global cache | |
key = (days, people, fulfilled) | |
if key in cache: | |
return cache[key] | |
result = 0 | |
for p in xrange(people + 1): | |
this_does_it = (p == M) if exact else (p >= M) | |
if exact and p > M: break | |
result += C[people][p] * f(days - 1, people - p, fulfilled or this_does_it) | |
# memoization: | |
cache[key] = result | |
return result | |
numerator = f(365, N, False) | |
denominator = 365**N | |
print "Exact probability is %d/%d" % (numerator, denominator) | |
print "Numerically: %.9f%%" % (numerator * 10**22 / denominator / 10.**20) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I would like to see the odds of 2 people (M = 2) in a growing class (N = 1,2,3,4 ....).
for N in range (1, 360):
However from 80% it starts to drop the probability, something wrong ... Then follow my script ... Would you have any tips?
for N in range(1, 360):
#N = 60 # num people
M = 2 # num coincident birthdays
exact = True # False: compute for "M+" coincident birthdays.
# True: "M" (and no more) coincident birthdays
Compute the Pascal's triangle.
C = []
for i in range(N + 1):
row = [1]
for j in range(1, i):
row.append(C[i - 1][j - 1] + C[i - 1][j])
row.append(1)
C.append(row)
Memoization cache
cache = {}
Counting function, as described.
def f(days, people, fulfilled):
if days == 0 and people > 0:
return 0
if people == 0:
return int(fulfilled)
numerator = f(365, N, False)
denominator = 365**N
#print "Exact probability is %d/%d" % (numerator, denominator)
print (N, "Prob: %.9f%%" % (numerator * 10**22 / denominator / 10.**20))
del f,cache,p,numerador,denominador,C,i,j,row,key,result,people,days,fulfilled,M,exact,this_does_it