Skip to content

Instantly share code, notes, and snippets.

@anrieff
Last active April 29, 2018 20:05
Show Gist options
  • Save anrieff/ba35e24cac84c365cbd334b2a48ceaed to your computer and use it in GitHub Desktop.
Save anrieff/ba35e24cac84c365cbd334b2a48ceaed to your computer and use it in GitHub Desktop.
Extended birthday problem: probability of M people sharing a birthday out of a group of N.
#!/usr/bin/python
""" Compute the probability of M people sharing a birthday out of a group of
N people.
The parameter `exact' determines whether we want to ignore clusterings of
more than M people with the same birthday (thus the largest group of
coincident birthdays is exactly M). If it's false, we allow "M or more".
How it works: we define a recursive counting function
f(days, people, fulfilled) which tells us how many configurations of the
desired clustering of birthdays can be generated if we have
days (0..365) : number of days left of the year to play with,
people (0..N) : number of persons we haven't assigned a birthday yet,
fulfilled (0/1): tells us whether the predicate (M coincident birthdays)
has been already fulfilled.
If we're given with a particular function invocation with days > 0 and
people > 0, we can assign any number of people to have a birthday on the
next following day of the year. After that we'd be left with one day less,
and potentially less people. I.e., if we were to have p persons
(0 <= p <= people) sharing a birthday on that day, then we have
C(people, p) * f(days - 1, people - p, fulfilled | (p are enough))
successful combinations in that scenario.
C(n, k) is the binomial coefficient (n over k).
(p are enough) is a predicate that depends on whether we want to be exact.
If exact is true, (p is enough) ::= (p == M) and we need to ensure p <= M.
if it's false, (p is enough) ::= (p >= M).
The resultant probability is computed as f(365, N, 0) / 365**N.
Please note that the function is recursive and needs to be memoized to run
efficiently. Also the number of combinations may be huge (e.g., the number
of 4+ coincident birthdays over a group of 60 people is has 152 digits).
Back-of-the-envelope time complexity of the algorithm is O(N^3 logN) for
the "M+" case. The "M" case is slightly better at O(N^2 * M * logN).
"""
N = 60 # num people
M = 4 # num coincident birthdays
exact = False # False: compute for "M+" coincident birthdays.
# True: "M" (and no more) coincident birthdays
# Compute the Pascal's triangle.
C = []
for i in xrange(N + 1):
row = [1]
for j in xrange(1, i):
row.append(C[i - 1][j - 1] + C[i - 1][j])
row.append(1)
C.append(row)
# Memoization cache
cache = {}
# Counting function, as described.
def f(days, people, fulfilled):
if days == 0 and people > 0:
return 0
if people == 0:
return int(fulfilled)
# memoization:
global cache
key = (days, people, fulfilled)
if key in cache:
return cache[key]
result = 0
for p in xrange(people + 1):
this_does_it = (p == M) if exact else (p >= M)
if exact and p > M: break
result += C[people][p] * f(days - 1, people - p, fulfilled or this_does_it)
# memoization:
cache[key] = result
return result
numerator = f(365, N, False)
denominator = 365**N
print "Exact probability is %d/%d" % (numerator, denominator)
print "Numerically: %.9f%%" % (numerator * 10**22 / denominator / 10.**20)
@edielima
Copy link

edielima commented Apr 2, 2018

I would like to see the odds of 2 people (M = 2) in a growing class (N = 1,2,3,4 ....).
for N in range (1, 360):
However from 80% it starts to drop the probability, something wrong ... Then follow my script ... Would you have any tips?

for N in range(1, 360):
#N = 60 # num people
M = 2 # num coincident birthdays
exact = True # False: compute for "M+" coincident birthdays.
# True: "M" (and no more) coincident birthdays

Compute the Pascal's triangle.

C = []
for i in range(N + 1):
row = [1]
for j in range(1, i):
row.append(C[i - 1][j - 1] + C[i - 1][j])
row.append(1)
C.append(row)

Memoization cache

cache = {}

Counting function, as described.

def f(days, people, fulfilled):
if days == 0 and people > 0:
return 0
if people == 0:
return int(fulfilled)

# memoization:
global cache
key = (days, people, fulfilled)
if key in cache:
  return cache[key]

result = 0
for p in range(people + 1):
  this_does_it = (p == M) if exact else (p >= M)
  if exact and p > M: break
  result += C[people][p] * f(days - 1, people - p, fulfilled or this_does_it)

# memoization:
cache[key] = result
return result

numerator = f(365, N, False)
denominator = 365**N

#print "Exact probability is %d/%d" % (numerator, denominator)
print (N, "Prob: %.9f%%" % (numerator * 10**22 / denominator / 10.**20))
del f,cache,p,numerador,denominador,C,i,j,row,key,result,people,days,fulfilled,M,exact,this_does_it

@npodonnell
Copy link

There's a much easier way.....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment