Created September 14, 2014 23:34
<blockquote>So if you have P_1 that gives x 50% of the time and y the
rest, and P_2 that gives x 40% of the time and y 60% of the time, we
say that the difference is .2: given 100 samples, we will have 80
matching samples and 20 unaccounted for (10 more x samples for P_1,
and 10 less y).</blockquote>
<p>What? Out of a hundred samples, P_1 yields x in 50 of them, and of
these, P_2 matches it with another x 50*0.4 = 20 times. In another 50
samples, P_1 yields y, and P_2 matches it 50*0.6 = 30 times, for a
total of 20 + 30 = 50 matching samples. (And it would be the same for any
other P_2 over x and y; you can't expect to do better <em>or</em> worse
than chance at guessing a fair coinflip.)</p>
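<p>The arithmetic above can be sketched in a few lines (the function name here is illustrative, not from the gist's code):</p>

```python
def match_probability(p_1, p_2):
    # chance that independent draws from p_1 and p_2 agree:
    # the sum over outcomes z of P_1(z) * P_2(z)
    return sum(probability * p_2.get(outcome, 0)
               for outcome, probability in p_1.items())

p_1 = {'x': 0.5, 'y': 0.5}
p_2 = {'x': 0.4, 'y': 0.6}
print(match_probability(p_1, p_2))  # 0.5*0.4 + 0.5*0.6 = 0.5
```

<p>For the fair-coin P_1, any P_2 over {x, y} gives 0.5*p + 0.5*(1&minus;p) = 0.5, as claimed.</p>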
<p>But anyway, I don't think the perhaps-more-interesting problem is
hard, modulo the philosophical question of how to formalize the
natural-language concept of <em>similarity</em> for probability distributions
(the Overmind suggests <a
href="http://en.wikipedia.org/wiki/Bhattacharyya_distance">Bhattacharyya distance</a>?). To
compute what you should believe about the parents' genotypes (and
therefore the distribution of possible children) given observations of
successive children, just start with your prior probability
distribution over parent genotypes, and apply Bayes's theorem. Say that each parent has a single "allele" (scare quotes because this isn't how real-world biology works), <em>a</em> or <em>b</em>, and the child randomly gets an allele from one parent. Then we might think about the problem like this:</p>
<code><pre>#!/usr/bin/python3
from collections import namedtuple, defaultdict
from itertools import product

AllelePairing = namedtuple('AllelePairing', ('F', 'M'))

possible_worlds = list(AllelePairing(*alleles) for alleles
                       in product(('a', 'b'), repeat=2))
prior_beliefs = {world: 1/len(possible_worlds) for world in possible_worlds}

def prediction(belief):
    # the distribution over a child's allele, given the parents' alleles
    outcomes = defaultdict(lambda: 0)
    for allele in belief:
        outcomes[allele] += 0.5
    return outcomes

def update(beliefs, observation):
    # the fundamental theorem of epistemology:
    # P(hypothesis | evidence) = P(evidence | hypothesis) * P(hypothesis)
    #                            / P(evidence)
    new_beliefs = {}
    prior_on_evidence = sum((prediction(belief)[observation] * probability)
                            for belief, probability in beliefs.items())
    for belief, probability in beliefs.items():
        new_beliefs[belief] = (prediction(belief)[observation] *
                               probability / prior_on_evidence)
    return new_beliefs

def belief_sequence(beliefs, observations):
    for observation in observations:
        beliefs = update(beliefs, observation)
        yield beliefs</pre></code>
<p>And then, for example, if we keep observing children carrying the <em>a</em> allele, we can compute exactly how increasingly confident we should be that both parents have the <em>a</em> allele:</p>
<code><pre>>>> for belief_at_time in belief_sequence(prior_beliefs, ('a',)*6):
...     print(belief_at_time[AllelePairing('a', 'a')])
...
0.5
0.6666666666666666
0.8
0.8888888888888888
0.9411764705882353
0.9696969696969698</pre></code>
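<p>(And for the aside about similarity of distributions: a minimal sketch of the Bhattacharyya distance mentioned above, assuming discrete distributions represented as dicts from outcome to probability:)</p>

```python
from math import log, sqrt

def bhattacharyya_distance(p, q):
    # D_B(p, q) = -ln(BC), where the Bhattacharyya coefficient BC is
    # the sum over outcomes z of sqrt(p(z) * q(z))
    coefficient = sum(sqrt(probability * q.get(outcome, 0))
                      for outcome, probability in p.items())
    return -log(coefficient)

p_1 = {'x': 0.5, 'y': 0.5}
p_2 = {'x': 0.4, 'y': 0.6}
print(bhattacharyya_distance(p_1, p_2))  # small but nonzero, ~0.005
print(bhattacharyya_distance(p_1, p_1))  # identical distributions: 0
```

<p>The distance is 0 exactly when the distributions coincide, which is at least one way to cash out "similarity" here.</p>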