Created September 14, 2014 23:34
<blockquote>So if you have P_1 that gives x 50% of the time and y the
rest, and P_2 that gives x 40% of the time and y 60% of the time, we
say that the difference is .2: given 100 samples, we will have 80
matching samples and 20 unaccounted for (10 more x samples for P_1,
and 10 less y).</blockquote>
<p>What? Out of a hundred samples, P_1 yields x in 50 of them, and of
these, P_2 matches it with another x 50*0.4 = 20 times. In another 50
samples, P_1 yields y, and P_2 matches it 50*0.6 = 30 times, for a
total of 20 + 30 = 50 matching samples. (And it would be the same for any
other P_2 over x and y; you can't expect to do better <em>or</em> worse
than chance at guessing a fair coinflip.)</p>
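<p>The arithmetic above can be sketched in a few lines (the function name here is illustrative, not from the gist's code):</p>

```python
def match_probability(p_1, p_2):
    # chance that independent draws from p_1 and p_2 agree:
    # the sum over outcomes z of P_1(z) * P_2(z)
    return sum(probability * p_2.get(outcome, 0)
               for outcome, probability in p_1.items())

p_1 = {'x': 0.5, 'y': 0.5}
p_2 = {'x': 0.4, 'y': 0.6}
print(match_probability(p_1, p_2))  # 0.5*0.4 + 0.5*0.6 = 0.5
```

<p>For the fair-coin P_1, any P_2 over {x, y} gives 0.5*p + 0.5*(1&minus;p) = 0.5, as claimed.</p>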
<p>But anyway, I don't think the perhaps-more-interesting problem is
hard, modulo the philosophical question of how to formalize the
natural-language concept of <em>similarity</em> for probability distributions
(the Overmind suggests <a
href="http://en.wikipedia.org/wiki/Bhattacharyya_distance">Bhattacharyya distance</a>?). To
compute what you should believe about the parents' genotypes (and
therefore the distribution of possible children) given observations of
successive children, just start with your prior probability
distribution over parent genotypes, and apply Bayes's theorem. Say that each parent has a single "allele" (scare quotes because this isn't how real-world biology works), <em>a</em> or <em>b</em>, and the child randomly gets an allele from one parent. Then we might think about the problem like this:</p>
<code><pre>#!/usr/bin/python3
from collections import namedtuple, defaultdict
from itertools import product

AllelePairing = namedtuple('AllelePairing', ('F', 'M'))

possible_worlds = list(AllelePairing(*alleles) for alleles
                       in product(('a', 'b'), repeat=2))
prior_beliefs = {world: 1/len(possible_worlds) for world in possible_worlds}

def prediction(belief):
    # the distribution over a child's allele, given the parents' alleles
    outcomes = defaultdict(lambda: 0)
    for allele in belief:
        outcomes[allele] += 0.5
    return outcomes

def update(beliefs, observation):
    # the fundamental theorem of epistemology:
    # P(hypothesis | evidence) = P(evidence | hypothesis) * P(hypothesis)
    #                            / P(evidence)
    new_beliefs = {}
    prior_on_evidence = sum((prediction(belief)[observation] * probability)
                            for belief, probability in beliefs.items())
    for belief, probability in beliefs.items():
        new_beliefs[belief] = (prediction(belief)[observation] *
                               probability / prior_on_evidence)
    return new_beliefs

def belief_sequence(beliefs, observations):
    for observation in observations:
        beliefs = update(beliefs, observation)
        yield beliefs</pre></code>
<p>And then, for example, if we keep observing children carrying the <em>a</em> allele, we can compute exactly how increasingly confident we should be that both parents have the <em>a</em> allele:</p>
<code><pre>>>> for belief_at_time in belief_sequence(prior_beliefs, ('a',)*6):
...     print(belief_at_time[AllelePairing('a', 'a')])
...
0.5
0.6666666666666666
0.8
0.8888888888888888
0.9411764705882353
0.9696969696969698</pre></code>
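<p>(And for the aside about similarity of distributions: a minimal sketch of the Bhattacharyya distance mentioned above, assuming discrete distributions represented as dicts from outcome to probability:)</p>

```python
from math import log, sqrt

def bhattacharyya_distance(p, q):
    # D_B(p, q) = -ln(BC), where the Bhattacharyya coefficient BC is
    # the sum over outcomes z of sqrt(p(z) * q(z))
    coefficient = sum(sqrt(probability * q.get(outcome, 0))
                      for outcome, probability in p.items())
    return -log(coefficient)

p_1 = {'x': 0.5, 'y': 0.5}
p_2 = {'x': 0.4, 'y': 0.6}
print(bhattacharyya_distance(p_1, p_2))  # small but nonzero, ~0.005
print(bhattacharyya_distance(p_1, p_1))  # identical distributions: 0
```

<p>The distance is 0 exactly when the distributions coincide, which is at least one way to cash out "similarity" here.</p>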