drdozer · March 1, 2017 17:53
diff --git a/it didn't happen b/it didn't happen
 The question we're answering here is how many times we need to try something
 to have some level of certainty that it will happen at least once.
 For example, if a parent has children, each with independent mutations,
 how many children are needed until the chances of at least one of those children carrying a particular mutation exceeds 90%.

 First some notation.

 p(e) :: probability of some event e happening in a single trial

 p(¬e) :: probability that some event e does not happen

 And as either something happens or it does not,

 p(e) + p(¬e) = 1
  =>
 p(¬e) = 1-p(e)

 Now let's say we do something lots of times:

 p(n of e) :: probability of n independent events e

 As the events are independent, the chances of them all happening is their product.
 As we are taking the product of n things, all alike, we can rewrite this as a power.

 p(n of e) = p(e)^n

 We want to know how many events we need to look at, to expect to have a 50% chance (or some other chance) of observing at least one example of e happening.
 The problem with this is that it may have happened once, or twice, or more times in those trials.
 This sum gets quite complicated quite quickly.

 Luckily though, is the same as asking how many events we need to look at, to have a (100% - 50% = 50%) chance of not observing any examples of e happening.
 We can calculate this easily.
 It's the chance that it didn't happen the first time, and didn't happen the second, and so on.

 Given p(e), and a threshold c:

 c = p(at least one e from n trials),
   ¬c is p(no e from n trials)
 ¬c = p(n of ¬e)
   from independence of events:
   = p(¬e)^n
 We want to solve for n.
  log both sides
 ln(p(¬e)) * n = ln(¬c)
  divide through
 n = ln(¬c) / ln(p(¬e))
  removing ¬ by complementary probabilities
 n = ln(1 - c) / ln(1 - p(e))

 Let's try it with some simple examples.
 Say we have a fair coin.
 So we can ask how many throws of that coin we need to get to expect a 50% chane that it will come up heads.
 Now, we know directly that the answer is 1 throw, as it is a fair coin with just two outcomes one of which we are selecting (1/2=50%).
 However, we can also show it with this formula:

 n = ln(1 - c) / ln(1 - p(e))
  c = 50%, p(e) = 50%
  = ln(1-50%) / ln(1-50%)
  x/x = 1
  = 1

 So, in a long-winded way, we've validated that a fair dice needs to be thrown once to have a 50% chance of coming up heads.

 How many times must you roll a dice to have a 90% chance of a 6?

 n = ln(1 - c) / ln(1 - p(e))
  c = 90%, p(e) = 1/6
  = ln(1 - 90%) / ln(1-1/6)
  arithmetic
  = ln(10%) / ln(5/6)
  and using a calculator
  ~ 12.6

 So if you roll a dice 12-13 times, you have a 90% chance of getting at least one 6.

 Now, let's say we have a mutation that has a 1/16 chance of happening in any child, and we want to ask how many children a parent must have
 before they have a 90% chance of at last one child carrying this mutation, we work it out in exactly the same way:

 n = ln(1 - c) / ln(1 - p(e))
  c = 90%, p(e) = 1/16
  ~35.7
  
 So with 36 children, there's a greater than 90% chance that at least one child will carry that mutation.

 In case you have scala installed, here's the scala code for calculating this:

 // c_threshold - chance threshold
 // p_e - probability of event e in a single trial
 // returns the number of trials needed for the combined probability of e happening at least once
 // to reach c_threshold
 def estimateN(c_threshold: Double, p_e: Double) = Math.log(1.0 - c_threshold) / Math.log(1.0-p_e)
	The question we're answering here is how many times we need to try something
	to have some level of certainty that it will happen at least once.
	For example, if a parent has children, each with independent mutations,
	how many children are needed until the chances of at least one of those children carrying a particular mutation exceeds 90%.

	First some notation.

	p(e) :: probability of some event e happening in a single trial

	p(¬e) :: probability that some event e does not happen

	And as either something happens or it does not,

	p(e) + p(¬e) = 1
	=>
	p(¬e) = 1-p(e)

	Now let's say we do something lots of times:

	p(n of e) :: probability of n independent events e

	As the events are independent, the chances of them all happening is their product.
	As we are taking the product of n things, all alike, we can rewrite this as a power.

	p(n of e) = p(e)^n

	We want to know how many events we need to look at, to expect to have a 50% chance (or some other chance) of observing at least one example of e happening.
	The problem with this is that it may have happened once, or twice, or more times in those trials.
	This sum gets quite complicated quite quickly.

	Luckily though, is the same as asking how many events we need to look at, to have a (100% - 50% = 50%) chance of not observing any examples of e happening.
	We can calculate this easily.
	It's the chance that it didn't happen the first time, and didn't happen the second, and so on.

	Given p(e), and a threshold c:

	c = p(at least one e from n trials),
	¬c is p(no e from n trials)
	¬c = p(n of ¬e)
	from independence of events:
	= p(¬e)^n
	We want to solve for n.
	log both sides
	ln(p(¬e)) * n = ln(¬c)
	divide through
	n = ln(¬c) / ln(p(¬e))
	removing ¬ by complementary probabilities
	n = ln(1 - c) / ln(1 - p(e))

	Let's try it with some simple examples.
	Say we have a fair coin.
	So we can ask how many throws of that coin we need to get to expect a 50% chane that it will come up heads.
	Now, we know directly that the answer is 1 throw, as it is a fair coin with just two outcomes one of which we are selecting (1/2=50%).
	However, we can also show it with this formula:

	n = ln(1 - c) / ln(1 - p(e))
	c = 50%, p(e) = 50%
	= ln(1-50%) / ln(1-50%)
	x/x = 1
	= 1

	So, in a long-winded way, we've validated that a fair dice needs to be thrown once to have a 50% chance of coming up heads.

	How many times must you roll a dice to have a 90% chance of a 6?

	n = ln(1 - c) / ln(1 - p(e))
	c = 90%, p(e) = 1/6
	= ln(1 - 90%) / ln(1-1/6)
	arithmetic
	= ln(10%) / ln(5/6)
	and using a calculator
	~ 12.6

	So if you roll a dice 12-13 times, you have a 90% chance of getting at least one 6.

	Now, let's say we have a mutation that has a 1/16 chance of happening in any child, and we want to ask how many children a parent must have
	before they have a 90% chance of at last one child carrying this mutation, we work it out in exactly the same way:

	n = ln(1 - c) / ln(1 - p(e))
	c = 90%, p(e) = 1/16
	~35.7

	So with 36 children, there's a greater than 90% chance that at least one child will carry that mutation.

	In case you have scala installed, here's the scala code for calculating this:

	// c_threshold - chance threshold
	// p_e - probability of event e in a single trial
	// returns the number of trials needed for the combined probability of e happening at least once
	// to reach c_threshold
	def estimateN(c_threshold: Double, p_e: Double) = Math.log(1.0 - c_threshold) / Math.log(1.0-p_e)