2017-10-12 Kasper
# 1) Simple Maximum Likelihood

    F --> X

$$
p(F=1 | X=x) = \frac{p(X=x|F=1) p(F=1)}{p(X=x)} = \frac{p(X=x|F=1) p(F=1)}{p(X=x|F=0)p(F=0) + p(X=x|F=1)p(F=1)}
$$
Now, you've learned $p(F)$ and $p(X|F)$ from the dataset, so you can compute this directly.
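A minimal sketch of that computation (all names here are illustrative, not from any particular library), assuming the prior $p(F)$ is stored as an array and $p(X=x|F=f)$ is available as a callable from whatever model you fit:

```python
import numpy as np

# Hypothetical learned quantities (illustrative names):
#   p_f[f]            -- prior p(F=f), estimated as class frequencies
#   p_x_given_f(x, f) -- likelihood p(X=x | F=f) from the fitted model

def p_f1_given_x(x, p_f, p_x_given_f):
    """Bayes rule: p(F=1|x) = p(x|F=1) p(F=1) / sum_f p(x|F=f) p(F=f)."""
    joint = np.array([p_x_given_f(x, f) * p_f[f] for f in (0, 1)])
    return float(joint[1] / joint.sum())
```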
# 2) C causes X and F: p(X, F | C) = p(X|C) p(F|C)

    C --> F
    '---> X
Train naive Bayes with EM by just concatenating $F$ onto $X$, then marginalize out $C$ at test time. The EM model gives you $p(F|C)$, $p(C)$, and $p(X|C)$, and you can infer:
\begin{align}
p(F|X) &= \sum_C p(F, C | X) \\
&= \sum_C p(F, C, X)/p(X) \\
&= \sum_C p(F|C) p(X|C) p(C) / p(X) \text{ ... because of the graph structure}\\
&= \sum_C p(F|C) p(C|X) \text{ ... Bayes rule}\\
&= \frac{\sum_C p(F|C) p(X|C) p(C)}{\sum_C p(X|C) p(C)}
\end{align}
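A minimal sketch of that last line (illustrative names again; `p_x_given_c` stands in for whatever naive-Bayes likelihood EM gave you):

```python
import numpy as np

# Hypothetical quantities from the fitted EM model:
#   p_c[k]            -- mixture weight p(C=k)
#   p_f1_given_c[k]   -- p(F=1 | C=k)
#   p_x_given_c(x, k) -- p(X=x | C=k) under the naive-Bayes factorization

def p_f1_given_x(x, p_c, p_f1_given_c, p_x_given_c):
    """p(F=1|x) = sum_c p(F=1|c) p(x|c) p(c) / sum_c p(x|c) p(c)."""
    lik = np.array([p_x_given_c(x, k) * p_c[k] for k in range(len(p_c))])
    return float(np.dot(p_f1_given_c, lik) / lik.sum())
```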
**Using past customer data.**
Under your "averaging" assumption, you average the posterior over the latent class $C$ across the $N$ past transactions $x'$ that share the customer id with $x$:
\begin{align}
\hat p(C|X=x) &= \frac1N \sum_{x': x'_{id}=x_{id}} p(C|X=x') \\
&= \frac1N \sum_{x': x'_{id}=x_{id}} \frac{p(X=x'|C)p(C)}{p(X=x')}\\
&= \frac1N \sum_{x': x'_{id}=x_{id}} \frac{p(X=x'|C)p(C)}{\sum_C p(X=x'|C)p(C)}
\end{align}
So you can plug this into the equation above:
\begin{align}
p(F=1|X=x) &= \sum_c p(F=1|C=c)\,\hat p(C=c|X=x)
\end{align}
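A sketch of the averaging step and the plug-in, reusing the hypothetical quantities from the previous sketch:

```python
import numpy as np

def averaged_posterior(xs_customer, p_c, p_x_given_c):
    """hat p(C|x): mean of p(C|X=x') over a customer's past transactions."""
    posts = []
    for xp in xs_customer:
        joint = np.array([p_x_given_c(xp, k) * p_c[k] for k in range(len(p_c))])
        posts.append(joint / joint.sum())   # Bayes rule for p(C | X=x')
    return np.mean(posts, axis=0)

def p_f1_given_customer(xs_customer, p_c, p_f1_given_c, p_x_given_c):
    """Plug hat p(C|x) into p(F=1|x) = sum_c p(F=1|C=c) hat p(C=c|x)."""
    post = averaged_posterior(xs_customer, p_c, p_x_given_c)
    return float(np.dot(p_f1_given_c, post))
```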
# 3) F and C cause X

    C --> X <-- F

You need to define $p(X|C,F)$ now, which we discussed before.
I think

$$
p(F=1 | X=x) = \frac{\sum_c p(X=x|C=c, F=1)\, p(C=c)\, p(F=1)}{\sum_{c, f} p(X=x|C=c, F=f)\, p(C=c)\, p(F=f)}
$$

... since the graph implies the factorization $p(X, C, F) = p(X|C,F)\, p(C)\, p(F)$.
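A sketch of that inference, assuming (illustratively) that $p(X|C,F)$ has been learned as a callable alongside the priors:

```python
import numpy as np

# Hypothetical quantities (illustrative names): p_c[k] = p(C=k), p_f[f] = p(F=f),
# and p_x_given_cf(x, k, f) = p(X=x | C=k, F=f), which must now be modelled jointly.

def p_f1_given_x(x, p_c, p_f, p_x_given_cf):
    """p(F=1|x) = sum_c p(x|c,F=1) p(c) p(F=1) / sum_{c,f} p(x|c,f) p(c) p(f)."""
    joint = np.array([[p_x_given_cf(x, k, f) * p_c[k] * p_f[f]
                       for f in (0, 1)]
                      for k in range(len(p_c))])
    return float(joint[:, 1].sum() / joint.sum())
```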