petered · December 12, 2017 23:41
diff --git a/kasper-em-on-graph b/kasper-em-on-graph
 $$
 \newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
 \newcommand{\pderivsq}[2]{\frac{\partial^2 #1}{\partial #2^2}}
 \newcommand{\lderiv}[1]{\frac{\partial \mathcal L}{\partial #1}}
 \newcommand{\pderivgiven}[3]{\left.\frac{\partial #1}{\partial #2}\right|_{#3}}
 \newcommand{\norm}[1]{\frac12\| #1 \|_2^2}
 \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
 \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
 \newcommand{\blue}[1]{\color{blue}{#1}}
 \newcommand{\red}[1]{\color{red}{#1}}
 \newcommand{\numel}[1]{|#1|}
 \newcommand{\switch}[3]{\begin{cases} #2 & \text{if } {#1} \\ #3 &\text{otherwise}\end{cases}}
 \newcommand{\pderivdim}[4]{\overset{\big[#3 \times #4 \big]}{\frac{\partial #1}{\partial #2}}}
 \newcommand{\softmax}{\operatorname{softmax}}
 \newcommand{\Bern}{\operatorname{Bern}}
 \newcommand{\Cat}{\operatorname{Cat}}
 \newcommand{\sigm}{\operatorname{sigm}}
 \newcommand{\logfrac}[2]{\log \left( \frac{#1}{#2} \right)}
 $$


 We've assumed the following graphical model:

 ![enter image description here](https://docs.google.com/drawings/d/e/2PACX-1vRal4bK4gu7zruAjhV3R0CvjpqDP9sbAUHGop1FojAeaJnZmedx6bwoBQY762f-MTnWuuOkpyCoG8DX/pub?w=186&h=213)

 This graph tells us that we can factorize our distribution as:

 \begin{align}
 p(X, C, F, ID; \theta)=p(X|C, F;\theta) p(C|ID;\theta) p(ID;\theta) p(F;\theta)
 \end{align}

 (Here $\theta$ collects all model parameters.)
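As a minimal sketch of this factorization, the joint can be evaluated as a product of four lookup tables. The source does not say what kind of distributions are involved, so this assumes every variable is binary and every factor is a plain table; the names `P_ID`, `P_F`, `P_C_GIVEN_ID`, `P_X_GIVEN_CF` and all numbers are hypothetical.

```python
from itertools import product

# Hypothetical conditional probability tables; every number here is made up.
P_ID = [0.5, 0.5]                        # p(ID = i)
P_F = [0.7, 0.3]                         # p(F = f)
P_C_GIVEN_ID = [[0.9, 0.1],              # p(C = c | ID = 0)
                [0.2, 0.8]]              # p(C = c | ID = 1)
P_X_GIVEN_CF = {                         # p(X = x | C = c, F = f), keyed by (c, f)
    (0, 0): [0.6, 0.4],
    (0, 1): [0.3, 0.7],
    (1, 0): [0.1, 0.9],
    (1, 1): [0.5, 0.5],
}


def joint(x, c, f, i):
    """p(X=x, C=c, F=f, ID=i) as the product of the four factors."""
    return P_X_GIVEN_CF[(c, f)][x] * P_C_GIVEN_ID[i][c] * P_ID[i] * P_F[f]


# Sanity check: a valid factorization sums to 1 over all assignments.
total = sum(joint(x, c, f, i) for x, c, f, i in product(range(2), repeat=4))
```

If each table is normalized over its first variable, `total` comes out as 1 up to floating-point error, which is a quick way to check that the factors were entered correctly.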


 Now.  How do we do EM on such a model?

 **E-Step**
 Find "responsibilities": $p(C | X, F, ID; \theta_{old})$

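 Using Bayes' rule and the factorization implied by the graph, we can write the posterior entirely in terms of factors the model defines. Here $|C|$ is the number of values $C$ can take, and $c'$ is the summation index in the normalizer. Since $p(ID)$ and $p(F)$ do not depend on $C$, they cancel in the last step.

 \begin{align}
 p(C=c | X, F, ID; \theta_{old}) &= \frac{p(X, C=c, F, ID; \theta_{old})}{p(X, F, ID;\theta_{old})} \\
 &= \frac{p(X, C=c, F, ID; \theta_{old})}{\sum_{c'=1}^{|C|} p(X, C=c', F, ID;\theta_{old})} \\
 &= \frac{p(X|C=c, F;\theta_{old}) p(C=c|ID;\theta_{old}) p(ID;\theta_{old}) p(F;\theta_{old})}{\sum_{c'=1}^{|C|} p(X|C=c', F;\theta_{old}) p(C=c'|ID;\theta_{old}) p(ID;\theta_{old}) p(F;\theta_{old})} \\
 &= \frac{p(X|C=c, F;\theta_{old}) p(C=c|ID;\theta_{old})}{\sum_{c'=1}^{|C|} p(X|C=c', F;\theta_{old}) p(C=c'|ID;\theta_{old})} \\
 &:= \gamma(c)
 \end{align}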
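The E-step can be sketched in a few lines for a single observation. This again assumes discrete variables with tabular factors, which the source does not specify; the function name `responsibilities` and the table values are hypothetical. Because $p(ID)$ and $p(F)$ appear in both the numerator and the denominator, the code leaves them out.

```python
def responsibilities(x, f, i, p_x_given_cf, p_c_given_id):
    """Return [gamma(0), ..., gamma(K-1)] for one observation (x, f, i).

    p_x_given_cf[(c, f)][x] is p(X=x | C=c, F=f) and
    p_c_given_id[i][c] is p(C=c | ID=i), both under the old parameters.
    """
    num_classes = len(p_c_given_id[i])
    # Numerator of Bayes' rule for each value of C (p(ID), p(F) cancel).
    unnormalized = [
        p_x_given_cf[(c, f)][x] * p_c_given_id[i][c]
        for c in range(num_classes)
    ]
    evidence = sum(unnormalized)  # the sum over c in the denominator
    return [u / evidence for u in unnormalized]


# Hypothetical tables for a toy model with binary X, C, F and ID.
P_C_GIVEN_ID = [[0.9, 0.1], [0.2, 0.8]]
P_X_GIVEN_CF = {
    (0, 0): [0.6, 0.4],
    (0, 1): [0.3, 0.7],
    (1, 0): [0.1, 0.9],
    (1, 1): [0.5, 0.5],
}

gamma = responsibilities(x=1, f=0, i=0,
                         p_x_given_cf=P_X_GIVEN_CF,
                         p_c_given_id=P_C_GIVEN_ID)
```

For these made-up numbers the unnormalized terms are 0.36 and 0.09, so `gamma` is approximately `[0.8, 0.2]`.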


 **M-Step** 
 Maximize the expected complete-data log-likelihood with respect to the parameters, holding the responsibilities $\gamma(c)$ fixed:

 \begin{align}
 \theta_{new} &\leftarrow \argmax{\theta} \sum_{c=1}^{|C|} p(C=c | X, F, ID; \theta_{old}) \log p(X, C=c, F, ID; \theta) \\
 &=\argmax{\theta} \sum_{c=1}^{|C|} \gamma(c) \log p(X, C=c, F, ID; \theta) \\
 &=\argmax{\theta} \sum_{c=1}^{|C|} \gamma(c) \left[ \log p(X|C=c, F;\theta) + \log p(C=c|ID;\theta) + \log p(ID;\theta) + \log p(F;\theta) \right]
 \end{align}
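How the maximization is carried out depends on how each factor is parameterized, which the source leaves open. As one illustrative case, assume a dataset of several independent records and a tabular $p(C|ID)$ with one free row per ID value. Maximizing the responsibility-weighted log-probability of $C$ given $ID$ under the normalization constraint then has a closed form: each row is the average responsibility over the records with that ID. The function name `m_step_c_given_id` and the data below are hypothetical.

```python
from collections import defaultdict


def m_step_c_given_id(ids, gammas, num_classes):
    """Re-estimate a tabular p(C | ID) from per-record responsibilities.

    ids[n] is the observed ID of record n; gammas[n][c] is gamma_n(c).
    Returns {id_value: [p(C=0 | id), ..., p(C=K-1 | id)]}.
    """
    totals = defaultdict(lambda: [0.0] * num_classes)
    counts = defaultdict(int)
    for i, gamma in zip(ids, gammas):
        counts[i] += 1
        for c in range(num_classes):
            totals[i][c] += gamma[c]  # soft count of class c for this ID
    # Each gamma sums to 1, so dividing by the record count normalizes the row.
    return {i: [t / counts[i] for t in totals[i]] for i in totals}


# Made-up responsibilities for three records, two of which share ID 0.
ids = [0, 0, 1]
gammas = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
new_table = m_step_c_given_id(ids, gammas, num_classes=2)
```

With this toy input the row for ID 0 is about `[0.7, 0.3]` and the row for ID 1 is `[0.1, 0.9]`. Other factors, such as $p(X|C, F)$, would need their own update depending on their functional form.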