@petered
Last active February 15, 2018 15:29
2018-01-29 Temporal Basis Funcs II
$$
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\pderivsq}[2]{\frac{\partial^2 #1}{\partial #2^2}}
\newcommand{\lderiv}[1]{\frac{\partial \mathcal L}{\partial #1}}
\newcommand{\pderivgiven}[3]{\left.\frac{\partial #1}{\partial #2}\right|_{#3}}
\newcommand{\norm}[1]{\frac12\| #1 \|_2^2}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\blue}[1]{\color{blue}{#1}}
\newcommand{\red}[1]{\color{red}{#1}}
\newcommand{\numel}[1]{|#1|}
\newcommand{\switch}[3]{\begin{cases} #2 & \text{if } {#1} \\ #3 &\text{otherwise}\end{cases}}
\newcommand{\pderivdim}[4]{\overset{\big[#3 \times #4 \big]}{\frac{\partial #1}{\partial #2}}}
$$
# Temporal Basis Functions for Event-Based Learning
Inspiration:
- Predictive Coding - Higher layers model only what lower layers can't explain. [The PredNet paper](https://arxiv.org/abs/1605.08104) showed that this idea can work quite well.
- [Spiking Boltzmann Machines](http://www.cs.toronto.edu/~fritz/absps/nips00-ab.pdf) - A neuron's parameters represent a function to be added to the energy landscape, and its activation represents the scaling of this function.
- [Slow Features](https://arxiv.org/pdf/1605.06336.pdf) - Try to model temporal data as a nonlinear combination of slowly changing signals.
## Event-based learning
Suppose we'd like to model a signal $x(t)\in \mathbb R^D$, based on temporally ordered, arbitrarily timed *observations* $(x_n, t_n)$.
We assume these are observations of some time-continuous function $x(t)$, such that:
$$
x_n := x(t_n)
$$
Our goal is to model a time-varying distribution $X(t)$ from the stream of incoming observations $(x_n, t_n)$.
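As a concrete illustration, here is a minimal sketch of what such arbitrarily timed observations could look like (Python/NumPy; the sinusoidal toy signal and the exponential inter-event times are invented for illustration, not part of the setup above):

```python
import numpy as np

def toy_signal(t):
    """A hypothetical continuous signal x(t) in R^D (here D = 2)."""
    return np.stack([np.sin(t), np.cos(0.5 * t)], axis=-1)

# Irregular, temporally ordered observation times t_1 < t_2 < ...,
# e.g. the ticks of an event-based sensor.
rng = np.random.default_rng(0)
t_obs = np.cumsum(rng.exponential(scale=0.1, size=200))
x_obs = toy_signal(t_obs)   # x_n := x(t_n), shape (200, 2)
```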
## Approach
Let's further suppose that we have a collection of temporal "basis" signals $f_\theta^i(t)$, parametrized by $\theta$, and that we want to represent our signal as a combination of these functions:
$$
p(X(t)) \propto \exp\left( -\sum_i c^i(t) \left(f_\theta^i(t) - X(t)\right)^2 \right)
$$
where $c^i(t)$ are coefficients determining, at a given time, to what extent $X(t)$ is likely to be produced by the corresponding signal $f_{\theta}^i(t)$. Every time an observation comes in, we update (a minimal numerical sketch follows at the end of this section):
- $\theta$ - the parameters of the generating functions
- $c^i$ - the coefficients determining how strongly each of these signals contributes
At the next layer, we would model $c^i(t)$ just as we modeled $X(t)$ at the layer below.
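The note above does not pin down the update rule, so the following is only a minimal single-layer sketch under stated assumptions: each basis signal is taken to be a linear readout $f_\theta^i(t) = W_i\,\phi(t)$ of fixed sinusoidal time features $\phi(t)$, and both $\theta = \{W_i\}$ and the coefficients $c^i$ (held here as plain nonnegative scalars rather than functions of time) are moved by a gradient step on the unnormalized energy $\sum_i c^i\,(f_\theta^i(t_n) - x_n)^2$ at each observation. It reuses `t_obs`/`x_obs` from the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, F = 2, 4, 6            # signal dim, number of basis signals, time-feature dim

def time_features(t):
    """Fixed temporal features phi(t); the sinusoid choice is an assumption."""
    freqs = 2.0 ** np.arange(F // 2)
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

W = 0.1 * rng.standard_normal((K, D, F))   # theta: one linear readout per basis signal
c = np.ones(K)                             # coefficients c^i (constants here, not c^i(t))

def basis_signals(t):
    """f_theta^i(t) = W_i @ phi(t): one D-dimensional prediction per basis signal."""
    return W @ time_features(t)            # shape (K, D)

def update(x, t, lr_w=0.05, lr_c=0.01):
    """One event-based update on an observation (x_n, t_n)."""
    f = basis_signals(t)                   # (K, D)
    err = f - x                            # (K, D)
    phi = time_features(t)                 # (F,)
    # Energy E = sum_i c_i ||f_i - x||^2   (negative unnormalized log-density)
    # dE/dW_i = 2 c_i (f_i - x) phi^T ,    dE/dc_i = ||f_i - x||^2
    W[...] -= lr_w * 2.0 * c[:, None, None] * err[:, :, None] * phi[None, None, :]
    c[...] = np.clip(c - lr_c * (err ** 2).sum(axis=1), 0.0, None)

# Stream the observations (t_obs, x_obs) from the previous sketch through the updater.
for t_n, x_n in zip(t_obs, x_obs):
    update(x_n, t_n)
```

Because the normalizer of $p(X(t))$ is ignored, the gradient step on $c^i$ only shrinks the coefficients of poorly fitting signals; a proper treatment would include the log-partition term or, as suggested above, let the next layer model $c^i(t)$.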