2018-05-04 draft
# Identities
**Delta, Sum**
\begin{align}
\Delta (x_t) &:= x_t - x_{t-1} \\
\Sigma (x_t) &:= s_t := s_{t-1} + x_t \\
\Sigma \circ \Delta &= \mathcal I
\end{align}
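A minimal numerical sketch of the $\Sigma \circ \Delta = \mathcal I$ identity, assuming zero initial state ($x_{-1} = 0$, $s_0 = 0$); the function names are just illustrative:

```python
# Minimal numerical check of Sigma ∘ Delta = identity, assuming zero
# initial state (x_{-1} = 0, s_0 = 0).
import numpy as np

def delta(x):
    """Temporal difference: Delta(x)_t = x_t - x_{t-1}, with x_{-1} = 0."""
    return np.diff(x, prepend=0.0)

def sigma(x):
    """Running sum: Sigma(x)_t = s_t = s_{t-1} + x_t, with s_0 = 0."""
    return np.cumsum(x)

x = np.random.randn(100)
assert np.allclose(sigma(delta(x)), x)  # Sigma ∘ Delta recovers x exactly
```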
**Lambda-Enc, Lambda-Dec**
\begin{align}
\lambda enc (x_t) &:= \frac 1 \lambda x_t - \frac{1-\lambda}{\lambda} x_{t-1}\\
\lambda dec (x_t) &:= s_t := (1-\lambda) s_{t-1} + \lambda x_t \\
\lambda dec \circ \lambda enc &= \mathcal I
\end{align}
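The same kind of check for the $\lambda enc$ / $\lambda dec$ pair, again assuming zero initial state and an illustrative `lam`:

```python
# Round-trip check: lambda_dec ∘ lambda_enc = identity, with zero initial state.
import numpy as np

def lambda_enc(x, lam):
    """a_t = (1/lam) x_t - ((1-lam)/lam) x_{t-1}, with x_{-1} = 0."""
    x_prev = np.concatenate([[0.0], x[:-1]])
    return x / lam - (1 - lam) / lam * x_prev

def lambda_dec(a, lam):
    """s_t = (1-lam) s_{t-1} + lam a_t, with s_0 = 0."""
    s, out = 0.0, []
    for a_t in a:
        s = (1 - lam) * s + lam * a_t
        out.append(s)
    return np.array(out)

x = np.random.randn(50)
assert np.allclose(lambda_dec(lambda_enc(x, 0.3), 0.3), x)  # dec ∘ enc recovers x
```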
**Dec, Dec?**
\begin{align}
s_t &:= (1-\lambda) s_{t-1} + \lambda x_t \\
z_t &:= z_{t-1} + \epsilon (s_t - z_{t-1}) \\
&= (1-\epsilon) z_{t-1} + \epsilon s_t
\end{align}
So this becomes an alpha-kernel type thing: a cascade of two leaky averagers.
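A small sketch of that cascade: a unit impulse pushed through the two leaky averagers gives a response that rises and then decays, like an alpha kernel (`lam`, `eps`, `T` are illustrative, zero initial state):

```python
# Impulse response of the "dec, dec" cascade: a leaky averager feeding a
# second leaky averager. The output rises and then decays, like an alpha kernel.
import numpy as np

lam, eps, T = 0.2, 0.1, 200   # illustrative values
x = np.zeros(T)
x[0] = 1.0                    # unit impulse
s = z = 0.0
z_trace = []
for x_t in x:
    s = (1 - lam) * s + lam * x_t    # s_t = (1-lam) s_{t-1} + lam x_t
    z = (1 - eps) * z + eps * s      # z_t = (1-eps) z_{t-1} + eps s_t
    z_trace.append(z)
z_trace = np.array(z_trace)
assert z_trace.argmax() > 0          # the peak comes after the impulse, not at it
```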
**enc, sum?**
\begin{align}
a_t &:= \frac 1 \lambda x_t - \frac{1-\lambda}\lambda x_{t-1}\\
s_t &:= s_{t-1} + a_t \\
&= \frac1\lambda x_t + \sum_{\tau=1}^{t-1} x_\tau - \frac{1-\lambda}{\lambda} x_0
\end{align}
Which is just a sum, with extra weight on the last element (and a correction from the first).
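A quick numerical check of that closed form, assuming $s_0 = 0$ and the sum running over $\tau = 1 \dots t$ (with an illustrative `lam`):

```python
# Numerical check of the "enc, sum" closed form above, with s_0 = 0 and
# the running sum taken over tau = 1..t.
import numpy as np

lam = 0.3                                       # illustrative value
x = np.random.randn(20)
t = len(x) - 1
a = x[1:] / lam - (1 - lam) / lam * x[:-1]      # a_1 .. a_t
s = a.sum()                                     # s_t = sum of the a_tau
closed_form = x[t] / lam + x[1:t].sum() - (1 - lam) / lam * x[0]
assert np.isclose(s, closed_form)
```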
**** | |
## Neural Networks as Dynamical Systems | |
Take a "forward pass": | |
$$ | |
z = x \cdot w | |
$$ | |
This can be interpreted as a special case of an "update" in a dynamical system:
$$ | |
\dot z = x \cdot w - z | |
$$ | |
Whose discrete-time approximate (Euler) update can be written as
\begin{align}
z_t &= z_{t-1} + \epsilon \dot z_{t-1} \\
&= z_{t-1} + \epsilon (x\cdot w - z_{t-1})
\end{align}
Where $\epsilon:=\Delta t$ is the temporal spacing between updates. When $\epsilon=1$, we recover $z_t=x\cdot w$. | |
The forward pass $z = x \cdot w$ can also be interpreted as the fixed point of this system (the solution to $\dot z = 0$).
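A minimal sketch of the fixed-point view: iterating the Euler update drives $z$ to $x \cdot w$ (shapes, step size, and iteration count are illustrative):

```python
# Euler updates z <- z + eps * (x @ w - z) converge to the fixed point z = x @ w.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # illustrative input
w = rng.standard_normal((4, 3))     # illustrative weights
eps = 0.1                           # step size (Delta t)
z = np.zeros(3)
for _ in range(200):
    z = z + eps * (x @ w - z)       # discretized dz/dt = x.w - z
assert np.allclose(z, x @ w, atol=1e-6)
```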
# Problem Statement | |
Suppose we have a bounded signal $x_t$.
We want to approximate $x_t$ with a series of bits $b_t$.
What do we really want to minimize? | |
We have inputs $x_t$. | |
\begin{align} | |
s_t^{smooth} &:= s_{t-1}^{smooth} + \epsilon\left(x_t \cdot w - s_{t-1}^{smooth} \right)\\ | |
q_t &:= enc(x_t) & \in \{0, 1\}\\ | |
s_t^{rough} &:= s_{t-1}^{rough} + \epsilon\left(dec(q_t \cdot w) - s_{t-1}^{rough} \right)\\ | |
\mathcal L_t &= \| s_t^{smooth} - s_t^{rough} \| | |
\end{align} | |
How can we adjust $dec, enc, \epsilon$ to greedily minimize $\mathcal L_t$? | |
**Optimizing our current system** | |
\begin{align} | |
s_t^{smooth} &:= s_{t-1}^{smooth} + \epsilon\left(x_t \cdot w - s_{t-1}^{smooth} \right)\\ | |
q_t &:= \left[\phi_{t-1} + \frac{1}{\lambda} x_t - \frac{1-\lambda}{\lambda} \cdot x_{t-1}>\frac12\right] \\ | |
\phi_t &:= \phi_{t-1} + \frac{1}{\lambda} x_t - \frac{1-\lambda}{\lambda} \cdot x_{t-1} - q_t \\ | |
z_t &:= q_t \cdot w \\ | |
s_t^{rough} &:= s_{t-1}^{rough} + \epsilon\left((1-\lambda) z_{t-1} + \lambda z_t - s_{t-1}^{rough} \right)\\ | |
\mathcal L_t &= \| s_\infty^{smooth} - s_t^{rough} \| | |
\end{align} | |
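A runnable sketch of this system for a scalar signal and a scalar weight, with $\phi$ carrying the quantization residual as above; `lam`, `eps`, `w`, and the input signal are illustrative choices, not values from the experiments:

```python
# Smooth target vs. quantized ("rough") reconstruction for the system above,
# with a scalar signal and weight. All constants are illustrative.
import numpy as np

lam, eps, w, T = 0.25, 0.05, 0.7, 2000
x = 0.5 + 0.4 * np.sin(np.linspace(0, 6 * np.pi, T))   # bounded signal in [0.1, 0.9]

s_smooth = s_rough = phi = 0.0
x_prev = z_prev = 0.0
losses = []
for x_t in x:
    # Smooth (real-valued) reference.
    s_smooth = s_smooth + eps * (x_t * w - s_smooth)
    # Predictive-coding encoder + sigma-delta quantizer -> bit q_t.
    a_t = x_t / lam - (1 - lam) / lam * x_prev
    phi = phi + a_t
    q_t = float(phi > 0.5)
    phi = phi - q_t                  # keep only the residual
    # Multiply the bit by the weight, then decode on the other side.
    z_t = q_t * w
    s_rough = s_rough + eps * ((1 - lam) * z_prev + lam * z_t - s_rough)
    losses.append(abs(s_smooth - s_rough))
    x_prev, z_prev = x_t, z_t
# s_rough should roughly track s_smooth, up to quantization noise.
```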
# Experiments
Figure 1: Predictive Coding | |
 | |
Figure 2: Step-size ($\epsilon$) Annealing | |
 | |
Figure 3: Predictive Coding With Epsilon Annealing | |
 | |
Figure 4: Predictive Coding Annealing | |
 | |
Figure 5: Annealing Both
 | |
Figure 6: The right quantizer | |
 | |
# An adaptive encoder | |
The final encoder/decoder in "predictive coding annealing" gives us some lessons: | |
Figure 3 shows us that we achieve our best convergence when interpolating between | |