2018-05-04 draft
# Identities
**Delta, Sum**
\begin{align}
\Delta (x_t) &:= x_t - x_{t-1} \\
\Sigma (x_t) &:= s_t := s_{t-1} + x_t \\
\Sigma \circ \Delta &= \mathcal I
\end{align}
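A quick numerical check of $\Sigma \circ \Delta = \mathcal I$ (a minimal numpy sketch, assuming zero initial conditions $x_0 = s_0 = 0$; the function names are just for illustration):

```python
import numpy as np

def delta(x):
    """Temporal difference: Delta(x)_t = x_t - x_{t-1}, taking x_0 = 0."""
    return np.diff(x, prepend=0.0)

def running_sum(x):
    """Running sum: s_t = s_{t-1} + x_t, taking s_0 = 0."""
    return np.cumsum(x)

x = np.random.randn(100)
assert np.allclose(running_sum(delta(x)), x)  # Sigma(Delta(x)) == x
```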
**Lambda-Enc, Lambda-Dec**
\begin{align}
\lambda enc (x_t) &:= \frac 1 \lambda x_t - \frac{1-\lambda}{\lambda} x_{t-1}\\
\lambda dec (x_t) &:= s_t := (1-\lambda) s_{t-1} + \lambda x_t \\
\lambda dec \circ \lambda enc &= \mathcal I
\end{align}
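The same kind of check works for the $\lambda$-pair (a minimal sketch, again assuming zero initial state and an arbitrary $\lambda$):

```python
import numpy as np

def lambda_enc(x, lam):
    """a_t = x_t / lam - (1 - lam) / lam * x_{t-1}, taking x_0 = 0."""
    x_prev = np.concatenate([[0.0], x[:-1]])
    return x / lam - (1 - lam) / lam * x_prev

def lambda_dec(a, lam):
    """s_t = (1 - lam) * s_{t-1} + lam * a_t, taking s_0 = 0."""
    s, out = 0.0, []
    for a_t in a:
        s = (1 - lam) * s + lam * a_t
        out.append(s)
    return np.array(out)

x, lam = np.random.randn(100), 0.25
assert np.allclose(lambda_dec(lambda_enc(x, lam), lam), x)  # lambda_dec(lambda_enc(x)) == x
```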
**Dec, Dec?**
\begin{align}
s_t &:= (1-\lambda) s_{t-1} + \lambda x_t \\
z_t &:= z_{t-1} + \epsilon (s_t - z_{t-1}) \\
&= (1-\epsilon) z_{t-1} + \epsilon s_t
\end{align}
So this becomes an alpha-kernel-type filter.
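For intuition, the impulse response of two cascaded leaky averages rises over a few steps and then decays, rather than jumping and decaying like a single exponential (a small sketch; the smoothing coefficients are arbitrary):

```python
import numpy as np

def leaky_avg(x, coef):
    """s_t = (1 - coef) * s_{t-1} + coef * x_t, with s_0 = 0."""
    s, out = 0.0, []
    for x_t in x:
        s = (1 - coef) * s + coef * x_t
        out.append(s)
    return np.array(out)

impulse = np.zeros(80)
impulse[0] = 1.0
single = leaky_avg(impulse, coef=0.2)                        # jumps to 0.2, then decays
double = leaky_avg(leaky_avg(impulse, coef=0.2), coef=0.1)   # rises, peaks, then decays: alpha-kernel shape
```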
**enc, sum?**
\begin{align}
a_t &:= \frac 1 \lambda x_t - \frac{1-\lambda}\lambda x_{t-1}\\
s_t &:= s_{t-1} + a_t \\
&= \frac1\lambda x_t + \sum_{\tau=1}^{t-1} x_\tau - \frac{1-\lambda}{\lambda} x_0
\end{align}
So this is just a running sum, with the most recent element weighted by $\frac1\lambda$ instead of $1$.
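A quick check of that closed form (assuming $x_0 = 0$, so the last term drops out):

```python
import numpy as np

lam = 0.25
x = np.random.randn(50)
x_prev = np.concatenate([[0.0], x[:-1]])     # shifted signal, with x_0 = 0
a = x / lam - (1 - lam) / lam * x_prev       # lambda-enc
s = np.cumsum(a)                             # plain running sum of the encoded stream

# Closed form: s_t = x_t / lam + sum_{tau < t} x_tau  (the x_0 term vanishes here)
assert np.allclose(s, x / lam + np.concatenate([[0.0], np.cumsum(x)[:-1]]))
```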
## Neural Networks as Dynamical Systems
Take a "forward pass":
$$
z = x \cdot w
$$
This can be interpreted as a special case of an "update" in a dynamical system:
$$
\dot z = x \cdot w - z
$$
whose approximate discrete-time (Euler) update can be written as
\begin{align}
z_t &= z_{t-1} + \epsilon \dot z_{t-1} \\
&= z_{t-1} + \epsilon (x\cdot w - z_{t-1})
\end{align}
Where $\epsilon:=\Delta t$ is the temporal spacing between updates. When $\epsilon=1$, we recover $z_t=x\cdot w$.
The forward pass $z = x \cdot w$ can also be interpreted as the fixed point of this dynamical system (the solution to $\dot z = 0$).
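A minimal numpy sketch of this discretization (the parameter values here are arbitrary):

```python
import numpy as np

def forward_pass_dynamics(x, w, eps, n_steps):
    """Euler-discretized update z_t = z_{t-1} + eps * (x @ w - z_{t-1}), starting from z = 0."""
    z = np.zeros(w.shape[1])
    for _ in range(n_steps):
        z = z + eps * (x @ w - z)
    return z

x, w = np.random.randn(4), np.random.randn(4, 3)
z_slow = forward_pass_dynamics(x, w, eps=0.1, n_steps=200)  # converges toward the fixed point x @ w
z_fast = forward_pass_dynamics(x, w, eps=1.0, n_steps=1)    # eps = 1 recovers the plain forward pass
assert np.allclose(z_fast, x @ w)
assert np.allclose(z_slow, x @ w, atol=1e-6)
```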
# Problem Statement
Suppose we have a bounded signal $x_t$.
We want to approximate $x_t$ with a series of bits $b_t$.
What do we really want to minimize?
We have inputs $x_t$.
\begin{align}
s_t^{smooth} &:= s_{t-1}^{smooth} + \epsilon\left(x_t \cdot w - s_{t-1}^{smooth} \right)\\
q_t &:= enc(x_t) & \in \{0, 1\}\\
s_t^{rough} &:= s_{t-1}^{rough} + \epsilon\left(dec(q_t \cdot w) - s_{t-1}^{rough} \right)\\
\mathcal L_t &= \| s_t^{smooth} - s_t^{rough} \|
\end{align}
How can we adjust $dec, enc, \epsilon$ to greedily minimize $\mathcal L_t$?
**Optimizing our current system**
\begin{align}
s_t^{smooth} &:= s_{t-1}^{smooth} + \epsilon\left(x_t \cdot w - s_{t-1}^{smooth} \right)\\
q_t &:= \left[\phi_{t-1} + \frac{1}{\lambda} x_t - \frac{1-\lambda}{\lambda} x_{t-1}>\frac12\right] \\
\phi_t &:= \phi_{t-1} + \frac{1}{\lambda} x_t - \frac{1-\lambda}{\lambda} x_{t-1} - q_t \\
z_t &:= q_t \cdot w \\
s_t^{rough} &:= s_{t-1}^{rough} + \epsilon\left((1-\lambda) z_{t-1} + \lambda z_t - s_{t-1}^{rough} \right)\\
\mathcal L_t &= \| s_\infty^{smooth} - s_t^{rough} \|
\end{align}
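A scalar sketch of this loop (a minimal simulation under assumptions: scalar weight $w$, a constant input in $[0,1]$, arbitrary $\lambda$ and $\epsilon$, and the per-step loss compares the running $s_t^{smooth}$ rather than $s_\infty^{smooth}$):

```python
import numpy as np

def simulate(x_seq, w=1.0, lam=0.5, eps=0.02):
    s_smooth = s_rough = phi = z_prev = x_prev = 0.0
    losses = []
    for x_t in x_seq:
        # Smooth target: leaky average of the true weighted input
        s_smooth = s_smooth + eps * (x_t * w - s_smooth)
        # Sigma-delta quantizer on the lambda-encoded input (residual carried in phi)
        a_t = x_t / lam - (1 - lam) / lam * x_prev
        q_t = float(phi + a_t > 0.5)
        phi = phi + a_t - q_t
        # Rough reconstruction: lambda-decode the weighted spikes, then leaky-average
        z_t = q_t * w
        s_rough = s_rough + eps * ((1 - lam) * z_prev + lam * z_t - s_rough)
        losses.append(abs(s_smooth - s_rough))
        z_prev, x_prev = z_t, x_t
    return np.array(losses)

losses = simulate(np.full(2000, 0.3))  # loss shrinks as both averages settle near 0.3 * w
```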
# Experiments:
Figure 1: Predictive Coding
![](https://drive.google.com/uc?export=download&id=1H7Me0a7TcZbaZhUlELi6czQ2dAI2i8iQ)
Figure 2: Step-size ($\epsilon$) Annealing
![](https://drive.google.com/uc?export=download&id=1gxjN8C8lMqcv870xGjaiYSaPDg7rweCw)
Figure 3: Predictive Coding With Epsilon Annealing
![](https://drive.google.com/uc?export=download&id=1mgrE3Yt3QZFrdiVPz2eRfTiPt5SliUWv)
Figure 4: Predictive Coding Annealing
![](https://drive.google.com/uc?export=download&id=1QbT4LKdS-EH9IxDUrgrz9UtTKQz7CXqg)
Figure 5: Annealing Both:
![](https://drive.google.com/uc?export=download&id=1QpTb3L79hRoCNlVXSqk0AY7kfxPLjDCM)
Figure 6: The right quantizer
![](https://drive.google.com/uc?export=download&id=1MfJoYhbkEFC7x_dLMB3Sh7mQa1E4HVk8)
# An adaptive encoder
The final encoder/decoder in "predictive coding annealing" gives us some lessons:
Figure 3 shows us that we achieve our best convergence when interpolating between