2018-05-04 draft
# Identities
**Delta, Sum**
\begin{align}
\Delta (x_t) &:= x_t - x_{t-1} \\
\Sigma (x_t) &:= s_t := s_{t-1} + x_t \\
\Sigma \circ \Delta &= \mathcal I
\end{align}
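A minimal numerical sketch of the $\Sigma \circ \Delta = \mathcal I$ identity, assuming zero initial state ($x_{-1} = 0$, $s_0 = 0$); the function names are just illustrative:

```python
# Minimal numerical check of Sigma ∘ Delta = identity, assuming zero
# initial state (x_{-1} = 0, s_0 = 0).
import numpy as np

def delta(x):
    """Temporal difference: Delta(x)_t = x_t - x_{t-1}, with x_{-1} = 0."""
    return np.diff(x, prepend=0.0)

def sigma(x):
    """Running sum: Sigma(x)_t = s_t = s_{t-1} + x_t, with s_0 = 0."""
    return np.cumsum(x)

x = np.random.randn(100)
assert np.allclose(sigma(delta(x)), x)  # Sigma ∘ Delta recovers x exactly
```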
**Lambda-Enc, Lambda-Dec**
\begin{align}
\lambda enc (x_t) &:= \frac 1 \lambda x_t - \frac{1-\lambda}{\lambda} x_{t-1}\\
\lambda dec (x_t) &:= s_t := (1-\lambda) s_{t-1} + \lambda x_t \\
\lambda dec \circ \lambda enc &= \mathcal I
\end{align}
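The same kind of check for the $\lambda enc$ / $\lambda dec$ pair, again assuming zero initial state and an illustrative `lam`:

```python
# Round-trip check: lambda_dec ∘ lambda_enc = identity, with zero initial state.
import numpy as np

def lambda_enc(x, lam):
    """a_t = (1/lam) x_t - ((1-lam)/lam) x_{t-1}, with x_{-1} = 0."""
    x_prev = np.concatenate([[0.0], x[:-1]])
    return x / lam - (1 - lam) / lam * x_prev

def lambda_dec(a, lam):
    """s_t = (1-lam) s_{t-1} + lam a_t, with s_0 = 0."""
    s, out = 0.0, []
    for a_t in a:
        s = (1 - lam) * s + lam * a_t
        out.append(s)
    return np.array(out)

x = np.random.randn(50)
assert np.allclose(lambda_dec(lambda_enc(x, 0.3), 0.3), x)  # dec ∘ enc recovers x
```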
**Dec, Dec?**
\begin{align}
s_t &:= (1-\lambda) s_{t-1} + \lambda x_t \\
z_t &:= z_{t-1} + \epsilon (s_t - z_{t-1}) \\
&= (1-\epsilon) z_{t-1} + \epsilon s_t
\end{align}
So this becomes an alpha-kernel type thing: a cascade of two leaky averagers.
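A small sketch of that cascade: a unit impulse pushed through the two leaky averagers gives a response that rises and then decays, like an alpha kernel (`lam`, `eps`, `T` are illustrative, zero initial state):

```python
# Impulse response of the "dec, dec" cascade: a leaky averager feeding a
# second leaky averager. The output rises and then decays, like an alpha kernel.
import numpy as np

lam, eps, T = 0.2, 0.1, 200   # illustrative values
x = np.zeros(T)
x[0] = 1.0                    # unit impulse
s = z = 0.0
z_trace = []
for x_t in x:
    s = (1 - lam) * s + lam * x_t    # s_t = (1-lam) s_{t-1} + lam x_t
    z = (1 - eps) * z + eps * s      # z_t = (1-eps) z_{t-1} + eps s_t
    z_trace.append(z)
z_trace = np.array(z_trace)
assert z_trace.argmax() > 0          # the peak comes after the impulse, not at it
```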
**enc, sum?**
\begin{align}
a_t &:= \frac 1 \lambda x_t - \frac{1-\lambda}\lambda x_{t-1}\\
s_t &:= s_{t-1} + a_t \\
&= \frac1\lambda x_t + \sum_{\tau=1}^{t-1} x_\tau - \frac{1-\lambda}{\lambda} x_0
\end{align}
Which is just a sum, with extra weight on the last element (and a correction from the first).
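A quick numerical check of that closed form, assuming $s_0 = 0$ and the sum running over $\tau = 1 \dots t$ (with an illustrative `lam`):

```python
# Numerical check of the "enc, sum" closed form above, with s_0 = 0 and
# the running sum taken over tau = 1..t.
import numpy as np

lam = 0.3                                       # illustrative value
x = np.random.randn(20)
t = len(x) - 1
a = x[1:] / lam - (1 - lam) / lam * x[:-1]      # a_1 .. a_t
s = a.sum()                                     # s_t = sum of the a_tau
closed_form = x[t] / lam + x[1:t].sum() - (1 - lam) / lam * x[0]
assert np.isclose(s, closed_form)
```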
**** | |
## Neural Networks as Dynamical Systems | |
Take a "forward pass": | |
$$ | |
z = x \cdot w | |
$$ | |
This can be interpreted as a special case of an "update" in a dynamical system:
$$ | |
\dot z = x \cdot w - z | |
$$ | |
Whose discrete-time approximate (Euler) update can be written as
\begin{align}
z_t &= z_{t-1} + \epsilon \dot z_{t-1} \\
&= z_{t-1} + \epsilon (x\cdot w - z_{t-1})
\end{align}
Where $\epsilon:=\Delta t$ is the temporal spacing between updates. When $\epsilon=1$, we recover $z_t=x\cdot w$. | |
The forward pass $z = x \cdot w$ can also be interpreted as the fixed point of this system (the solution to $\dot z = 0$).
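A minimal sketch of the fixed-point view: iterating the Euler update drives $z$ to $x \cdot w$ (shapes, step size, and iteration count are illustrative):

```python
# Euler updates z <- z + eps * (x @ w - z) converge to the fixed point z = x @ w.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # illustrative input
w = rng.standard_normal((4, 3))     # illustrative weights
eps = 0.1                           # step size (Delta t)
z = np.zeros(3)
for _ in range(200):
    z = z + eps * (x @ w - z)       # discretized dz/dt = x.w - z
assert np.allclose(z, x @ w, atol=1e-6)
```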
# Problem Statement | |
Suppose we have a bounded signal $x_t$.
We want to approximate $x_t$ with a series of bits $b_t$.
What do we really want to minimize? | |
We have inputs $x_t$. | |
\begin{align} | |
s_t^{smooth} &:= s_{t-1}^{smooth} + \epsilon\left(x_t \cdot w - s_{t-1}^{smooth} \right)\\ | |
q_t &:= enc(x_t) & \in \{0, 1\}\\ | |
s_t^{rough} &:= s_{t-1}^{rough} + \epsilon\left(dec(q_t \cdot w) - s_{t-1}^{rough} \right)\\ | |
\mathcal L_t &= \| s_t^{smooth} - s_t^{rough} \| | |
\end{align} | |
How can we adjust $dec, enc, \epsilon$ to greedily minimize $\mathcal L_t$? | |
**Optimizing our current system** | |
\begin{align} | |
s_t^{smooth} &:= s_{t-1}^{smooth} + \epsilon\left(x_t \cdot w - s_{t-1}^{smooth} \right)\\ | |
q_t &:= \left[\phi_{t-1} + \frac{1}{\lambda} x_t - \frac{1-\lambda}{\lambda} \cdot x_{t-1}>\frac12\right] \\ | |
\phi_t &:= \phi_{t-1} + \frac{1}{\lambda} x_t - \frac{1-\lambda}{\lambda} \cdot x_{t-1} - q_t \\ | |
z_t &:= q_t \cdot w \\ | |
s_t^{rough} &:= s_{t-1}^{rough} + \epsilon\left((1-\lambda) z_{t-1} + \lambda z_t - s_{t-1}^{rough} \right)\\ | |
\mathcal L_t &= \| s_\infty^{smooth} - s_t^{rough} \| | |
\end{align} | |
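A runnable sketch of this system for a scalar signal and a scalar weight, with $\phi$ carrying the quantization residual as above; `lam`, `eps`, `w`, and the input signal are illustrative choices, not values from the experiments:

```python
# Smooth target vs. quantized ("rough") reconstruction for the system above,
# with a scalar signal and weight. All constants are illustrative.
import numpy as np

lam, eps, w, T = 0.25, 0.05, 0.7, 2000
x = 0.5 + 0.4 * np.sin(np.linspace(0, 6 * np.pi, T))   # bounded signal in [0.1, 0.9]

s_smooth = s_rough = phi = 0.0
x_prev = z_prev = 0.0
losses = []
for x_t in x:
    # Smooth (real-valued) reference.
    s_smooth = s_smooth + eps * (x_t * w - s_smooth)
    # Predictive-coding encoder + sigma-delta quantizer -> bit q_t.
    a_t = x_t / lam - (1 - lam) / lam * x_prev
    phi = phi + a_t
    q_t = float(phi > 0.5)
    phi = phi - q_t                  # keep only the residual
    # Multiply the bit by the weight, then decode on the other side.
    z_t = q_t * w
    s_rough = s_rough + eps * ((1 - lam) * z_prev + lam * z_t - s_rough)
    losses.append(abs(s_smooth - s_rough))
    x_prev, z_prev = x_t, z_t
# s_rough should roughly track s_smooth, up to quantization noise.
```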
# Experiments
Figure 1: Predictive Coding | |
 | |
Figure 2: Step-size ($\epsilon$) Annealing | |
 | |
Figure 3: Predictive Coding With Epsilon Annealing | |
 | |
Figure 4: Predictive Coding Annealing | |
 | |
Figure 5: Annealing Both
 | |
Figure 6: The right quantizer | |
 | |
# An adaptive encoder | |
The final encoder/decoder in "predictive coding annealing" gives us some lessons: | |
Figure 3 shows us that we achieve our best convergence when interpolating between | |