Backpropagation algorithm

Notations

  • Input
    • $\vb{x}$: input (vector of features)
    • $\vb{y}$: target output
    • $C(\vb{y}, f^L(\mat{W}^L f^{L-1}(\mat{W}^{L-1} \cdots f^1(\mat{W}^1 σ(\vb{x}))\cdots)))$: loss function or “cost function”
    • $L$: number of layers
    • $\mat{W}^l = (w^l_{jk})$: the weights between layer $l - 1$ and layer $l$, where $w^l_{jk}$ is the weight between the $k$-th node in layer $l - 1$ and the $j$-th node in layer $l$
    • $f^l$ or $σ^l$: activation function at layer $l$
  • Intermediary functions (made concrete in the numpy sketch after this list)
    • weighted input: $$ \vb{z}^l ≡ \mat{W}^l \vb{a}^{l-1} + \vb{b}^l $$
    • activation at layer $l$: $$ \vb{a}^l ≡ σ(\vb{z}^l) = σ(\mat{W}^l \vb{a}^{l-1} + \vb{b}^l) $$
    • error at layer $L$, the gradient of the cost with respect to the weighted input: $$ \vb{δ}^L ≡ \pdv{C}{\vb{z}^L} $$
  • Parameters
    • $ε$: target precision
    • $γ$: step size
    • $\hat{ι}$: max iterations
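
To make the notation concrete, here is a minimal numpy sketch of the weighted input $\vb{z}^l$ and activation $\vb{a}^l$ for a single layer. The sigmoid activation, the toy layer sizes, and all variable names are assumptions for illustration, not part of the gist.

```python
# Minimal sketch of the notation in code, assuming a sigmoid activation.
# Shapes follow the conventions above: W^l is (n_l, n_{l-1}), a^{l-1} is (n_{l-1},).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_prev, n_l = 3, 2                      # nodes in layer l-1 and layer l (toy sizes)
W = rng.normal(size=(n_l, n_prev))      # W^l, entry w^l_{jk}
b = rng.normal(size=n_l)                # b^l
a_prev = rng.normal(size=n_prev)        # a^{l-1}

z = W @ a_prev + b                      # weighted input z^l = W^l a^{l-1} + b^l
a = sigmoid(z)                          # activation a^l = σ(z^l)
```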

Basic equations

  • An equation for the error in the output layer: \begin{equation} \label{eq1}\tag{1} \vb{δ}^L = ∇_{\vb{a}} C ⊙ σ'(\vb{z}^L) \end{equation}
  • An equation for the error $\vb{δ}^l$ in terms of the error in the next layer: \begin{equation} \label{eq2}\tag{2} \begin{aligned} \vb{δ}^l &= ((\mat{W}^{l+1})^T \vb{δ}^{l+1}) ⊙ σ'(\vb{z}^l) \\ \text{i.e.}\quad δ^l_j &= ∑_k w^{l+1}_{kj} δ^{l+1}_k \, σ'(z^l_j) \end{aligned} \end{equation}
  • An equation for the rate of change of the cost with respect to any bias in the network: \begin{equation} \label{eq3}\tag{3} \pdv{C}{\vb{b}^l} = \vb{δ}^l \end{equation}
  • An equation for the rate of change of the cost with respect to any weight in the network: \begin{equation} \label{eq4}\tag{4} \begin{aligned} \pdv{C}{\mat{W}^l} &= \vb{δ}^l (\vb{a}^{l-1})^T \\ \text{i.e.}\quad \pdv{C}{w^l_{jk}} &= a^{l-1}_k \, δ^l_j \end{aligned} \end{equation}
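
The four equations translate almost line-for-line into numpy. The sketch below assumes a sigmoid activation and a quadratic cost $C = \frac{1}{2}\|\vb{a}^L - \vb{y}\|^2$ (so that $∇_{\vb{a}} C = \vb{a}^L - \vb{y}$); the toy shapes and the stand-in value for $\vb{δ}^{l+1}$ are illustrative.

```python
# Sketch of equations (1)-(4) for a single hidden layer, assuming sigmoid σ
# and a quadratic cost, so that ∇_a C = a^L - y.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
a_prev = rng.normal(size=3)             # a^{l-1}
W_l, b_l = rng.normal(size=(2, 3)), rng.normal(size=2)
W_next = rng.normal(size=(1, 2))        # W^{l+1}
z_l = W_l @ a_prev + b_l                # z^l, as in the notation sketch above

# (1) output-layer error, if l were the last layer L:
#     delta_L = (a_L - y) * sigmoid_prime(z_L)
# (2) error backpropagated from the next layer:
delta_next = rng.normal(size=1)         # stand-in for δ^{l+1}
delta_l = (W_next.T @ delta_next) * sigmoid_prime(z_l)
# (3) bias gradient: ∂C/∂b^l = δ^l
grad_b = delta_l
# (4) weight gradient: ∂C/∂W^l = δ^l (a^{l-1})^T
grad_W = np.outer(delta_l, a_prev)
```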

Algorithm

\begin{algorithm}
\SetAlgoLined
\KwData{$x, y, C, \mat{W}_0, \vb{b}_0, f^l$; given: $σ, ε, γ, \hat{ι}$}
\KwResult{$\mat{W}^*$ s.t. $C(x, y; \mat{W}^*) < ε$, or $∅$}
Set the corresponding activation $\vb{a}^1 = σ(x)$ for the input layer\;
$i ← 0$\;
successFlag $←$ \False\;
\While{$i < \hat{ι}$}{
  \tcp{Feed forward}
  \For{$l ← 2, 3, \ldots, L$}{
    compute $\vb{z}^l ≡ \mat{W}^l \vb{a}^{l-1} + \vb{b}^l$\;
    compute $\vb{a}^l ≡ σ(\vb{z}^l)$\;
  }
  \tcp{Compute cost function}
  compute $c = C(\vb{a}^L, y)$\;
  \If{$c < ε$}{
    successFlag $←$ \True\;
    \Break\;
  }
  \Else{
    \tcp{Output error at last layer, equation (1)}
    compute $\vb{δ}^L = ∇_{\vb{a}} C ⊙ σ'(\vb{z}^L)$\;
    \For{$l ← L, L-1, \ldots, 2$}{
      \tcp{Weight and bias gradients, equations (3) and (4)}
      compute $\pdv{C}{\mat{W}^l} = \vb{δ}^l (\vb{a}^{l-1})^T$ and $\pdv{C}{\vb{b}^l} = \vb{δ}^l$\;
      \If{$l > 2$}{
        \tcp{Backpropagate the error, equation (2), before $\mat{W}^l$ is overwritten}
        compute $\vb{δ}^{l-1} = ((\mat{W}^l)^T \vb{δ}^l) ⊙ σ'(\vb{z}^{l-1})$\;
      }
      \tcp{Update $\mat{W}^l, \vb{b}^l$ by a gradient descent step}
      $\mat{W}^l ← \mat{W}^l - γ \pdv{C}{\mat{W}^l}$ and $\vb{b}^l ← \vb{b}^l - γ \pdv{C}{\vb{b}^l}$\;
    }
  }
  $i ← i + 1$\;
}
\If{successFlag}{
  \Return{$(\mat{W}^1, \ldots, \mat{W}^L)$ as $\mat{W}^*$}
}
\Else{
  \Return{$∅$}
}
\caption{Neural network backpropagation algorithm}
\end{algorithm}
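
For reference, the pseudocode above can be turned into a short runnable Python sketch. It keeps the same structure (feed forward, cost check against $ε$, backpropagation, gradient descent update) but assumes a sigmoid activation and quadratic cost; the function name `backprop_train` and the `sizes` argument are inventions for this sketch, and layers are 0-indexed here where the pseudocode is 1-indexed.

```python
# Runnable sketch of the algorithm above, under illustrative assumptions:
# sigmoid activation at every layer, quadratic cost C = ||a^L - y||^2 / 2.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_train(x, y, sizes, eps=1e-3, gamma=0.5, max_iter=10_000, seed=0):
    """Train on a single (x, y) pair; return weights W* if C < eps, else None."""
    rng = np.random.default_rng(seed)
    L = len(sizes)                                  # number of layers
    W = [None] + [rng.normal(size=(sizes[l], sizes[l - 1])) for l in range(1, L)]
    b = [None] + [rng.normal(size=sizes[l]) for l in range(1, L)]

    for _ in range(max_iter):
        # Feed forward
        a = [x]                                     # a^1 = x here (cf. a^1 = σ(x) in the gist)
        zs = [None]
        for l in range(1, L):
            z = W[l] @ a[-1] + b[l]                 # z^l = W^l a^{l-1} + b^l
            zs.append(z)
            a.append(sigmoid(z))                    # a^l = σ(z^l)

        # Cost; quadratic cost is an assumption, so ∇_a C = a^L - y
        c = 0.5 * np.sum((a[-1] - y) ** 2)
        if c < eps:
            return W                                # success: W[1], ..., W[L-1] are W*

        # Output error at the last layer, equation (1)
        delta = (a[-1] - y) * sigmoid_prime(zs[-1])
        # Backpropagate and update, from the last layer down
        for l in range(L - 1, 0, -1):
            grad_W = np.outer(delta, a[l - 1])      # ∂C/∂W^l = δ^l (a^{l-1})^T  (eq. 4)
            grad_b = delta                          # ∂C/∂b^l = δ^l              (eq. 3)
            if l > 1:                               # δ^{l-1} via eq. (2), before W^l changes
                delta = (W[l].T @ delta) * sigmoid_prime(zs[l - 1])
            W[l] = W[l] - gamma * grad_W            # gradient *descent*: step against the gradient
            b[l] = b[l] - gamma * grad_b
    return None                                     # ∅: target precision not reached
```

For example, `backprop_train(np.array([0.5, -0.2]), np.array([1.0]), sizes=[2, 3, 1])` trains a toy 2-3-1 network on a single input/target pair.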
