- Input
    - $\vb{x}$: input (vector of features)
    - $\vb{y}$: target output
    - $C(\vb{y}, f^L(\mat{W}^L f^{L-1}(\mat{W}^{L-1} \cdots f^1(\mat{W}^1 \vb{x}) \cdots)))$: loss function or "cost function"
    - $L$: number of layers
    - $\mat{W}^l = (w^l_{jk})$: the weights between layer $l-1$ and layer $l$, where $w^l_{jk}$ is the weight between the $k$-th node in layer $l-1$ and the $j$-th node in layer $l$ (see the initialization sketch below)
    - $f^l$ or $σ^l$: activation function at layer $l$
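To make the index convention concrete: $\mat{W}^l$ has shape $n_l × n_{l-1}$, so row $j$ holds the weights $w^l_{jk}$ feeding node $j$ of layer $l$. A minimal NumPy sketch; the layer sizes and random initialization here are illustrative assumptions, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes n_0, ..., n_L (an assumption, not from the notes).
sizes = [3, 4, 2]

# W^l has shape (n_l, n_{l-1}): entry [j, k] is w^l_jk, the weight from
# node k in layer l-1 to node j in layer l.
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
```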
- Intermediary functions (see the forward-pass sketch after this list)
    - weighted input: $$ \vb{z}^l ≡ \mat{W}^l \vb{a}^{l-1} + \vb{b}^l $$
    - activation at layer $l$: $$ \vb{a}^l ≡ σ(\vb{z}^l) = σ(\mat{W}^l \vb{a}^{l-1} + \vb{b}^l) $$
    - the gradient of the cost with respect to the weighted input of layer $l$: $$ \vb{δ}^l ≡ \pdv{C}{\vb{z}^l} $$
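These intermediary quantities are exactly what a forward pass computes. A minimal sketch continuing the initialization above, assuming a sigmoid for the generic activation $σ$:

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Return the weighted inputs z^1..z^L and activations a^0..a^L."""
    zs, activations = [], [x]
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b        # z^l = W^l a^{l-1} + b^l
        a = sigmoid(z)       # a^l = sigma(z^l)
        zs.append(z)
        activations.append(a)
    return zs, activations
```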
- Parameters (these drive the descent loop sketched below)
    - $ε$: target precision
    - $γ$: step size
    - $\hat{ι}$: max iterations
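The three parameters govern the stopping behaviour of a plain gradient-descent loop, matching the result contract of the algorithm below ($\mat{W}^*$ with cost below $ε$, or $∅$). A schematic sketch; the callable interface (`cost`, `grad`) is a hypothetical illustration:

```python
def gradient_descent(params, cost, grad, eps, gamma, max_iter):
    """Step against the gradient with step size gamma until the cost
    falls below eps; return None (the empty result) after max_iter tries."""
    for _ in range(max_iter):
        if cost(params) < eps:   # target precision reached
            return params
        params = [p - gamma * g for p, g in zip(params, grad(params))]
    return None
```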
- An equation for the error in the output layer: \begin{equation} \label{eq1}\tag{1} \vb{δ}^L = ∇_a C ⊙ σ'(\vb{z}^L) \end{equation}
- An equation for the error $\vb{δ}^l$ in terms of the error in the next layer: \begin{equation} \label{eq2}\tag{2} \begin{aligned} \vb{δ}^l &= ((\mat{W}^{l+1})^T \vb{δ}^{l+1}) ⊙ σ'(\vb{z}^l) \\ \text{i.e. } δ^l_j &= ∑_k w^{l+1}_{kj} δ^{l+1}_k \, σ'(z^l_j) \end{aligned} \end{equation}
- An equation for the rate of change of the cost with respect to any bias in the network: \begin{equation} \label{eq3}\tag{3} \pdv{C}{\vb{b}^l} = \vb{δ}^l \end{equation}
- An equation for the rate of change of the cost with respect to any weight in the network:
\begin{equation} \label{eq4}\tag{4}
\begin{aligned}
\pdv{C}{\mat{W}^l} &= \vb{δ}^l (\vb{a}^{l-1})^T \\
\text{i.e. } \pdv{C}{w^l_{jk}} &= a^{l-1}_k δ^l_j
\end{aligned}
\end{equation}
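Equations \eqref{eq1}–\eqref{eq4} combine into a single backward pass. A sketch continuing the code above, assuming the quadratic cost $C = \frac{1}{2}\norm{\vb{y} - \vb{a}^L}^2$ (a common choice, not fixed by the notes), for which $∇_a C = \vb{a}^L - \vb{y}$:

```python
def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backward(x, y, weights, biases):
    """Gradients of C w.r.t. every W^l and b^l via equations (1)-(4);
    the quadratic cost is an assumption, so grad_a C = a^L - y."""
    zs, activations = forward(x, weights, biases)
    grad_W = [None] * len(weights)
    grad_b = [None] * len(biases)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])            # eq. (1)
    for l in range(len(weights) - 1, -1, -1):
        grad_b[l] = delta                                            # eq. (3)
        grad_W[l] = np.outer(delta, activations[l])                  # eq. (4)
        if l > 0:
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l-1])  # eq. (2)
    return grad_W, grad_b
```

With `cost` and `grad` wrappers around `forward` and `backward`, these pieces slot directly into the `gradient_descent` loop sketched above.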
\begin{algorithm}
\SetAlgoLined
\KwData{$x, y, C, \mat{W}_0, \vb{b}_0, f^l$; given: $σ$, $ε$, $γ$, $\hat{ι}$}
\KwResult{$\mat{W}^*$ s.t. $C(x, y; \mat{W}^*) < ε$, or $∅$}
Set the corresponding activation $\vb{a}^0 ← \vb{x}$ for the input layer\;