[neural_nets_mind_bugs] #neural_nets #mind_bugs
in conventional notations:
1. layer input = the tensor before activation
2. layer output = the tensor after activation
3. next layer input = this layer's output transformed by the edges
4. delta_layer = dLoss / d(layer_input) (see the sketch after this list)
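A minimal numpy sketch of these four names for one dense layer with a sigmoid activation; all variable names here are illustrative, not from any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                    # output of the previous layer (batch of 4)
W = rng.normal(size=(3, 5))
b = np.zeros(5)

layer_input = X @ W + b                        # 1. tensor before activation
layer_output = 1 / (1 + np.exp(-layer_input))  # 2. tensor after activation (sigmoid)
# 3. the next layer's input would be layer_output @ W_next + b_next

# backward: suppose dL/d(layer_output) arrives from the next layer
d_layer_output = rng.normal(size=layer_output.shape)
# 4. delta_layer = dLoss / d(layer_input), via the local sigmoid derivative s * (1 - s)
delta_layer = d_layer_output * layer_output * (1 - layer_output)
```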
in graphical notations:
1. variables and tensors are treated equally in derivative computation
2. the loss is already mean-reduced over the batch (no further reduction when applying gradients)
3. functional pattern: given the gradient of the node's output from every subscriber, compute the gradient of every input tensor
4. shape pattern: the shape of a gradient equals the shape of its input tensor (patterns 3 and 4 are sketched after this list)
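A hedged sketch of patterns 3 and 4 for a single graph node, here an elementwise add whose output fans out to two subscribers; the function and variable names are made up for illustration:

```python
import numpy as np

def add_backward(grads_from_subscribers, a, b):
    """Backward rule for c = a + b, with a and b the same shape."""
    # functional pattern: dL/dc arrives from every subscriber and is summed first
    g_c = sum(grads_from_subscribers)
    # shape pattern: each input gradient has exactly the shape of that input
    g_a, g_b = g_c, g_c
    assert g_a.shape == a.shape and g_b.shape == b.shape
    return g_a, g_b

a = np.ones((2, 3))
b = np.ones((2, 3))
c = a + b                        # c is consumed by two downstream nodes
g1 = np.ones_like(c)             # dL/dc reported by subscriber 1
g2 = 2 * np.ones_like(c)         # dL/dc reported by subscriber 2
g_a, g_b = add_backward([g1, g2], a, b)
```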
best practice:
1. normalize data (zero mean, unit variance) before optimizing
2. when things don't work, always lower the learning rate first
3. weight initialization: stddev = 1 / sqrt(input_size), truncated at 2 standard deviations (one possible implementation follows this list)
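One possible reading of item 3, assuming a truncated-normal draw with stddev = 1 / sqrt(input_size) where samples beyond 2 standard deviations are redrawn; the function name is hypothetical:

```python
import numpy as np

def truncated_normal_init(input_size, output_size, seed=0):
    """Weights ~ N(0, (1/sqrt(input_size))^2), redrawn when |w| > 2 * stddev."""
    rng = np.random.default_rng(seed)
    std = 1.0 / np.sqrt(input_size)
    w = rng.normal(0.0, std, size=(input_size, output_size))
    mask = np.abs(w) > 2 * std
    while mask.any():                        # resample out-of-range values
        w[mask] = rng.normal(0.0, std, size=mask.sum())
        mask = np.abs(w) > 2 * std
    return w

W = truncated_normal_init(784, 128)          # e.g. first layer of an MNIST MLP
```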
an operation with a rank-m output and a rank-n input has a derivative (Jacobian tensor) of rank m + n
the gradient of a scalar loss L w.r.t. a tensor has the same shape as that tensor
for x_1 = f(x_0): dL / dx_0 = sum over elements e of x_1 of (dL / de) * (de / dx_0)
every operation-level derivative computation implements this formula in a specialized way (numerical check below)
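A small numpy check of this formula on a rank-1 example: for x_1 = x_0 ** 2 (elementwise) and L = sum(x_1), the Jacobian has rank 1 + 1 = 2, and contracting it with dL/dx_1 gives back a gradient of x_0's shape:

```python
import numpy as np

x0 = np.array([1.0, -2.0, 3.0])       # rank-1 input
x1 = x0 ** 2                          # rank-1 output, elementwise square
# L = x1.sum(), so dL/de = 1 for every element e of x1
dL_dx1 = np.ones_like(x1)

J = np.diag(2 * x0)                   # Jacobian dx1/dx0, rank 1 + 1 = 2, shape (3, 3)

# dL/dx0 = sum over e in x1 of dL/de * de/dx0
dL_dx0 = np.einsum('e,ei->i', dL_dx1, J)

assert dL_dx0.shape == x0.shape       # gradient shape matches the input tensor
assert np.allclose(dL_dx0, 2 * x0)    # analytic gradient of sum(x0 ** 2)
```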
linear operation: Y = XW + b, where X has shape (m, n), W has shape (n, k), Y has shape (m, k)
YX = dY / dX is an (m, k, m, n) tensor; YX[i, j, :, :] is a matrix that is zero except for its ith row, which is the jth column of W, i.e. W[:, j]
therefore the weighted sum over all elements of Y (checked numerically below):
    dL/dX = (YX * dL/dY[:, :, None, None]).sum(axis=(0, 1))
          = sigma_{i,j} dL/dY_ij * dY_ij/dX
          = sigma_{i,j} dL/dY_ij * (the matrix with W[:, j]^T in row i, zeros elsewhere)
          = the matrix whose ith row is sigma_j dL/dY_ij * W[:, j]^T
          = dL/dY @ W^T
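A numpy sanity check of this derivation, contracting the full (m, k, m, n) Jacobian and comparing it with the closed form dL/dX = dL/dY @ W^T; shapes and names follow the note above:

```python
import numpy as np

m, n, k = 4, 3, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))
W = rng.normal(size=(n, k))
dL_dY = rng.normal(size=(m, k))            # upstream gradient w.r.t. Y = X @ W + b

# full Jacobian: YX[i, j, a, b] = dY[i, j] / dX[a, b] = delta(i, a) * W[b, j]
YX = np.zeros((m, k, m, n))
for i in range(m):
    for j in range(k):
        YX[i, j, i, :] = W[:, j]           # only row i of YX[i, j] is nonzero

# weighted sum over all elements of Y, as in the note
dL_dX_contracted = (YX * dL_dY[:, :, None, None]).sum(axis=(0, 1))

assert np.allclose(dL_dX_contracted, dL_dY @ W.T)   # matches the closed form
```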