SoftMax: On derivations of its derivatives, ∂σ/∂x
Script used in the video:
https://youtu.be/yx2xc9oHvkY
This video was a reaction to derivations such as:
re: https://community.deeplearning.ai/t/calculating-gradient-of-softmax-function/1897/3
----
For general $s\colon{\mathbb R}\to{\mathbb R}$, define the scaled vector ${\vec x}^s$ componentwise by ${\vec x}^s_i := \dfrac{s(x_i)}{\sum_{k=1}^n s(x_k)}$.
This is normalized in the sense that $\sum_{k=1}^n {\vec x}^s_k = 1$.
For positive $s$, also ${\vec x}^s_i\in(0, 1]$, akin to a probability.
We also have the exchange property ${\vec x}^s_j = {\vec x}^s_i\cdot \dfrac{s(x_j)}{s(x_i)}$.
In the relevant special case $s=\exp$ (the softmax), $\dfrac{s(x_j)}{s(x_i)} = {\mathrm e}^{x_j-x_i}$, which for $i=j$ is of course $1$.
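A minimal numerical sketch of the rescaling and its properties (the names `rescale`, `s`, `x` and the test values are illustrative, not from the script; for $s=\exp$ this is the usual softmax):

```python
import numpy as np

def rescale(x, s=np.exp):
    """Componentwise rescaling x^s_i = s(x_i) / sum_k s(x_k); softmax for s = exp."""
    sx = s(x)
    return sx / sx.sum()

x = np.array([0.5, -1.0, 2.0, 0.0])
p = rescale(x)

# Normalization: the components sum to 1.
assert np.isclose(p.sum(), 1.0)

# For positive s, each component lies in (0, 1].
assert np.all((p > 0) & (p <= 1))

# Exchange property: x^s_j = x^s_i * s(x_j)/s(x_i); for s = exp this is e^{x_j - x_i}.
i, j = 0, 2
assert np.isclose(p[j], p[i] * np.exp(x[j] - x[i]))
```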
----
$f=\dfrac{g}{h} \implies f'=\dfrac{g'\cdot h - h'\cdot g}{h^2}$
Define the log-derivative
$Lf:=(\log f)'=\dfrac{f'}{f}$
For the quotient $f=\dfrac{g}{h}$ this gives $Lf = Lg - Lh$, and since $h' = g' + (h-g)'$, this can be rewritten as
$Lf = (1 - f)\cdot Lg - f\cdot \dfrac{(h-g)'}{g}$
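As a sanity check, this identity can be verified symbolically. A sketch using `sympy` with generic one-variable functions $g$, $h$ (not part of the original script):

```python
import sympy as sp

t = sp.symbols('t')
g = sp.Function('g')(t)
h = sp.Function('h')(t)
f = g / h

Lf = sp.diff(f, t) / f                          # log-derivative of f
Lg = sp.diff(g, t) / g                          # log-derivative of g
rhs = (1 - f) * Lg - f * sp.diff(h - g, t) / g  # claimed identity

assert sp.simplify(Lf - rhs) == 0               # holds identically
```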
Now take $f$ to be a rescaling component as above, i.e. $g=s(x_a)$ and $h=\sum_{k=1}^n s(x_k)$, where in particular $g$ and the remaining sum $h-g$ don't share variables.
Consider further the partial derivatives $D_k:=\dfrac{\partial}{\partial x_k}$, one for each dimension $x_k$.
$\bullet$ Case $D_ax_a=1$. Here $Lg = \dfrac{s'(x_a)}{s(x_a)}$ and $D_a(h-g)=0$.
$\bullet$ Case $D_bx_a=0$ (i.e. $b\neq a$). Here $Lg = 0$ and $\dfrac{D_b(h-g)}{g} = \dfrac{s'(x_b)}{s(x_a)}$.
For $s=\exp$, combining the two cases via $D_k f = f\cdot Lf$ and the exchange property gives the familiar Jacobian $D_b\,{\vec x}^{\exp}_a = {\vec x}^{\exp}_a\cdot(\delta_{ab}-{\vec x}^{\exp}_b)$; see the numerical check below.
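Below is a short numerical sanity check of that Jacobian against central finite differences (the names `softmax`, `softmax_jacobian` and the test point are illustrative assumptions, not from the script):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())          # shift for numerical stability
    return e / e.sum()

def softmax_jacobian(x):
    """Jacobian D_b sigma_a = sigma_a * (delta_ab - sigma_b), assembled from the two cases."""
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

x = np.array([0.3, -1.2, 0.7, 2.0])
J = softmax_jacobian(x)

# Central finite-difference estimate of each partial derivative D_b sigma_a.
eps = 1e-6
J_fd = np.empty_like(J)
for b in range(len(x)):
    dx = np.zeros_like(x)
    dx[b] = eps
    J_fd[:, b] = (softmax(x + dx) - softmax(x - dx)) / (2 * eps)

assert np.allclose(J, J_fd, atol=1e-7)
```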