> To my understanding the Optimizer will try to minimize the loss function (toward a target loss of 0.0), but in the example above the weights start out at 1.0, so the initial loss value is already 0.0:
> loss = -(log(weight) * reward) = -(0.0 * reward) = -0.0
> What am I missing or getting wrong?
f(x) = 0 doesn't mean f'(x) = 0; as long as the gradient of the loss is not 0, the weights will keep changing.
In the example, gradient(loss) = gradient(-log(weight) * reward) = -reward * 1/weight (since d/dx[ln x] = 1/x and reward is a constant),
so gradient(loss) at weight = 1 is -reward * 1/1 = -reward.
If reward == 1 (positive feedback), the gradient is -1, so gradient descent subtracts learning_rate * gradient, which is equivalent to adding learning_rate = 0.001. The new weight becomes 1.001, giving that arm a slightly higher chance of being selected by argmax(weights), and so on.
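To see this concretely, here is a minimal sketch in plain Python (not the gist's TensorFlow code; the reward of 1 and learning rate of 0.001 are assumed from the discussion above) that computes the gradient of loss = -log(weight) * reward by hand and applies one gradient-descent step:

```python
import math

weight = 1.0           # initial weight, as in the example
reward = 1.0           # assumed positive feedback for the chosen arm
learning_rate = 0.001  # assumed learning rate

# loss = -log(weight) * reward -> 0.0 at weight = 1.0, but its gradient is not 0
loss = -math.log(weight) * reward
# d(loss)/d(weight) = -reward / weight  (since d/dx[ln x] = 1/x)
grad = -reward / weight

# one gradient-descent step: subtract learning_rate * gradient
weight -= learning_rate * grad

print(loss, grad, weight)  # prints: -0.0 -1.0 1.001
```

Even though the loss starts at 0.0, the gradient at weight = 1 is -1, so the update still nudges the weight upward.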
Hi,
I'm having trouble understanding how the optimizer can tune the weights variable.
To my understanding the Optimizer will try to minimize the loss function (toward a target loss of 0.0), but in the example above the weights start out at 1.0, so the initial loss value is already 0.0:
loss = -(log(weight) * reward) = -(0.0 * reward) = -0.0
What am I missing or getting wrong?
thx,
Manuel