This gist aims to explore interesting scenarios that may be encountered while training machine learning models.
Let's imagine a scenario where the validation accuracy and loss both begin to increase. Intuitively, it seems like this scenario should not happen, since loss and accuracy seem like they would have an inverse relationship. Let's explore this a bit in the context of a binary classification problem in which a model parameterizes a Bernoulli distribution (i.e., it outputs the "probability" of the true class) and is trained with the associated negative log likelihood as the loss function (i.e., the "logistic loss" == "log loss" == "binary cross entropy").
Imagine that when the model is predicting a probability of 0.99 for a "true" class, the model is both correct (assuming a decision threshold of 0.5) and has a low loss since it can't do much better for that example. Now, imagine that the model