Made on HackMD; use it to get the correct display. Available at:
What are the differences between the various metrics used to evaluate binary classification performance?
Link: Towards Data Science
See also p. 197+, chap. 5.4.11+: look for a file named like StatisticsMachineLearningPython.pdf
On a photo of a bee's wing, detect the intersection points (pixel coordinates) between the wing's veins, with an allowed error of a 30 px radius circle:
- pixels marked as 'intersection' are labeled 'positive'
- all other pixels of the image are considered 'negative'
- huge imbalance between positive and negative classes (in this example)
:::info
TP : true positives
FP : false positives
TN : true negatives
FN : false negatives
:::
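To make these counts concrete, here is a minimal Python sketch with made-up toy arrays (not the real Bee Wings masks), assuming the pixel labels have been flattened to 1-D:

```python
# Minimal sketch (toy data): counting TP / FP / TN / FN from flattened
# binary ground-truth and prediction arrays.
import numpy as np

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])  # 1 = real 'intersection' pixel
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])  # labels given by our program

TP = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, really positive
FP = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, really negative
TN = np.sum((y_pred == 0) & (y_true == 0))  # predicted negative, really negative
FN = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, really positive

print(TP, FP, TN, FN)  # 2 1 6 1
```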
:::warning
aka.
- Positive Predictive Value (PPV)
:::
- ratio of pixels correctly labeled 'intersection' by our program over all pixels labeled 'intersection' by our program
- how sure you are that a positive prediction really is a true positive
- choose precision if you want to be more confident in your positive predictions
:::info
$$
precision = \frac{TP}{TP + FP}
$$
:::
:::danger
$precision \neq accuracy$
:::
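A minimal sketch of the precision computation on the same kind of toy arrays, checked against scikit-learn's `precision_score` (assuming scikit-learn is installed):

```python
# Sketch: precision = TP / (TP + FP), on made-up toy arrays.
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))
FP = np.sum((y_pred == 1) & (y_true == 0))

print(TP / (TP + FP))                   # 0.666... : 2 of our 3 'intersection' labels are right
print(precision_score(y_true, y_pred))  # same value from scikit-learn
```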
:::warning
aka.
- Sensitivity (SEN)
- True Positive Rate (TPR)
- Hit rate
:::
- ratio of pixels correctly labeled 'intersection' by our program over all pixels that really are 'intersection'
- counterpart of specificity for the positive class (but not the same thing)
- how sure you are that you are not missing any positives
- choose recall if false positives are far more acceptable than false negatives (e.g. cancer / HIV detection...)
:::info
$$
recall = \frac{TP}{TP + FN}
$$
:::
:::danger
$recall = sensitivity$ BUT $recall \neq specificity$
:::
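Same kind of sketch for recall, checked against scikit-learn's `recall_score` (toy arrays again):

```python
# Sketch: recall = TP / (TP + FN), on made-up toy arrays.
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))
FN = np.sum((y_pred == 0) & (y_true == 1))

print(TP / (TP + FN))                # 0.666... : we found 2 of the 3 real intersections
print(recall_score(y_true, y_pred))  # same value from scikit-learn
```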
:::warning
aka.
- True Negative Rate (TNR)
- Recall of the negative class (a term rarely used)
:::
- ratio of pixels correctly labeled 'not intersection' by our program over all pixels that really are 'not intersection'
- counterpart of recall for the negative class (but not the same thing)
- choose specificity if you want to cover all negatives, meaning you don't want any false alarms (i.e. false positives)
:::info
$$
specificity = \frac{TN}{TN + FP}
$$
:::
:::danger
$recall = sensitivity$ BUT $recall \neq specificity$
:::
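A sketch for specificity; scikit-learn has no dedicated specificity function, but the recall of the negative class (`pos_label=0`) gives the same number, matching the 'aka.' list above:

```python
# Sketch: specificity = TN / (TN + FP), on made-up toy arrays.
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

TN = np.sum((y_pred == 0) & (y_true == 0))
FP = np.sum((y_pred == 1) & (y_true == 0))

print(TN / (TN + FP))                             # 0.857... : 6 of the 7 real negatives kept
print(recall_score(y_true, y_pred, pos_label=0))  # same value: recall of the negative class
```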
:::warning
aka.
- Dice score
- F1-score
- F-measure
- F-value
:::
- combines precision and recall: it is the harmonic mean of precision and recall
- the F-score is highest when there is some sort of balance between precision & recall in the system
- conversely, the F-score stays low if one measure is improved at the expense of the other; for example, if precision is 1 & recall is 0 (or recall is 1 & precision is 0), the F-score will be 0
- F-score is a good measure if you have an uneven class distribution between positive and negative counts --> suited for the analogy of the Bee Wings project
:::info
$$
F_{score} = \frac{2 \times (precision \times recall)}{precision + recall}
$$
:::
:::danger
$$
F_{score} \neq \frac{precision + recall}{2}
$$
:::
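A sketch of the F-score, checked against scikit-learn's `f1_score`; the toy prediction here is deliberately chosen so that precision ≠ recall, which makes the difference between harmonic and arithmetic mean visible:

```python
# Sketch: F-score = harmonic mean of precision and recall, on made-up toy arrays.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])  # very cautious: a single positive call

p = precision_score(y_true, y_pred)  # 1.0      (the only positive call is correct)
r = recall_score(y_true, y_pred)     # 0.333... (2 real intersections missed)

print(2 * p * r / (p + r))       # 0.5 : harmonic mean
print(f1_score(y_true, y_pred))  # 0.5 : same value from scikit-learn
print((p + r) / 2)               # 0.666... : arithmetic mean, NOT the F-score
```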
- ratio of correctly labeled subjects over the whole pool of subjects
- great measure, but only when you have symmetric datasets (false negative & false positive counts are close)
- if the costs of false positives and false negatives are different, then F1 is your savior
:::info
$$
accuracy = \frac{TP + TN}{TP + TN + FP + FN}
$$
:::
:::danger
$precision \neq accuracy$
:::
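A sketch of accuracy on the same toy arrays, plus a made-up, heavily imbalanced pixel example showing why accuracy alone is misleading for a problem like the Bee Wings one:

```python
# Sketch: accuracy = (TP + TN) / (TP + TN + FP + FN), on made-up toy arrays.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])
print(accuracy_score(y_true, y_pred))  # 0.8

# Imbalanced case: ~0.1% positive pixels. A model that answers 'not intersection'
# everywhere still reaches ~0.999 accuracy while finding no intersection at all.
rng = np.random.default_rng(0)
pixels = (rng.random(100_000) < 0.001).astype(int)
print(accuracy_score(pixels, np.zeros_like(pixels)))  # ~0.999
```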
- arithmetic mean of sensitivity (recall) and specificity, or the average accuracy obtained on either class
- avoids inflated performance estimates on imbalanced datasets
:::info
$$
bACC = \frac{recall + specificity}{2}
$$
:::
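A sketch of balanced accuracy, checked against scikit-learn's `balanced_accuracy_score` (same toy arrays as before):

```python
# Sketch: bACC = (recall + specificity) / 2, on made-up toy arrays.
import numpy as np
from sklearn.metrics import recall_score, balanced_accuracy_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

recall = recall_score(y_true, y_pred)                    # TPR, recall of the positive class
specificity = recall_score(y_true, y_pred, pos_label=0)  # TNR, recall of the negative class

print((recall + specificity) / 2)               # 0.7619...
print(balanced_accuracy_score(y_true, y_pred))  # same value from scikit-learn
```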
:::info
ROC: Receiver Operating Characteristic
AUC: Area Under Curve (under the ROC)
:::
:::warning
aka. curve of Recall (TPR) (y-axis) relative to the False Positive Rate (FPR) (x-axis), where:
$$
FPR = \frac{FP}{FP + TN}
$$
:::
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
- ideal situation: the curve sticks to the y-axis and the 'ceiling' (top-left corner)
- worst situation: the curve follows the identity line (diagonal), i.e. fully random predictions
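ROC / AUC needs a continuous score per pixel (e.g. a predicted probability of being an 'intersection'), not hard labels; the scores below are made up for illustration:

```python
# Sketch: ROC curve (FPR on x-axis, TPR/recall on y-axis) and its AUC,
# computed from made-up per-pixel scores.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([1,   0,   0,    1,   0,   0,    0,   1,   0,    0])
y_score = np.array([0.9, 0.1, 0.65, 0.4, 0.2, 0.05, 0.3, 0.8, 0.15, 0.1])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
print(roc_auc_score(y_true, y_score))              # ~0.95 (1.0 = ideal, 0.5 = full random)
```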