
Differences between Precision, Recall, F-score, Dice, Accuracy, ROC AUC...

Made on HackMD, use it to get the correct display. Available at:

What are the differences between the various metrics used to evaluate binary classification performance?

Link: Towards Data Science. See also p. 197+, chap. 5.4.11+ in a file named like StatisticsMachineLearningPython.pdf

Analogy: the Bee Wings project

On a photo of a bee's wing, detect the intersection points (pixel coordinates) between the wing's veins, with an authorized error of a 30 px radius circle

  • pixels marked as 'intersection' are labeled 'positive'
  • all other pixels of the image are considered 'negative'
  • huge imbalance between positive and negative classes (in this example)

:::info
TP: true positives
FP: false positives
TN: true negatives
FN: false negatives
:::
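To make these four counts concrete, here is a minimal sketch in Python (NumPy assumed); the `y_true` / `y_pred` arrays are hypothetical toy labels standing in for the pixel classes:

```python
import numpy as np

# Hypothetical toy labels: 1 = 'intersection' pixel, 0 = 'not intersection'
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # ground truth
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])   # labels given by our program

TP = int(np.sum((y_pred == 1) & (y_true == 1)))   # predicted positive, really positive
FP = int(np.sum((y_pred == 1) & (y_true == 0)))   # predicted positive, really negative
TN = int(np.sum((y_pred == 0) & (y_true == 0)))   # predicted negative, really negative
FN = int(np.sum((y_pred == 0) & (y_true == 1)))   # predicted negative, really positive

print(TP, FP, TN, FN)   # 2 1 5 2
```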

Precision:

:::warning
aka.
  • Positive Predictive Value (PPV)
:::

  • ratio of the pixels correctly labeled 'intersection' by our program over all the pixels labeled 'intersection' by our program
  • how sure you are of your true positives
  • choose precision if you want to be more confident of your true positives

:::info
$$ precision = \frac{TP}{TP + FP} $$
:::

:::danger
$precision \neq accuracy$
:::
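A minimal sketch of the formula, assuming scikit-learn is available and reusing the hypothetical toy arrays above (TP = 2, FP = 1):

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

# precision = TP / (TP + FP) = 2 / (2 + 1)
print(precision_score(y_true, y_pred))   # 0.666...
```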

Recall (of the positive class):

:::warning
aka.
  • Sensitivity (SEN)
  • True Positive Rate (TPR)
  • Hit rate
:::

  • ratio of the pixels correctly labeled 'intersection' by our program over all the pixels that really are 'intersection'
  • the counterpart of specificity for the positive class (but not the same thing)
  • how sure you are that you are not missing any positives
  • choose recall if false positives are far more acceptable than false negatives (ex: cancer / HIV detection...)

:::info
$$ recall = \frac{TP}{TP + FN} $$
:::

:::danger
$recall = sensitivity$ BUT $recall \neq specificity$
:::
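A minimal sketch for recall, assuming scikit-learn, on the same hypothetical toy arrays (TP = 2, FN = 2):

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

# recall = TP / (TP + FN) = 2 / (2 + 2)
print(recall_score(y_true, y_pred))   # 0.5
```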

Specificity (SPC):

:::warning
aka.
  • True Negative Rate (TNR)
  • Recall of the negative class (denomination rarely used)
:::

  • ratio of the pixels correctly labeled 'not intersection' by our program over all the pixels that really are 'not intersection'
  • the counterpart of recall for the negative class (but not the same thing)
  • choose specificity if you want to cover all negatives, meaning you don't want any false alarms (i.e. false positives)

:::info
$$ specificity = \frac{TN}{TN + FP} $$
:::

:::danger
$recall = sensitivity$ BUT $recall \neq specificity$
:::
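scikit-learn does not ship a dedicated specificity function, so one common workaround is to compute the recall of the negative class via `pos_label=0`; a minimal sketch on the same hypothetical toy arrays (TN = 5, FP = 1):

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

# specificity = TN / (TN + FP) = 5 / (5 + 1), i.e. the recall of the negative class
print(recall_score(y_true, y_pred, pos_label=0))   # 0.833...
```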

F-score:

:::warning
aka.
  • Dice score
  • F1-score
  • F-measure
  • F-value
:::

  • combines precision and recall: it is the harmonic mean of precision and recall
  • F-score is highest when there is some sort of balance between precision & recall in the system
  • conversely, F-score stays low if one measure is improved at the expense of the other. For example, if precision is 1 & recall is 0 (or recall is 1 & precision is 0), F-score will be 0
  • F-score is a good measure if you have an uneven class distribution between positive and negative counts --> suited for the Bee Wings analogy

:::info
$$ F\text{-}score = \frac{2 \cdot precision \cdot recall}{precision + recall} $$
:::

:::danger
$$ F\text{-}score \neq \frac{precision + recall}{2} $$
:::
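A minimal sketch contrasting the harmonic mean (the actual F-score) with the arithmetic mean flagged above, assuming scikit-learn and the same hypothetical toy arrays (precision = 2/3, recall = 1/2):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

precision, recall = 2 / 3, 1 / 2                       # values from the previous sketches
print(2 * precision * recall / (precision + recall))   # 0.571... (harmonic mean = F-score)
print((precision + recall) / 2)                        # 0.583... (arithmetic mean, NOT the F-score)
print(f1_score(y_true, y_pred))                        # 0.571... matches the harmonic mean

# Extreme imbalance: the harmonic mean collapses, the arithmetic mean stays misleadingly high
p, r = 1.0, 0.01
print(2 * p * r / (p + r), (p + r) / 2)                # ~0.0198 vs 0.505
```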

Accuracy (ACC):

  • ratio of correctly labeled subjects over the whole pool of subjects
  • great measure, but only on symmetric datasets (classes roughly balanced, and false negative & false positive counts close)
  • if the costs of false positives and false negatives are different, then F1 is your savior

:::info
$$ accuracy = \frac{TP + TN}{TP + TN + FP + FN} $$
:::

:::danger
$precision \neq accuracy$
:::
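A minimal sketch of how accuracy can mislead on imbalanced data like the bee-wing pixels, assuming scikit-learn; the arrays are a hypothetical 99-to-1 toy dataset:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical, heavily imbalanced data: 99 'not intersection' pixels, 1 'intersection' pixel
y_true = np.array([0] * 99 + [1])
y_pred = np.zeros(100, dtype=int)        # a useless model that predicts 'negative' everywhere

print(accuracy_score(y_true, y_pred))    # 0.99 -> looks excellent
print(recall_score(y_true, y_pred))      # 0.0  -> yet every positive is missed
```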

Balanced Accuracy (bACC):

  • arithmetic mean of sensitivity (recall) and specificity, i.e. the average of the accuracies obtained on each class
  • avoids inflated performance estimates on imbalanced datasets

:::info
$$ bACC = \frac{recall + specificity}{2} $$
:::
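A minimal sketch on the same hypothetical 99-to-1 toy data, assuming scikit-learn's `balanced_accuracy_score`:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Same hypothetical imbalanced data as in the accuracy sketch
y_true = np.array([0] * 99 + [1])
y_pred = np.zeros(100, dtype=int)

# bACC = (recall + specificity) / 2 = (0.0 + 1.0) / 2
print(balanced_accuracy_score(y_true, y_pred))   # 0.5
```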

ROC and AUC:

:::info
ROC: Receiver Operating Characteristic
AUC: Area Under Curve (under the ROC)
:::

:::warning
aka. curve of Recall (TPR) (y-axis) relative to the False Positive Rate (FPR) (x-axis), where:
$$ FPR = \frac{FP}{FP + TN} $$
:::

https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5

  • ideal situation: curve hugging the y-axis and the 'ceiling' (AUC close to 1)
  • worst situation: curve on the diagonal (the identity function), i.e. fully random predictions (AUC = 0.5)
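A minimal sketch assuming scikit-learn; the `y_score` values are hypothetical probabilities, since the ROC needs a continuous score per sample rather than hard 0/1 labels:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # hypothetical ground truth
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.6])  # hypothetical predicted scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve (x = FPR, y = TPR)
print(roc_auc_score(y_true, y_score))               # 0.9375 here; 1.0 = perfect, 0.5 = random
```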