@audhiaprilliant
Created December 24, 2020 03:00
How to choose the optimal threshold for imbalanced classification
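# Assumption: fpr, tpr and thresholds are NumPy arrays computed earlier
# (for example by sklearn.metrics.roc_curve), and df_fpr_tpr is a DataFrame
# with 'FPR' and 'TPR' columns. None of them is defined in this snippet.
import numpy as np
import plotnine
from plotnine import (ggplot, aes, geom_point, geom_line, geom_text,
                      labs, xlab, ylab, theme_minimal)

# Calculate Youden's J statistic (J = TPR - FPR) for every threshold
youdenJ = tpr - fpr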
# Find the optimal threshold
index = np.argmax(youdenJ)
thresholdOpt = round(thresholds[index], ndigits = 4)
youdenJOpt = round(youdenJ[index], ndigits = 4)
fprOpt = round(fpr[index], ndigits = 4)
tprOpt = round(tpr[index], ndigits = 4)
print('Best Threshold: {} with Youden J statistic: {}'.format(thresholdOpt, youdenJOpt))
print('FPR: {}, TPR: {}'.format(fprOpt, tprOpt))
# Plot the ROC curve and mark the optimal threshold
plotnine.options.figure_size = (8, 4.8)
(
    ggplot(data = df_fpr_tpr)+
    geom_point(aes(x = 'FPR',
                   y = 'TPR'),
               size = 0.4)+
    # Best threshold
    geom_point(aes(x = fprOpt,
                   y = tprOpt),
               color = '#981220',
               size = 4)+
    geom_line(aes(x = 'FPR',
                  y = 'TPR'))+
    # Annotate the text
    geom_text(aes(x = fprOpt,
                  y = tprOpt),
              label = 'Optimal threshold for \n negative class {}'.format(thresholdOpt),
              nudge_x = 0.14,
              nudge_y = -0.10,
              size = 10,
              fontstyle = 'italic')+
    labs(title = 'ROC Curve')+
    xlab('False Positive Rate (FPR)')+
    ylab('True Positive Rate (TPR)')+
    theme_minimal()
)
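
The snippet above takes `fpr`, `tpr` and `thresholds` as given. To make the selection step easier to follow on its own, here is a minimal, dependency-free sketch of the same idea: compute one ROC point per candidate threshold, then pick the threshold that maximises Youden's J = TPR - FPR. The helper names `roc_points` and `youden_optimal` and the toy labels and scores are hypothetical and not part of the gist. In practice the arrays would more likely come from a library routine such as `sklearn.metrics.roc_curve`, which is an assumption about the surrounding code.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr, thresholds), one entry per distinct score, highest first.

    A sample is predicted positive when its score is >= the threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    fpr, tpr, thresholds = [], [], []
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
        thresholds.append(t)
    return fpr, tpr, thresholds


def youden_optimal(fpr, tpr, thresholds):
    """Return (threshold, J, FPR, TPR) at the point where J = TPR - FPR is largest."""
    j = [t - f for t, f in zip(tpr, fpr)]
    index = max(range(len(j)), key=j.__getitem__)
    return thresholds[index], j[index], fpr[index], tpr[index]


# Hypothetical imbalanced toy data: 3 positives, 7 negatives
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_score = [0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.80, 0.70]

fpr, tpr, thresholds = roc_points(y_true, y_score)
threshold_opt, j_opt, fpr_opt, tpr_opt = youden_optimal(fpr, tpr, thresholds)
print('Best Threshold: {} with Youden J statistic: {}'.format(
    round(threshold_opt, 4), round(j_opt, 4)))
print('FPR: {}, TPR: {}'.format(round(fpr_opt, 4), round(tpr_opt, 4)))
```

Because J weights sensitivity and specificity equally, it does not depend on class prevalence, which is one reason it is a common choice for imbalanced problems. If false positives and false negatives carry different costs, a cost-weighted criterion may be more appropriate.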