Created
May 9, 2019 07:00
Example of cross-validation with unbalanced data
Author
I understand that it doesn't solve the imbalance, but I thought it could be risky not to use StratifiedKFold, since you could end up with some folds that contain no samples of the minority class.
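To illustrate the concern, here is a minimal sketch using scikit-learn's `KFold` and `StratifiedKFold`. The labels are synthetic and purely an assumption for the example (100 samples, 10 of them minority, deliberately sorted so that the unshuffled `KFold` is a worst case); they are not the thyroid data from the gist.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic labels (assumption): 90 majority (0) followed by 10 minority (1).
# Sorting the minority class to the end makes unshuffled KFold a worst case.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features do not matter for the split itself


def minority_counts(splitter):
    """Count the minority samples that land in each test fold."""
    return [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]


plain_counts = minority_counts(KFold(n_splits=5))
stratified_counts = minority_counts(StratifiedKFold(n_splits=5))

print("KFold:          ", plain_counts)
print("StratifiedKFold:", stratified_counts)
```

With sorted labels, the plain split leaves most test folds without any minority samples, while the stratified split spreads them evenly. Shuffling in `KFold` makes an empty fold less likely but does not rule it out when the minority class is very small.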
Author
That is absolutely fair. I probably wouldn't add it to the main article (I think it is useful to focus on one problem), but I can definitely see a section at the bottom for "going further" or "future improvements".
I'll take the advice on board and add a section. Thank you!
@Vini-002 StratifiedKFold doesn't solve the imbalance problem. What it does do is make sure that each fold is equally unbalanced. If the thyroid cases are only 10% of the population, then about 10% of each fold will be thyroid cases. This can still lead some ML classifiers to just predict the majority class.
That said, there is no reason you could not use stratified sampling in addition to one of the techniques used here.
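As a rough sketch of combining the two ideas: the example below pairs `StratifiedKFold` with class weighting. Class weighting (`class_weight="balanced"`) is a stand-in chosen here for brevity and is not necessarily one of the techniques in the gist; the data is synthetic (`make_classification` with a 90/10 split), and the model and metric are likewise assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data (assumption): roughly 10% minority class.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Stratification keeps the ~10% minority share in every fold...
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# ...while class weighting addresses the imbalance itself during fitting.
model = LogisticRegression(class_weight="balanced", max_iter=1000)

# Balanced accuracy averages recall over both classes, so always predicting
# the majority class scores about 0.5 rather than about 0.9.
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print("Balanced accuracy per fold:", scores)
print("Mean:", scores.mean())
```

If you use resampling instead of class weights, the resampling should be applied to the training portion of each fold only, so that the test fold still reflects the original class proportions.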