-
-
Save kiwidamien/bcbe8e527a5f0cc9f28c4fe692f70cbc to your computer and use it in GitHub Desktop.
@Vini-002 StratifiedKFold doesn’t solve the imbalanced problem. What it does do is make sure that each of the folds are equally unbalanced. If the thyroid cases are only 10% of the population, you will have 10% of each fold have thyroid cases. This will still lead to some ML classifiers just stating the majority case.
there is no reason you could not use stratified sampling in addition to one of the techniques used here.
I understand that it doesn't solve the imbalance, but I thought it could be bad not to use StratifiedKFold as you could end up with some fold with no samples of the minority class.
That is absolutely fair. I probably wouldn’t add it into the main article (I think it is useful to focus on one problem) but can definitely see a section at the bottom for “going further” or “future improvements”.
I’ll take the advice onboard and add a section — thank you!!
Thanks for the great guide!
Is there a reason for not using StratifiedKFold?