When imputing with sklearn.impute.SimpleImputer and the option strategy="most_frequent", calling the .fit() method takes an absurdly long time. This happens because scipy.stats.mode is ridiculously inefficient for string variables. For example, imputing a single feature with half a million values takes ~15 minutes without the shim and less than one millisecond with it.
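
To illustrate why finding the most frequent string does not have to be slow, here is a minimal sketch that counts values in a single pass with a hash table. It is not necessarily how this shim is implemented: the function name most_frequent and the tie-breaking rule (smallest value wins) are assumptions made for the example.

```python
from collections import Counter


def most_frequent(values):
    """Return the most common non-missing value, or None if there is none.

    Hypothetical helper for illustration only; not part of scikit-learn
    or of this shim.
    """
    # Drop missing entries: None, and NaN (the only value not equal to itself).
    counts = Counter(v for v in values if v is not None and v == v)
    if not counts:
        return None
    top = max(counts.values())
    # Break ties by taking the smallest value so the result is deterministic.
    return min(v for v, c in counts.items() if c == top)


column = ["red", "blue", None, "red", "green", "blue", "red"]
print(most_frequent(column))  # prints: red
```

Counting this way does work proportional to the number of values, so it should stay fast as the column grows to hundreds of thousands of strings.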
Usage:
import SimpleImputerShim  # that's it!