@FelixChop
Created March 26, 2020 11:11
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
one_hot_encoder_gender = OneHotEncoder(handle_unknown='ignore')
one_hot_encoder_gender.fit(train[['Sex']])
# The Embarked column has some missing values: fill them first, then encode.
imputer_Embarked = SimpleImputer(strategy='most_frequent', add_indicator=True)
imputer_Embarked.fit(train[['Embarked']])
transformed_Embarked = pd.DataFrame(
    imputer_Embarked.transform(train[['Embarked']]),
    columns=['Embarked', 'Embarked_missing'],
    index=train.index,
)
train = train.drop(columns=['Embarked']).join(transformed_Embarked)
# Do not forget to fill missing values in the validation and holdout sets,
# reusing the imputer fitted on train rather than fitting a new one
ordinal_encoder_city = OrdinalEncoder()
ordinal_encoder_city.fit(train[['Embarked']])
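
The reminder about the validation and holdout sets can be sketched as follows. This is a minimal illustration, not part of the original gist: the two toy DataFrames and the helper `impute_embarked` are hypothetical stand-ins, and the sketch assumes the validation split has the same `Sex` and `Embarked` columns as `train`. The imputer and both encoders are fitted on the training data only, then reused as-is on the validation data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy stand-ins for the real splits (hypothetical data, for illustration only).
train = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', np.nan, 'S'],
})
validation = pd.DataFrame({
    'Sex': ['female', 'male'],
    'Embarked': [np.nan, 'C'],
})


def impute_embarked(df, imputer):
    """Hypothetical helper: swap 'Embarked' for its imputed value plus a missing flag."""
    filled = pd.DataFrame(
        imputer.transform(df[['Embarked']]),
        columns=['Embarked', 'Embarked_missing'],
        index=df.index,
    )
    return df.drop(columns=['Embarked']).join(filled)


# Fit the imputer on the training set only, then apply it to every split.
imputer_Embarked = SimpleImputer(strategy='most_frequent', add_indicator=True)
imputer_Embarked.fit(train[['Embarked']])
train = impute_embarked(train, imputer_Embarked)
validation = impute_embarked(validation, imputer_Embarked)

# Encoders are likewise fitted on train and only applied to validation.
one_hot_encoder_gender = OneHotEncoder(handle_unknown='ignore')
one_hot_encoder_gender.fit(train[['Sex']])
ordinal_encoder_city = OrdinalEncoder()
ordinal_encoder_city.fit(train[['Embarked']])

validation_sex = one_hot_encoder_gender.transform(validation[['Sex']]).toarray()
validation_city = ordinal_encoder_city.transform(validation[['Embarked']])
```

Note that `OrdinalEncoder()` with its default settings raises an error on a category it did not see during `fit`, so a port that appears only in the validation or holdout set would need separate handling.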