Created February 22, 2018 22:49
Re-coding a categorical field into one-hot vectors
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def decode_encode(data, colname):
    '''
    (DataFrame, str) -> (DataFrame)
    Returns a copy of data with the column colname replaced by a set of one-hot encoded columns
    '''
    # Map each category to an integer code; classes_ holds the sorted category labels
    label_encode = LabelEncoder()
    X_new = label_encode.fit_transform(data[colname])
    # OneHotEncoder expects a 2-D array, so reshape the codes into a single column
    X_new = X_new.reshape(-1, 1)
    encode = OneHotEncoder()
    # Reuse data's index so concat aligns rows even when the index is not 0..n-1
    X_new = pd.DataFrame(encode.fit_transform(X_new).toarray(),
                         columns=label_encode.classes_, index=data.index)
    data_dropped = data.drop(colname, axis=1)
    # One-hot columns come first, followed by the remaining original columns
    data_onehot = pd.concat([X_new, data_dropped], axis=1)
    return data_onehot
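
For comparison, the same re-coding can be sketched with pandas alone. This is not the approach the function above takes: it swaps `LabelEncoder` and `OneHotEncoder` for `pd.get_dummies`. The sample DataFrame and its column names (`color`, `price`) are made up for illustration. One visible difference is column order: `get_dummies` keeps the untouched columns first and appends the indicator columns, whereas the function above puts the one-hot columns first.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
data = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "price": [10, 12, 9, 15],
})

# get_dummies drops "color" and appends one indicator column per category.
# An empty prefix and separator name each column after its category alone,
# and dtype=float matches the 0.0/1.0 values OneHotEncoder produces.
data_onehot = pd.get_dummies(
    data, columns=["color"], prefix="", prefix_sep="", dtype=float
)

print(data_onehot)
```

If the encoding has to be reapplied later to new data with the same set of categories, a fitted scikit-learn encoder is likely the better fit, since `get_dummies` only sees the categories present in the frame it is given.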