@a-agmon
Created August 18, 2019 14:23
# Oversample the minority class (this can be resource intensive)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# Split the features and the 'Bot' label into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    df_raw.drop(['Bot'], axis=1), df_raw['Bot'], test_size=0.2)
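# Oversample the minority class in the training set only
from imblearn.over_sampling import SMOTENC
# Columns 3 through 13 are treated as categorical features
smote_nc = SMOTENC(categorical_features=np.arange(3, 14), sampling_strategy='minority')
# fit_resample replaces the deprecated fit_sample, which recent imbalanced-learn releases no longer provide
x_train_up, y_train_up = smote_nc.fit_resample(x_train, y_train)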
# Restore the original column names on the resampled training features
x_train = pd.DataFrame(x_train_up, columns=x_train.columns)
y_train = y_train_up
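
After oversampling, it can be worth confirming that the classes really are balanced. The sketch below is not part of the original gist: `class_shares` is a hypothetical helper, and the `before` and `after` lists are made-up labels standing in for `y_train` before and after resampling.

```python
from collections import Counter

def class_shares(labels):
    """Return each class's share of the labels as a dict {label: fraction}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical labels standing in for y_train before and after oversampling
before = [0] * 8 + [1] * 2
after = [0] * 8 + [1] * 8

print(class_shares(before))
print(class_shares(after))
```

In the gist's own context, the same check would presumably be run as `class_shares(y_train)` after the resampling step.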