@ravishchawla
Created June 6, 2019 17:36
'''Cleaning the *Profile* dataset'''
import pandas as pd

# Drop rows with a missing gender or income.
profile = profile.dropna(axis=0, subset=['gender', 'income'])

# One-hot encode gender into gender_<value> indicator columns.
profile_gender = profile['gender'].str.get_dummies()
profile_gender.columns = ['gender_' + col for col in profile_gender.columns]

# Separate date attributes into year, month, and day, converting to integers.
# The slicing assumes became_member_on holds YYYYMMDD values.
profile_date = profile['became_member_on'].astype(str)
profile_year = profile_date.str[0:4].astype('int').rename('member_year')
profile_month = profile_date.str[4:6].astype('int').rename('member_month')
profile_day = profile_date.str[6:8].astype('int').rename('member_day')

# Attach the derived columns, then drop the raw columns they replace.
profile = pd.concat([profile, profile_gender, profile_year, profile_month, profile_day], axis=1)
profile = profile.drop(['became_member_on', 'gender'], axis=1)
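
The snippet above expects a `profile` DataFrame to exist already. To try the same steps in isolation, here is a minimal sketch that wraps them in a function and runs them on a small made-up frame. The helper name `clean_profile` and the toy rows are hypothetical, not part of the original gist, and the date handling swaps the string slicing for `pd.to_datetime` on the assumption that `became_member_on` holds `YYYYMMDD` integers.

```python
import pandas as pd


def clean_profile(profile: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the cleaning steps above, applied to a copy."""
    # Drop rows with a missing gender or income.
    profile = profile.dropna(axis=0, subset=['gender', 'income'])

    # One-hot encode gender into gender_<value> indicator columns.
    gender = profile['gender'].str.get_dummies().add_prefix('gender_')

    # Assumption: became_member_on is a YYYYMMDD integer, e.g. 20170715.
    dates = pd.to_datetime(profile['became_member_on'].astype(str),
                           format='%Y%m%d')
    parts = pd.DataFrame({
        'member_year': dates.dt.year,
        'member_month': dates.dt.month,
        'member_day': dates.dt.day,
    })

    # Attach the derived columns and drop the raw ones they replace.
    out = pd.concat([profile, gender, parts], axis=1)
    return out.drop(['became_member_on', 'gender'], axis=1)


# Made-up rows for illustration only; the second row is dropped by dropna.
toy = pd.DataFrame({
    'gender': ['F', None, 'M', 'O'],
    'income': [72000.0, None, 55000.0, 61000.0],
    'became_member_on': [20170715, 20170212, 20180426, 20151103],
})

cleaned = clean_profile(toy)
print(cleaned)
```

Parsing with `pd.to_datetime` raises on a malformed date instead of silently slicing the wrong characters, which may or may not be the behavior you want for dirty data.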