@ravishchawla
Created February 25, 2019 22:21
AirBnB post: Cleaning the data
import numpy as np

# Fraction of missing values in each column.
miss_listings = listings.isnull().sum() / len(listings)

# Drop columns that are more than 50% missing.
miss_gr_05 = miss_listings.index[miss_listings > 0.5]
listings = listings.drop(miss_gr_05, axis=1)

# Report the remaining columns that are more than 30% missing.
miss_gr_03 = [col for col in listings.columns if miss_listings[col] > 0.3]
print([col + ' ' + str(miss_listings[col]) for col in miss_gr_03])

# Convert '95%'-style strings to floats, then fill gaps with the column mean.
listings['host_response_rate'] = listings['host_response_rate'].apply(lambda val: float(str(val).replace("%", "")))
listings['host_response_rate'] = listings['host_response_rate'].fillna(np.mean(listings['host_response_rate']))

# Keep a 0/1 flag for whether notes were provided, then drop the free text.
listings['notes_exist'] = listings['notes'].notnull().astype('int')
listings = listings.drop('notes', axis=1)

listings['host_response_time'] = listings['host_response_time'].fillna('not provided')
listings = listings.drop(['access', 'transit'], axis=1)
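
The cleaning steps above depend on a `listings` DataFrame that the gist does not define. The following is a minimal, self-contained sketch of the same ideas on a made-up three-column frame. The helper names `drop_sparse_columns` and `percent_to_float` and the toy values are assumptions for illustration, not part of the original post. In this sketch `notes_exist` is 1 when a note is present.

```python
import pandas as pd


def drop_sparse_columns(df, threshold=0.5):
    """Drop columns whose fraction of missing values exceeds `threshold`."""
    missing_frac = df.isnull().mean()
    return df.drop(columns=missing_frac.index[missing_frac > threshold])


def percent_to_float(series):
    """Turn strings such as '95%' into floats and fill gaps with the mean."""
    values = pd.to_numeric(series.astype(str).str.rstrip('%'), errors='coerce')
    return values.fillna(values.mean())


# Hypothetical toy data standing in for the real listings table.
toy = pd.DataFrame({
    'host_response_rate': ['100%', '50%', None, '90%'],
    'notes': ['Quiet street', None, None, 'No pets'],
    'square_feet': [None, None, None, 800.0],
})

cleaned = drop_sparse_columns(toy)  # square_feet is 75% missing, so it goes
cleaned['host_response_rate'] = percent_to_float(cleaned['host_response_rate'])
cleaned['notes_exist'] = cleaned['notes'].notnull().astype(int)
cleaned = cleaned.drop(columns='notes')
print(cleaned)
```

Using `pd.to_numeric(..., errors='coerce')` is a design choice here: any value that is not a clean percentage becomes NaN and is then filled with the column mean, rather than raising an error.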