@dpoulopoulos
Last active February 22, 2020 18:41
Load the MovieLens 1M dataset ratings file.
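import pandas as pd

# load the data; the file has no header row and uses '::' as the field separator,
# which is multi-character and therefore needs the Python parsing engine
col_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings_df = pd.read_csv('/tmp/ratings.dat', delimiter='::', names=col_names, engine='python')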
# transform users and movies to categorical features
ratings_df['user_id'] = ratings_df['user_id'].astype('category')
ratings_df['movie_id'] = ratings_df['movie_id'].astype('category')
# use the category codes to avoid creating separate vocabularies
ratings_df['user_code'] = ratings_df['user_id'].cat.codes.astype(int)
ratings_df['movie_code'] = ratings_df['movie_id'].cat.codes.astype(int)
ratings_df.head()
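
To see what the category codes look like without downloading the dataset, here is a minimal sketch that runs the same steps on a few made-up rows in the `'::'` layout. The sample rows, `toy_df`, `n_users`, `n_movies` and `movie_lookup` are illustrative assumptions and are not part of the original gist.

```python
import io

import pandas as pd

# Hypothetical stand-in for ratings.dat: four made-up rows, same '::' layout.
sample = io.StringIO(
    "1::1193::5::978300760\n"
    "1::661::3::978302109\n"
    "2::1193::4::978298413\n"
    "5::2355::5::978824291\n"
)
col_names = ['user_id', 'movie_id', 'rating', 'timestamp']
toy_df = pd.read_csv(sample, sep='::', names=col_names, engine='python')

# Same transformation as above: categorical dtype, then contiguous integer codes.
for col, code_col in [('user_id', 'user_code'), ('movie_id', 'movie_code')]:
    toy_df[col] = toy_df[col].astype('category')
    toy_df[code_col] = toy_df[col].cat.codes.astype(int)

# Codes run from 0 to n-1, so the distinct counts give the index ranges
# (for example, the number of rows an embedding table would need).
n_users = toy_df['user_code'].nunique()
n_movies = toy_df['movie_code'].nunique()

# Map a code back to the original id through the ordered categories.
movie_lookup = dict(enumerate(toy_df['movie_id'].cat.categories))

print(toy_df[['user_id', 'user_code', 'movie_id', 'movie_code']])
print(n_users, n_movies, movie_lookup)
```

Because the categories are sorted, the raw ids 1, 2 and 5 become the gap-free codes 0, 1 and 2, and the lookup dictionary recovers the original id whenever a code needs to be reported back.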