dpoulopoulos/6e8bd7ec28df0f1548689a2579cd8626 (last active February 22, 2020 18:41)
Load the MovieLens 1M dataset ratings file.
```python
import pandas as pd

# load the data (ratings.dat uses '::' as the field separator)
col_names = ['user_id', 'movie_id', 'rating', 'timestamp']
# a multi-character separator requires the python parsing engine
ratings_df = pd.read_csv('/tmp/ratings.dat', delimiter='::', names=col_names, engine='python')

# transform users and movies to categorical features
ratings_df['user_id'] = ratings_df['user_id'].astype('category')
ratings_df['movie_id'] = ratings_df['movie_id'].astype('category')

# use the category codes to avoid creating separate vocabularies
ratings_df['user_code'] = ratings_df['user_id'].cat.codes.astype(int)
ratings_df['movie_code'] = ratings_df['movie_id'].cat.codes.astype(int)

print(ratings_df.head())
```
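To show why the category codes can stand in for a separate vocabulary, here is a minimal sketch that runs the same steps on a few in-memory rows instead of `/tmp/ratings.dat`. The sample rows are hypothetical values in the `UserID::MovieID::Rating::Timestamp` layout, not real MovieLens records, and the names `num_users`, `num_movies`, `user_lookup` and `movie_lookup` are illustrative additions. The sketch relies on pandas sorting the inferred categories, so each code is a dense index from 0 to n-1 and `cat.categories` maps a code back to the original ID.

```python
import io

import pandas as pd

# Hypothetical sample rows in the "UserID::MovieID::Rating::Timestamp" layout.
# These values are made up for illustration and are not taken from the real dataset.
SAMPLE = """10::3000::4::978300760
3::1193::5::978302109
10::1193::3::978301968
7::48::2::978300275
"""

col_names = ['user_id', 'movie_id', 'rating', 'timestamp']
# '::' is a multi-character separator, so the python engine is required
df = pd.read_csv(io.StringIO(SAMPLE), sep='::', names=col_names, engine='python')

# same categorical transformation as the snippet above
df['user_id'] = df['user_id'].astype('category')
df['movie_id'] = df['movie_id'].astype('category')
df['user_code'] = df['user_id'].cat.codes.astype(int)
df['movie_code'] = df['movie_id'].cat.codes.astype(int)

# the categories are the sorted unique IDs, so the codes run from 0 to n-1
num_users = len(df['user_id'].cat.categories)
num_movies = len(df['movie_id'].cat.categories)

# reverse lookup (code -> original ID) comes straight from the categories index
user_lookup = dict(enumerate(df['user_id'].cat.categories))
movie_lookup = dict(enumerate(df['movie_id'].cat.categories))

print(df[['user_id', 'user_code', 'movie_id', 'movie_code']])
print(num_users, num_movies)
```

Because the codes are contiguous, they could be used directly as row indices into per-user and per-movie tables of size `num_users` and `num_movies`, which is presumably what the comment about avoiding separate vocabularies refers to.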