Last active
August 14, 2020 06:41
Movielens AWS Personalize Transform
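import boto3
import pandas
from sklearn.utils import shuffle

# Load the MovieLens ratings (columns: userId, movieId, rating, timestamp).
ratings = pandas.read_csv('ratings.csv')
# Shuffle the rows so the 100,000-row sample below is not biased by file order.
ratings = shuffle(ratings)
# Keep only positive interactions; with half-star ratings this means 4.0 and above.
ratings = ratings[ratings['rating'] > 3.6]
ratings = ratings.drop(columns='rating')
# Rename the remaining columns to the names used in the Personalize schema.
ratings.columns = ['USER_ID', 'ITEM_ID', 'TIMESTAMP']
ratings = ratings[:100000]
ratings.to_csv('ratings.processed.csv', index=False)

# Upload the processed file to S3 (bucket name as in the original; change it to your own).
s3 = boto3.client('s3')
s3.upload_file('ratings.processed.csv', 'jsimon-ml20m', 'ratings.processed.csv')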
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "ITEM_ID", "type": "string"},
    {"name": "USER_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"}
  ],
  "version": "1.0"
}
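Before uploading, it can help to confirm that the processed CSV matches the interactions schema. The following is a minimal sketch using only the Python standard library. The helper name `check_csv_against_schema` and the sample rows are hypothetical. It treats column order as irrelevant and only checks column names and that `long` fields parse as integers, which is an assumption about what is worth checking, not a description of what Personalize itself validates.

```python
import csv
import io
import json


def check_csv_against_schema(csv_text, schema_text):
    """Return a list of problems; an empty list means the check passed.

    Hypothetical helper: compares the CSV header with the schema's field
    names (order-insensitive) and checks that "long" fields hold integers.
    """
    schema = json.loads(schema_text)
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    problems = []
    missing = sorted(set(fields) - set(header))
    extra = sorted(set(header) - set(fields))
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    if problems:
        return problems

    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for name, kind in fields.items():
            if kind != "long":
                continue
            try:
                int(row[name])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {name} is not an integer: {row[name]!r}"
                )
    return problems


SCHEMA = """
{"type": "record", "name": "Interactions",
 "namespace": "com.amazonaws.personalize.schema",
 "fields": [{"name": "ITEM_ID", "type": "string"},
            {"name": "USER_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"}],
 "version": "1.0"}
"""

# Made-up sample rows: the second data row has a bad timestamp.
SAMPLE = "USER_ID,ITEM_ID,TIMESTAMP\n1,296,1147880044\n1,306,not-a-time\n"

if __name__ == "__main__":
    for problem in check_csv_against_schema(SAMPLE, SCHEMA):
        print(problem)
```

To check the real output, read `ratings.processed.csv` and `schema.json` as text and pass both strings to the helper.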
#!/bin/sh
# Replace the placeholder with the ARN of an IAM role that Personalize can assume.
export ROLES_ARN="<PERSONALIZE ROLE ARN HERE -- don't run without changing>"
export DATASET_GROUP_ARN=$(aws personalize create-dataset-group --name jsimon-ml20m-dataset-group | jq -r ".datasetGroupArn")
export SCHEMA_ARN=$(aws personalize create-schema --name jsimon-ml20m-schema --schema "file://$(pwd)/schema.json" | jq -r ".schemaArn")
# Create the interactions dataset and capture its ARN.
export DATASET_ARN=$(aws personalize create-dataset --name jsimon-ml20m-dataset --schema-arn "$SCHEMA_ARN" \
    --dataset-group-arn "$DATASET_GROUP_ARN" \
    --dataset-type INTERACTIONS | jq -r ".datasetArn")
# dataLocation should point at the processed CSV uploaded to S3.
aws personalize create-dataset-import-job --job-name jsimon-ml20m-job5 --role-arn "$ROLES_ARN" --dataset-arn "$DATASET_ARN" --data-source dataLocation=s3://personalizemovielensjsimon/movielens.csv
# Create the solution with AutoML; minimum provisioned TPS is configured later, on create-campaign.
aws personalize create-solution --name jsimon-ml20m-solution \
    --perform-auto-ml \
    --dataset-group-arn "$DATASET_GROUP_ARN" \
    --query 'solutionArn'
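The Personalize `create-*` calls return while the resource is still being created, so a script that runs them back to back may need to wait between steps. A minimal sketch of a polling helper in Python follows. `wait_for_status` and its parameters are hypothetical names, and the status strings `ACTIVE` and `CREATE FAILED` are assumed to be the values the Personalize describe calls report.

```python
import time


def wait_for_status(get_status, target="ACTIVE", failed="CREATE FAILED",
                    interval=30.0, max_checks=120, sleep=time.sleep):
    """Poll get_status() until it returns `target`.

    get_status is any zero-argument callable that returns a status string.
    Raises RuntimeError if the failure status is seen, and TimeoutError if
    the target is not reached within max_checks polls.
    """
    for attempt in range(max_checks):
        status = get_status()
        if status == target:
            return status
        if status == failed:
            raise RuntimeError(f"resource entered status {status!r}")
        # Sleep between polls, but not after the final one.
        if attempt < max_checks - 1:
            sleep(interval)
    raise TimeoutError(f"status was not {target!r} after {max_checks} checks")


# Possible wiring with boto3 (not executed here; needs AWS credentials and a
# real import job ARN in the hypothetical variable job_arn):
#
#   personalize = boto3.client("personalize")
#   wait_for_status(lambda: personalize.describe_dataset_import_job(
#       datasetImportJobArn=job_arn)["datasetImportJob"]["status"])
```

Passing the status lookup as a callable keeps the helper independent of any one resource type, so the same loop could serve dataset groups, import jobs, and solution versions.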