@djinn
Last active August 14, 2020 06:41
Movielens AWS Personalize Transform
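# Turn the MovieLens ratings file into an Amazon Personalize interactions dataset.
import boto3
import pandas
from sklearn.utils import shuffle

# ratings.csv columns, in order: userId, movieId, rating, timestamp
ratings = pandas.read_csv('ratings.csv')
ratings = shuffle(ratings)
# Keep only positive interactions (ratings above 3.6, i.e. 4.0 and up on the half-star scale)
ratings = ratings[ratings['rating']>3.6]
# The rating itself is not part of the interactions schema, so drop it
ratings = ratings.drop(columns='rating')
# Rename the remaining columns to the names the Personalize schema expects
ratings.columns = ['USER_ID','ITEM_ID','TIMESTAMP']
# Keep a random sample of 100,000 interactions (the rows were shuffled above)
ratings = ratings[:100000]
ratings.to_csv('ratings.processed.csv',index=False)
# Upload the processed file to the S3 bucket 'jsimon-ml20m'
s3 = boto3.client('s3')
s3.upload_file('ratings.processed.csv','jsimon-ml20m','ratings.processed.csv')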
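Before uploading, it may be worth confirming that the processed CSV carries every column named in the interactions schema that accompanies this script. The following is a minimal sketch under stated assumptions, not part of the original gist: `missing_columns` is a hypothetical helper, the schema is inlined as a copy so the block is self-contained, and only the header row is inspected.

```python
import csv
import io
import json

# Copy of the interactions schema from this gist, inlined so the sketch runs on its own.
SCHEMA_JSON = """
{"type": "record", "name": "Interactions",
 "namespace": "com.amazonaws.personalize.schema",
 "fields": [{"name": "ITEM_ID", "type": "string"},
            {"name": "USER_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"}],
 "version": "1.0"}
"""


def missing_columns(csv_file, schema_json=SCHEMA_JSON):
    """Return the schema field names absent from the CSV header (hypothetical helper)."""
    expected = [field["name"] for field in json.loads(schema_json)["fields"]]
    # The first row of the file is treated as the header; an empty file has none.
    header = next(csv.reader(csv_file), [])
    return [name for name in expected if name not in header]


if __name__ == "__main__":
    # In practice you would pass open('ratings.processed.csv', newline='') instead.
    sample = io.StringIO("USER_ID,ITEM_ID,TIMESTAMP\n1,296,1147880044\n")
    print(missing_columns(sample))  # prints []
```

An empty result means the header and the schema agree on names. The check says nothing about value types, so a non-numeric TIMESTAMP would still slip through.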
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "ITEM_ID", "type": "string"},
    {"name": "USER_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"}
  ],
  "version": "1.0"
}
#!/bin/sh
# IAM role that Personalize assumes to read the S3 bucket
export ROLES_ARN="<PERSONALIZE ROLE ARN HERE -- don't run without changing>"
export DATASET_GROUP_ARN=$(aws personalize create-dataset-group --name jsimon-ml20m-dataset-group | jq -r ".datasetGroupArn")
export SCHEMA_ARN=$(aws personalize create-schema --name jsimon-ml20m-schema --schema file://$(pwd)/schema.json | jq -r ".schemaArn")
# Create the interactions dataset and capture its ARN
export DATASET_ARN=$(aws personalize create-dataset --name jsimon-ml20m-dataset --schema-arn $SCHEMA_ARN \
--dataset-group-arn $DATASET_GROUP_ARN \
--dataset-type INTERACTIONS | jq -r ".datasetArn")
# dataLocation points at the object uploaded by the transform script; adjust it if your bucket or key differs
aws personalize create-dataset-import-job --job-name jsimon-ml20m-job5 --role-arn $ROLES_ARN --dataset-arn $DATASET_ARN --data-source dataLocation=s3://jsimon-ml20m/ratings.processed.csv
# Create a solution with AutoML. Minimum provisioned TPS is a campaign setting
# (create-campaign --min-provisioned-tps), not a solution one.
aws personalize create-solution --name jsimon-ml20m-solution \
--perform-auto-ml \
--dataset-group-arn $DATASET_GROUP_ARN \
--query 'solutionArn'
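The script above issues its commands back to back, but Personalize creates resources asynchronously: a dataset group or import job starts in a pending state and only later becomes ACTIVE, and a dependent step can fail if it runs too early. One way to handle that is to poll between steps. The sketch below is an assumption-laden illustration, not part of the original gist: `wait_until_active` is a hypothetical helper, and the status strings checked are "ACTIVE" and anything containing "FAILED".

```python
import time


def wait_until_active(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                      sleep=time.sleep):
    """Poll get_status() until it returns 'ACTIVE' (hypothetical helper).

    get_status is any zero-argument callable returning a status string.
    Raises RuntimeError if the status contains 'FAILED', and TimeoutError
    once roughly timeout_seconds have been spent sleeping.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if "FAILED" in status:
            raise RuntimeError(f"resource creation failed: {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Possible use with boto3 (requires AWS credentials; the ARN variable is assumed
# to hold the value returned when the dataset group was created):
#
#   import boto3
#   personalize = boto3.client("personalize")
#   wait_until_active(lambda: personalize.describe_dataset_group(
#       datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"])
```

Passing the status lookup as a callable keeps the helper independent of any one resource type, so the same loop could wait on the dataset group, the dataset, or the import job.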