@rohitdholakia
Created December 24, 2011 19:17
A file to write all Netflix data into a MySQL Database using Python
'''Write all ratings in the Netflix data into a single file.
Each output line has the form: movie id, user id, rating, timestamp
'''
import sys, os
import MySQLdb  # imported but not used below; this script only writes a text file
import datetime

def timestamp(time):
    # We are assuming that the start date is 1990-01-01
    old = datetime.date(1990, 1, 1)
    newD = time.rstrip("\n").split("-")
    currentDate = datetime.date(int(newD[0]), int(newD[1]), int(newD[2]))
    # Seconds elapsed between the start date and the rating date
    return (currentDate - old).total_seconds()

# Read all the files from the Netflix dataset. That is about 17,770 movie files.
allFiles = os.listdir(sys.argv[1])  # Give path to files here
finalFile = open("/home/crazyabtliv/netflixAll2.txt", "w")  # To write everything into this
for f in allFiles:
    # Do for each file
    print('In file %s' % f)
    i = 0  # Simple counter to know if it is the first line, as each file starts with a movie id line
    movieId = 0
    ratings = open(os.path.join(sys.argv[1], f), 'r')
    for line in ratings:
        # For each line in the file
        if i == 0:
            # The first line is "<movie id>:", so remember the id and skip to the ratings
            movieId = int(line.rstrip(":\n"))
            i = i + 1
            continue
        data = line.split(",")  # Split line on commas. It is of the form (user id, rating, date)
        t = timestamp(data[2])
        finalFile.write(str(movieId) + "," + data[0] + "," + data[1] + "," + str(t) + "\n")
    ratings.close()
finalFile.close()
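
The per-file step above (read the movie id from the first line, then turn each rating line into a movie id, user id, rating, timestamp record) can be sketched as a small Python 3 function. This is only an illustration under stated assumptions, not the gist author's code: the names `seconds_since_epoch` and `parse_movie_file` are hypothetical, and the sample input is made up to match the layout the script expects.

```python
import datetime
import io

# Same start date the script assumes for its timestamps
EPOCH = datetime.date(1990, 1, 1)

def seconds_since_epoch(date_text):
    """Convert a 'YYYY-MM-DD' string to seconds elapsed since 1990-01-01."""
    day = datetime.datetime.strptime(date_text.strip(), "%Y-%m-%d").date()
    return (day - EPOCH).total_seconds()

def parse_movie_file(lines):
    """Yield (movie id, user id, rating, timestamp) tuples for one movie file."""
    lines = iter(lines)
    # The first line holds the movie id followed by a colon, e.g. "42:"
    movie_id = int(next(lines).strip().rstrip(":"))
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user_id, rating, date_text = line.strip().split(",")
        yield movie_id, int(user_id), int(rating), seconds_since_epoch(date_text)

# Made-up sample in the assumed layout: movie id line, then user id,rating,date
sample = io.StringIO("42:\n1001,4,1990-01-02\n1002,5,1991-01-01\n")
rows = list(parse_movie_file(sample))
for row in rows:
    print(",".join(str(value) for value in row))
```

Keeping the parsing in a generator means the rows can be written to a text file, as the script does, or handed to a database insert without changing the parsing logic.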