/munge.py Secret

Created April 25, 2015 06:05
IMDB Ratings DB Munging
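# First, get a clean version of just the ratings data
import re
import pandas as pd
with open('ratings.list') as f:
    ratings = f.read()
_, ratings = ratings.split('MOVIE RATINGS REPORT\n\n')
ratings, _ = ratings.split('\n\n------------------------------------------------------------------------------')
with open('ratings.clean.list', 'w') as f:
    f.write(ratings)
# Now play
# Option 1: split each line by hand so the last column keeps its internal spaces
titles, rating_data = ratings.split('\n', 1)
titles = titles.split()
rating_data_lines = rating_data.splitlines()
rating_data_split = [re.split(r"\s+", l, maxsplit=len(titles)-1) for l in rating_data_lines]
def to_numeric_if_possible(col):
    # Stand-in for convert_objects(convert_numeric=True), which newer pandas releases no longer provide
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col
ratings = pd.DataFrame(rating_data_split, columns=titles).apply(to_numeric_if_possible)
# Option 2: let pandas parse the cleaned file (replaces the DataFrame above); a regex delimiter needs the python engine
ratings = pd.read_csv('ratings.clean.list', delimiter=r"\s\s+", engine='python')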
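To see what the manual split does, here is a minimal standard-library sketch run on a made-up sample. The header names and both data rows are assumptions for illustration only, not taken from the real ratings.list. It shows that capping `maxsplit` at one less than the number of columns leaves everything after the last split in the final field, so titles containing spaces stay intact, and that the leading indentation on each data row produces an empty first field.

```python
import re

# Made-up sample in an assumed layout: a header row, then indented data rows.
# The column names and values here are illustrative, not real ratings data.
sample = (
    "New  Distribution  Votes  Rank  Title\n"
    "      0000000125     1000   9.0  Example Movie One (2000)\n"
    "      0000001222      250   7.5  Another Example: Part II (2001)\n"
)

# Separate the header row from the data rows, as the script does.
header, body = sample.split("\n", 1)
columns = header.split()

# Split on runs of whitespace, but stop after len(columns) - 1 splits so the
# remainder of the line (the title, spaces included) lands in the last field.
rows = [
    re.split(r"\s+", line, maxsplit=len(columns) - 1)
    for line in body.splitlines()
]

# Pair each field with its column name for easier inspection.
records = [dict(zip(columns, row)) for row in rows]

for record in records:
    print(record)
```

Because each data row starts with whitespace, the first field comes back as an empty string, which here lines up with the first header column.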