@adamchalmers
Created August 12, 2015 02:55
For converting MovieLens .dat files to Mongo-friendly .csv files.
# import the .csv files with this command:
# $ mongoimport --type "csv" --headerline csv/users.csv
#
# This script expects a directory structure like this:
#
# dat_to_csv.py
# dat/
#     movies.dat
#     ratings.dat
#     users.dat
# csv/
#
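# Header line written at the top of each .csv file, keyed by file name.
files = {
    "movies" : "movie_id,title,genres",
    "ratings": "user_id,movie_id,rating,ts",
    "users"  : "user_id,gender,age,occupation,zip"
}
DELIM = ","
for name in files:
    with open("dat/%s.dat" % name) as src:
        with open("csv/%s.csv" % name, "w") as dst:
            headers = files[name] + "\n"
            # Swap commas already in the data for backticks so they are not
            # read as separators, then turn the "::" separators into commas.
            data = src.read().replace(DELIM, "`").replace("::", DELIM)
            dst.write(headers + data)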
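
The script above rewrites commas inside the data as backticks, so a title such as "American President, The (1995)" is altered on the way through. As a rough alternative sketch, not part of the original gist, Python's csv module can quote such fields instead, assuming the importer accepts standard quoted CSV. The function name dat_to_csv and the sample rows below are illustrative only.

```python
import csv
import io

def dat_to_csv(src, dst, header, sep="::"):
    """Copy sep-separated rows from src to dst as CSV and return the row count.

    Hypothetical helper: src and dst are open text files (or file-like
    objects), header is a comma-separated string of column names.
    """
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(header.split(","))
    count = 0
    for line in src:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Fields containing a comma are quoted by the writer, not rewritten.
        writer.writerow(line.split(sep))
        count += 1
    return count

# In-memory sample in the "::"-separated layout the script expects.
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::American President, The (1995)::Comedy|Drama|Romance\n"
)
out = io.StringIO()
rows = dat_to_csv(sample, out, "movie_id,title,genres")
print(out.getvalue())
```

To use it on real files, pass the handles from open("dat/movies.dat") and open("csv/movies.csv", "w", newline="") in place of the in-memory buffers.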