@benmarwick
Last active December 15, 2015 18:49
'How to work with Google n-gram data sets in R using MySQL': customizations for making this work on my setup (Windows 7 x64)
http://rpsychologist.com/how-to-work-with-google-ngram-data-sets-in-r-using-mysql/
# get ngram data (files a-z) from
http://books.google.com/ngrams/datasets
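# merge the a-z files into one big CSV file by running this in cmd from the folder that contains them
http://www.solveyourtech.com/merge-csv-files/
# /b (binary mode) stops copy from appending an end-of-file (Ctrl+Z) character when it combines files
copy /b *.csv all-ngrams.csv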
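# If copy gives trouble, the same merge can be sketched in Python. This is an illustrative stand-in, not the original workflow: the function name merge_ngram_files is hypothetical, and it assumes the per-letter files share one folder and a .csv extension. It streams each file in binary mode, skips the output file so a re-run does not read its own output, and adds a newline when a file lacks a trailing one so lines from consecutive files are not joined.

```python
import shutil
from pathlib import Path


def merge_ngram_files(folder, output_name="all-ngrams.csv"):
    """Concatenate every *.csv file in `folder` into `output_name`.

    Files are merged in sorted name order (a, b, ..., z) and the output
    file itself is skipped. Returns the number of input files merged.
    """
    folder = Path(folder)
    out_path = folder / output_name
    # collect inputs before opening the output, and never include the output
    inputs = sorted(p for p in folder.glob("*.csv") if p.name != output_name)
    with out_path.open("wb") as out:
        for path in inputs:
            with path.open("rb") as src:
                # stream in chunks rather than loading a multi-GB file into memory
                shutil.copyfileobj(src, out)
                # if the file is non-empty and lacks a trailing newline, add one
                if path.stat().st_size > 0:
                    src.seek(-1, 2)  # 2 = seek relative to end of file
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return len(inputs)
```

# e.g. merge_ngram_files('C:/Users/marwick/Downloads/ngrams') writes all-ngrams.csv into that folder.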
# install MySQL and its client libraries, then create a database for the data (the LOAD DATA step below expects one called ngrams, e.g. CREATE DATABASE ngrams;)
http://stackoverflow.com/questions/5515745/create-a-new-database-with-mysql-workbench
# table suited to ngrams V2, where each line is "ngram TAB year TAB match_count TAB volume_count NEWLINE"; run USE ngrams; first so the table is created in that database
CREATE TABLE `1_grams` (
`n_gram` text,
`year` int(11) DEFAULT NULL,
`match_count` int(11) DEFAULT NULL,
`volume_count` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
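# Before the long load, it may help to check that every line really has the four tab-separated fields this table expects, with integers in the last three. A minimal sketch, assuming a UTF-8 merged file; parse_v2_line and check_file are hypothetical helpers, not part of MySQL or RMySQL.

```python
def parse_v2_line(line):
    """Split one V2 line into (n_gram, year, match_count, volume_count).

    Raises ValueError if the line does not have exactly four tab-separated
    fields or if the last three fields are not integers.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    n_gram, year, match_count, volume_count = fields
    # int() raises ValueError for non-numeric text
    return n_gram, int(year), int(match_count), int(volume_count)


def check_file(path, max_report=5):
    """Return (good, bad, examples) for a merged n-gram file.

    `examples` holds up to `max_report` (line_number, message) pairs
    describing the first malformed lines found.
    """
    good, bad, examples = 0, 0, []
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                parse_v2_line(line)
                good += 1
            except ValueError as err:
                bad += 1
                if len(examples) < max_report:
                    examples.append((lineno, str(err)))
    return good, bad, examples
```

# The line counts also give something to compare against SELECT COUNT(*) once the load has finished.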
# load the merged file into MySQL (this can take many hours); tab is the default field separator for LOAD DATA, so no FIELDS clause is needed
LOAD DATA LOCAL INFILE 'C:/Users/marwick/Downloads/ngrams/all-ngrams.csv' INTO TABLE ngrams.`1_grams`;
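# A small dry run can also catch schema or query mistakes before committing hours to the MySQL load. This sketch swaps in Python's built-in sqlite3 module as a stand-in for MySQL (it does not exercise LOAD DATA or RMySQL), and rehearse_load is a hypothetical name. It builds an in-memory table with the same four columns, inserts tab-separated sample lines, and sums match_count per year for one n-gram.

```python
import sqlite3


def rehearse_load(lines, word):
    """Load tab-separated V2 lines into an in-memory SQLite table and
    return (year, total match_count) pairs for `word`, ordered by year."""
    conn = sqlite3.connect(":memory:")
    try:
        # same columns as the MySQL 1_grams table; types simplified for SQLite
        conn.execute(
            'CREATE TABLE "1_grams" ('
            "n_gram TEXT, year INTEGER, match_count INTEGER, volume_count INTEGER)"
        )
        # one row per non-blank line: [n_gram, year, match_count, volume_count]
        rows = (line.rstrip("\r\n").split("\t") for line in lines if line.strip())
        conn.executemany('INSERT INTO "1_grams" VALUES (?, ?, ?, ?)', rows)
        # parameterised query rather than pasting the word into the SQL string
        cur = conn.execute(
            'SELECT year, SUM(match_count) FROM "1_grams" '
            "WHERE n_gram = ? GROUP BY year ORDER BY year",
            (word,),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

# SQLite's INTEGER column affinity converts the numeric strings from each line into integers, so year and SUM(match_count) come back as Python ints.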
# install RMySQL: this is not straightforward on Windows, so read the instructions and error messages very carefully
http://stackoverflow.com/questions/5223113/using-mysql-in-r-for-windows
http://biostat.mc.vanderbilt.edu/wiki/Main/RMySQL
# my Renviron.site (in R's etc folder) looks like this; short 8.3 names such as PROGRA~1 avoid spaces in the path
MYSQL_HOME=C:/PROGRA~1/MYSQL/MYSQLS~1.6/