@randyzwitch
Created July 31, 2013 15:18
Python MapReduce for EMR
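from mrjob.job import MRJob


class MRWordCounter(MRJob):
    def mapper(self, _, line):
        # The input key is unused; each value is one line of input text.
        # The full word list is elided here.
        english_dict = ['aal', 'aalii', 'aam', 'aani'...'zythum', 'zyzomys', 'zyzzogeton']
        for word in english_dict:
            # Substring test: emits at most one count per word per line.
            if word in line:
                yield word, 1

    def reducer(self, word, occurrences):
        # Sum the per-line counts emitted for each word.
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()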
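
To see what the job above computes without a cluster or mrjob installed, here is a minimal plain-Python sketch of the same map and reduce steps. `SAMPLE_WORDS`, `run_local`, and the input lines are hypothetical stand-ins (the real word list is elided in the gist), and the grouping step only approximates what the framework's shuffle does.

```python
from collections import defaultdict

# Hypothetical stand-in for the elided english_dict list.
SAMPLE_WORDS = ["cat", "dog", "at"]


def mapper(line, words=SAMPLE_WORDS):
    """Yield (word, 1) for each dictionary word found as a substring of line."""
    for word in words:
        if word in line:
            yield word, 1


def reducer(word, occurrences):
    """Sum the counts emitted for one word."""
    return word, sum(occurrences)


def run_local(lines):
    """Group mapper output by key, then reduce (a stand-in for the shuffle)."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(word, counts) for word, counts in grouped.items())


counts = run_local(["the cat sat", "a dog and a cat", "catalog"])
print(counts)
```

Because the test is `word in line`, each word is counted once per line that contains it, and matches inside longer words count too: "at" is found in "cat" and "catalog". If whole-word counts are wanted, splitting each line into tokens first would be one option.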