Created March 9, 2013 07:28
Writing and running a MapReduce job with MrJob
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from mrjob.job import MRJob


class MRWordCounter(MRJob):
    """Count how many times each word occurs in the input."""

    def mapper(self, key, line):
        # The input key is unused; emit (word, 1) for every word on the line.
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        # occurrences iterates over every 1 emitted for this word.
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()
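To see what the mapper and reducer above compute without installing mrjob or starting Hadoop, the same logic can be sketched in plain Python. This is only an illustration: `run_word_count` is a hypothetical helper that groups the mapper output by key in memory, which stands in for the shuffle step that the framework normally performs between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every whitespace-separated word, like MRWordCounter.mapper."""
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    """Sum the counts collected for one word, like MRWordCounter.reducer."""
    yield word, sum(occurrences)


def run_word_count(lines):
    """Hypothetical helper: map, group by key, then reduce, all in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)  # stands in for the shuffle step
    results = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            results[key] = total
    return results


if __name__ == "__main__":
    print(run_word_count(["to be or not", "to be"]))
```

Because the split is on whitespace only, "Commons" and "commons," would be counted as different words, exactly as in the job itself.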
# Run locally
python wc.py < creativecommons.txt
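The local run prints its results to standard output. Assuming the job keeps mrjob's default output protocol, each line should be a JSON-encoded key and a JSON-encoded value separated by a tab, for example `"commons"	3`. Under that assumption, a minimal sketch for loading saved results back into a dictionary could look like this; `split_output_line` and `load_counts` are hypothetical helpers and not part of mrjob.

```python
import json


def split_output_line(line):
    """Split one 'key<TAB>value' output line into (word, count).

    Assumes both fields are JSON-encoded, as with mrjob's default output protocol.
    """
    key_json, value_json = line.rstrip("\n").split("\t", 1)
    return json.loads(key_json), json.loads(value_json)


def load_counts(lines):
    """Build a {word: count} dict from output lines, skipping blank lines."""
    return dict(split_output_line(line) for line in lines if line.strip())


if __name__ == "__main__":
    sample = ['"commons"\t3\n', '"creative"\t2\n']
    print(load_counts(sample))
```

To use it on real output, redirect the local run into a file (`python wc.py < creativecommons.txt > counts.txt`) and pass the open file object to `load_counts`.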
# Run on EMR using a file on S3
export AWS_ACCESS_KEY_ID=<your aws access key>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
python wc.py -r emr s3://masayang-bootcamp/bootcamp4/EMRconsole/creativecommons.txt -o s3://masayang-bootcamp/bootcamp4/EMRconsole/<your account>