Created
November 16, 2011 04:36
Dumbo command line for running a job with a cache file in HDFS
dumbo start demo_dumbo.py -hadoop /usr/lib/hadoop -input shares -output video_demos -outputformat text -files hdfs://ec2-xxx-xx-xx-xx.compute-1.amazonaws.com:8020/user/ubuntu/users/part-m-00000
### piece of code in demo_dumbo.py
# The file passed with -files is opened by its base name, relative to the task's working directory.
for line in open('part-m-00000'):
    print(line.rstrip('\n'))
# ----------------
dumbo start demo_dumbo.py -hadoop /usr/lib/hadoop -input shares -output video_demos -outputformat text -files hdfs://ec2-xxx-xx-xx-xx.compute-1.amazonaws.com:8020/user/ubuntu/users
### piece of code in demo_dumbo.py
# Here -files points at a whole HDFS directory, so the part files are read from the local 'users' directory.
import glob
for filename in glob.glob('users/part*'):
    for line in open(filename):
        print(line.rstrip('\n'))
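
The directory variant above can be wrapped in a small helper so the rest of demo_dumbo.py does not need to know how many part files were cached. This is only a sketch: the name iter_cached_lines is hypothetical and not part of dumbo, and it assumes, as the snippet above does, that the cached directory is reachable as 'users' from the task's working directory.

```python
import glob


def iter_cached_lines(pattern):
    """Yield every line, without its trailing newline, from all files matching pattern.

    Hypothetical helper for illustration; not part of the dumbo API.
    """
    # Sort the matches so part files are read in a stable order
    # (part-m-00000, part-m-00001, ...); glob itself guarantees no order.
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as handle:
            for line in handle:
                yield line.rstrip('\n')
```

It would be used the same way as the loop above, for example `for line in iter_cached_lines('users/part*'): print(line)`. A pattern that matches nothing simply yields no lines, so a missing cache directory fails silently; check for that case explicitly if it matters.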