Created
August 29, 2012 21:28
Impatient part 4
$ cat output/wc/part-00000
air 1
area 4
australia 1
broken 1
california's 1
cause 1
cloudcover 1
death 1
deserts 1
downwind 1
dry 3
dvd 1
effect 1
known 1
land 2
lee 2
leeward 2
less 1
lies 1
mountain 3
mountainous 1
primary 1
produces 1
rain 5
ranges 1
secrets 1
shadow 4
sinking 1
such 1
valley 1
women 1
$ hadoop jar target/impatient.jar data/rain.txt output/wc data/en.stop
12/08/29 22:27:35 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/08/29 22:27:35 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part4/target/impatient.jar
12/08/29 22:27:35 INFO property.AppProps: using app.id: 4041A47742A4D7A9AAD67DA6E7807E38
2012-08-29 22:27:35.645 java[65356:1903] Unable to load realm info from SCDynamicStore
12/08/29 22:27:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/29 22:27:35 WARN snappy.LoadSnappy: Snappy native library not loaded
12/08/29 22:27:35 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/29 22:27:35 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/29 22:27:36 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/08/29 22:27:36 INFO flow.Flow: [] starting
12/08/29 22:27:36 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/08/29 22:27:36 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/08/29 22:27:36 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/08/29 22:27:36 INFO flow.Flow: [] parallel execution is enabled: false
12/08/29 22:27:36 INFO flow.Flow: [] starting jobs: 2
12/08/29 22:27:36 INFO flow.Flow: [] allocating threads: 1
12/08/29 22:27:36 INFO flow.FlowStep: [] starting step: (1/2)
12/08/29 22:27:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/08/29 22:27:36 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/29 22:27:36 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/29 22:27:36 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/08/29 22:27:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:36 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part4/data/en.stop
12/08/29 22:27:36 INFO mapred.MapTask: numReduceTasks: 1
12/08/29 22:27:36 INFO mapred.MapTask: io.sort.mb = 100
12/08/29 22:27:37 INFO mapred.MapTask: data buffer = 79691776/99614720
12/08/29 22:27:37 INFO mapred.MapTask: record buffer = 262144/327680
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/08/29 22:27:37 INFO hadoop.FlowMapper: sinking to: CoGroup(b562961e-f24a-47d5-b2c0-bd84754bfc94*49c53bad-a009-4937-915e-df900cf5804c)[by:b562961e-f24a-47d5-b2c0-bd84754bfc94:[{1}:'?word']49c53bad-a009-4937-915e-df900cf5804c:[{1}:'?word']]
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO mapred.MapTask: Starting flush of map output
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO mapred.MapTask: Finished spill 0
12/08/29 22:27:37 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/08/29 22:27:37 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part4/data/en.stop:0+544
12/08/29 22:27:37 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part4/data/rain.txt
12/08/29 22:27:37 INFO mapred.MapTask: numReduceTasks: 1
12/08/29 22:27:37 INFO mapred.MapTask: io.sort.mb = 100
12/08/29 22:27:37 INFO mapred.MapTask: data buffer = 79691776/99614720
12/08/29 22:27:37 INFO mapred.MapTask: record buffer = 262144/327680
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/08/29 22:27:37 INFO hadoop.FlowMapper: sinking to: CoGroup(b562961e-f24a-47d5-b2c0-bd84754bfc94*49c53bad-a009-4937-915e-df900cf5804c)[by:b562961e-f24a-47d5-b2c0-bd84754bfc94:[{1}:'?word']49c53bad-a009-4937-915e-df900cf5804c:[{1}:'?word']]
12/08/29 22:27:37 INFO mapred.MapTask: Starting flush of map output
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO mapred.MapTask: Finished spill 0
12/08/29 22:27:37 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/08/29 22:27:37 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part4/data/rain.txt:0+510
12/08/29 22:27:37 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/08/29 22:27:37 INFO mapred.LocalJobRunner:
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO mapred.Merger: Merging 2 sorted segments
12/08/29 22:27:37 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 3523 bytes
12/08/29 22:27:37 INFO mapred.LocalJobRunner:
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.FlowReducer: sourcing from: CoGroup(b562961e-f24a-47d5-b2c0-bd84754bfc94*49c53bad-a009-4937-915e-df900cf5804c)[by:b562961e-f24a-47d5-b2c0-bd84754bfc94:[{1}:'?word']49c53bad-a009-4937-915e-df900cf5804c:[{1}:'?word']]
12/08/29 22:27:37 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?word', '!__gen14']]"][3b95c049-6903-457a-8d45-1/28373/]
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:37 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:41 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/08/29 22:27:41 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/08/29 22:27:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:41 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/08/29 22:27:41 INFO mapred.LocalJobRunner:
12/08/29 22:27:41 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/08/29 22:27:41 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/tmp/hadoop-paullam/3b95c049_6903_457a_8d45_1_28373_415A6EA922C39EDFEB248B1764882B79
12/08/29 22:27:41 INFO mapred.LocalJobRunner: reduce > reduce
12/08/29 22:27:41 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/08/29 22:27:41 INFO flow.FlowStep: [] starting step: (2/2) output/wc
12/08/29 22:27:41 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/08/29 22:27:41 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/29 22:27:42 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/3b95c049_6903_457a_8d45_1_28373_415A6EA922C39EDFEB248B1764882B79/part-00000
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO mapred.MapTask: numReduceTasks: 1
12/08/29 22:27:42 INFO mapred.MapTask: io.sort.mb = 100
12/08/29 22:27:42 INFO mapred.MapTask: data buffer = 79691776/99614720
12/08/29 22:27:42 INFO mapred.MapTask: record buffer = 262144/327680
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?word', '!__gen14']]"][3b95c049-6903-457a-8d45-1/28373/]
12/08/29 22:27:42 INFO hadoop.FlowMapper: sinking to: GroupBy(3b95c049-6903-457a-8d45-1d2b0743a1e0)[by:[{1}:'?word']]
12/08/29 22:27:42 INFO mapred.MapTask: Starting flush of map output
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO mapred.MapTask: Finished spill 0
12/08/29 22:27:42 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
12/08/29 22:27:42 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/3b95c049_6903_457a_8d45_1_28373_415A6EA922C39EDFEB248B1764882B79/part-00000:0+784
12/08/29 22:27:42 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
12/08/29 22:27:42 INFO mapred.LocalJobRunner:
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO mapred.Merger: Merging 1 sorted segments
12/08/29 22:27:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes
12/08/29 22:27:42 INFO mapred.LocalJobRunner:
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO hadoop.FlowReducer: sourcing from: GroupBy(3b95c049-6903-457a-8d45-1d2b0743a1e0)[by:[{1}:'?word']]
12/08/29 22:27:42 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/29 22:27:42 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
12/08/29 22:27:42 INFO mapred.LocalJobRunner:
12/08/29 22:27:42 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
12/08/29 22:27:42 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/paullam/Dropbox/Projects/Impatient/part4/output/wc
12/08/29 22:27:42 INFO mapred.LocalJobRunner: reduce > reduce
12/08/29 22:27:42 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
12/08/29 22:27:42 INFO util.Hadoop18TapUtil: deleting temp path output/wc/_temporary
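
The log shows the flow running as two jobs: step 1 feeds data/en.stop and data/rain.txt into a CoGroup keyed on ?word, and step 2 runs a GroupBy on ?word that writes ?word and ?count to output/wc. The job's source code is not part of this gist, so the following is only a rough, in-memory Python sketch of what those two steps appear to compute, assuming step 1 drops words found in the stop list and step 2 counts what remains. The function name `word_count`, the regex tokenizer, and the lowercasing are hypothetical choices for illustration, not the job's actual logic.

```python
import re
from collections import Counter


def word_count(docs, stop_words):
    """Count non-stop words across (doc_id, text) pairs.

    Hypothetical stand-in for the two-step flow in the log above;
    the tokenizer and lowercasing are assumptions, not the job's code.
    """
    stop = set(stop_words)

    # Step 1 (assumed role of the CoGroup on ?word): match every token
    # against the stop list and keep only tokens with no match.
    kept = (
        token
        for _doc_id, text in docs
        for token in re.findall(r"[a-z']+", text.lower())
        if token not in stop
    )

    # Step 2 (assumed role of the GroupBy on ?word): count per word.
    counts = Counter(kept)

    # Sort by word, as the listing in part-00000 appears to be.
    return sorted(counts.items())
```

Calling `word_count` with a list of `(doc_id, text)` tuples and a list of stop words returns `(word, count)` pairs in alphabetical order, the same shape as the part-00000 listing.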