@Quantisan
Created October 6, 2012 14:04
Impatient part 5
$ hadoop jar ./target/impatient.jar data/rain.txt output/wc data/en.stop output/tfidf
2012-10-06 15:00:25.269 java[1097:1903] Unable to load realm info from SCDynamicStore
12/10/06 15:00:25 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/10/06 15:00:25 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part5/./target/impatient.jar
12/10/06 15:00:25 INFO property.AppProps: using app.id: 63CBE2FEBFE8177789403D9EA7C81366
12/10/06 15:00:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/10/06 15:00:25 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/06 15:00:25 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:25 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:25 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/10/06 15:00:25 INFO flow.Flow: [] starting
12/10/06 15:00:25 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/10/06 15:00:25 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/10/06 15:00:25 INFO flow.Flow: [] sink: Hfs["SequenceFile[[UNKNOWN]->['?n-docs']]"]["/tmp/cascalog_reserved/42de5c85-198a-490e-af5e-dde2f7f92cb5/fa9e956a-cc44-406c-b98e-033a99cd91ed"]"]
12/10/06 15:00:25 INFO flow.Flow: [] parallel execution is enabled: false
12/10/06 15:00:25 INFO flow.Flow: [] starting jobs: 3
12/10/06 15:00:25 INFO flow.Flow: [] allocating threads: 1
12/10/06 15:00:25 INFO flow.FlowStep: [] starting step: (1/3)
12/10/06 15:00:26 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:26 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:26 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/10/06 15:00:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:26 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:26 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/en.stop
12/10/06 15:00:26 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:00:26 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:00:26 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:00:26 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:00:26 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:26 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:26 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/10/06 15:00:26 INFO hadoop.FlowMapper: sinking to: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:00:26 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:27 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:00:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:27 INFO mapred.MapTask: Finished spill 0
12/10/06 15:00:27 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/06 15:00:29 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/en.stop:0+544
12/10/06 15:00:29 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/10/06 15:00:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:29 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/rain.txt
12/10/06 15:00:29 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:00:29 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:00:29 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:00:29 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:00:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:29 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/10/06 15:00:29 INFO hadoop.FlowMapper: sinking to: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:00:29 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:00:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:29 INFO mapred.MapTask: Finished spill 0
12/10/06 15:00:29 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/10/06 15:00:32 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/rain.txt:0+510
12/10/06 15:00:32 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/10/06 15:00:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:32 INFO mapred.LocalJobRunner:
12/10/06 15:00:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:32 INFO mapred.Merger: Merging 2 sorted segments
12/10/06 15:00:32 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 4413 bytes
12/10/06 15:00:32 INFO mapred.LocalJobRunner:
12/10/06 15:00:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:32 INFO hadoop.FlowReducer: sourcing from: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:00:32 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?__gen20']]"][fad0150c-ff59-4998-a67e-7/1393/]
12/10/06 15:00:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:35 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:00:35 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:00:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:35 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/06 15:00:35 INFO mapred.LocalJobRunner:
12/10/06 15:00:35 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/06 15:00:35 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/tmp/hadoop-paullam/fad0150c_ff59_4998_a67e_7_1393_D68D8CA0AC89398BCDC4C1131AB444EB
12/10/06 15:00:38 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:00:38 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:00:38 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/10/06 15:00:38 INFO flow.FlowStep: [] starting step: (2/3)
12/10/06 15:00:38 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:38 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
12/10/06 15:00:38 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:38 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/fad0150c_ff59_4998_a67e_7_1393_D68D8CA0AC89398BCDC4C1131AB444EB/part-00000
12/10/06 15:00:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:38 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:00:38 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:00:38 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:00:38 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:00:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:38 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?__gen20']]"][fad0150c-ff59-4998-a67e-7/1393/]
12/10/06 15:00:38 INFO hadoop.FlowMapper: sinking to: GroupBy(fad0150c-ff59-4998-a67e-73d49a228672)[by:[{1}:'?__gen20']]
12/10/06 15:00:38 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:00:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:38 INFO mapred.MapTask: Finished spill 0
12/10/06 15:00:38 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
12/10/06 15:00:41 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/fad0150c_ff59_4998_a67e_7_1393_D68D8CA0AC89398BCDC4C1131AB444EB/part-00000:0+1030
12/10/06 15:00:41 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
12/10/06 15:00:41 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:41 INFO mapred.LocalJobRunner:
12/10/06 15:00:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:41 INFO mapred.Merger: Merging 1 sorted segments
12/10/06 15:00:41 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 722 bytes
12/10/06 15:00:41 INFO mapred.LocalJobRunner:
12/10/06 15:00:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:41 INFO hadoop.FlowReducer: sourcing from: GroupBy(fad0150c-ff59-4998-a67e-73d49a228672)[by:[{1}:'?__gen20']]
12/10/06 15:00:41 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['!__gen21', '!__gen22']]"][63478faa-5f34-458b-b38e-0/40644/]
12/10/06 15:00:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:41 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
12/10/06 15:00:41 INFO mapred.LocalJobRunner:
12/10/06 15:00:41 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
12/10/06 15:00:41 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/hadoop-paullam/63478faa_5f34_458b_b38e_0_40644_E59B01C65A9E557906EEEAEDA9C22640
12/10/06 15:00:44 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:00:44 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
12/10/06 15:00:44 INFO flow.FlowStep: [] starting step: (3/3) ...44-406c-b98e-033a99cd91ed
12/10/06 15:00:44 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:44 INFO flow.FlowStep: [] submitted hadoop job: job_local_0003
12/10/06 15:00:44 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:44 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:44 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/63478faa_5f34_458b_b38e_0_40644_E59B01C65A9E557906EEEAEDA9C22640/part-00000
12/10/06 15:00:44 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:44 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:00:44 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:00:45 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:00:45 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:00:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:45 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['!__gen21', '!__gen22']]"][63478faa-5f34-458b-b38e-0/40644/]
12/10/06 15:00:45 INFO hadoop.FlowMapper: sinking to: GroupBy(63478faa-5f34-458b-b38e-0691f18e7d05)[by:[{1}:'!__gen21']]
12/10/06 15:00:45 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:00:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:45 INFO mapred.MapTask: Finished spill 0
12/10/06 15:00:45 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
12/10/06 15:00:47 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/63478faa_5f34_458b_b38e_0_40644_E59B01C65A9E557906EEEAEDA9C22640/part-00000:0+84
12/10/06 15:00:47 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.
12/10/06 15:00:48 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:48 INFO mapred.LocalJobRunner:
12/10/06 15:00:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:48 INFO mapred.Merger: Merging 1 sorted segments
12/10/06 15:00:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 11 bytes
12/10/06 15:00:48 INFO mapred.LocalJobRunner:
12/10/06 15:00:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:48 INFO hadoop.FlowReducer: sourcing from: GroupBy(63478faa-5f34-458b-b38e-0691f18e7d05)[by:[{1}:'!__gen21']]
12/10/06 15:00:48 INFO hadoop.FlowReducer: sinking to: Hfs["SequenceFile[[UNKNOWN]->['?n-docs']]"]["/tmp/cascalog_reserved/42de5c85-198a-490e-af5e-dde2f7f92cb5/fa9e956a-cc44-406c-b98e-033a99cd91ed"]"]
12/10/06 15:00:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:48 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
12/10/06 15:00:48 INFO mapred.LocalJobRunner:
12/10/06 15:00:48 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now
12/10/06 15:00:48 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to file:/tmp/cascalog_reserved/42de5c85-198a-490e-af5e-dde2f7f92cb5/fa9e956a-cc44-406c-b98e-033a99cd91ed
12/10/06 15:00:51 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:00:51 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.
12/10/06 15:00:51 INFO util.Hadoop18TapUtil: deleting temp path /tmp/cascalog_reserved/42de5c85-198a-490e-af5e-dde2f7f92cb5/fa9e956a-cc44-406c-b98e-033a99cd91ed/_temporary
12/10/06 15:00:51 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:51 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/10/06 15:00:51 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part5/./target/impatient.jar
12/10/06 15:00:51 INFO flow.Flow: [] starting
12/10/06 15:00:51 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/10/06 15:00:51 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/10/06 15:00:51 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?tf-idf', '?tf-word']]"]["output/tfidf"]"]
12/10/06 15:00:51 INFO flow.Flow: [] parallel execution is enabled: false
12/10/06 15:00:51 INFO flow.Flow: [] starting jobs: 5
12/10/06 15:00:51 INFO flow.Flow: [] allocating threads: 1
12/10/06 15:00:51 INFO flow.FlowStep: [] starting step: (1/5)
12/10/06 15:00:51 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:51 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:51 INFO flow.FlowStep: [] submitted hadoop job: job_local_0004
12/10/06 15:00:51 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:51 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/en.stop
12/10/06 15:00:51 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:00:51 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:00:51 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:00:51 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:00:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:51 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/10/06 15:00:51 INFO hadoop.FlowMapper: sinking to: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:00:51 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:00:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:51 INFO mapred.MapTask: Finished spill 0
12/10/06 15:00:51 INFO mapred.Task: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
12/10/06 15:00:54 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/en.stop:0+544
12/10/06 15:00:54 INFO mapred.Task: Task 'attempt_local_0004_m_000000_0' done.
12/10/06 15:00:54 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:54 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:54 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/rain.txt
12/10/06 15:00:54 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:00:54 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:00:54 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:00:54 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:00:54 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:54 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:54 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/10/06 15:00:54 INFO hadoop.FlowMapper: sinking to: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:00:54 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:00:54 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:54 INFO mapred.MapTask: Finished spill 0
12/10/06 15:00:54 INFO mapred.Task: Task:attempt_local_0004_m_000001_0 is done. And is in the process of commiting
12/10/06 15:00:57 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/rain.txt:0+510
12/10/06 15:00:57 INFO mapred.Task: Task 'attempt_local_0004_m_000001_0' done.
12/10/06 15:00:57 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:00:57 INFO mapred.LocalJobRunner:
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO mapred.Merger: Merging 2 sorted segments
12/10/06 15:00:57 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 4413 bytes
12/10/06 15:00:57 INFO mapred.LocalJobRunner:
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO hadoop.FlowReducer: sourcing from: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:00:57 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?doc-id', '?word']]"][ba43a84e-bddf-4ea7-9fff-9/53353/]
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:00:57 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:00:57 INFO mapred.Task: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
12/10/06 15:00:57 INFO mapred.LocalJobRunner:
12/10/06 15:00:57 INFO mapred.Task: Task attempt_local_0004_r_000000_0 is allowed to commit now
12/10/06 15:00:57 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0004_r_000000_0' to file:/tmp/hadoop-paullam/ba43a84e_bddf_4ea7_9fff_9_53353_4AAEA5BF56CAE613DD5791CACC562CC9
12/10/06 15:01:00 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:00 INFO mapred.Task: Task 'attempt_local_0004_r_000000_0' done.
12/10/06 15:01:00 INFO flow.FlowStep: [] starting step: (2/5)
12/10/06 15:01:00 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:00 INFO flow.FlowStep: [] submitted hadoop job: job_local_0005
12/10/06 15:01:00 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:00 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:00 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/ba43a84e_bddf_4ea7_9fff_9_53353_4AAEA5BF56CAE613DD5791CACC562CC9/part-00000
12/10/06 15:01:00 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:00 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:00 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:01 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:01 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:01 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?doc-id', '?word']]"][ba43a84e-bddf-4ea7-9fff-9/53353/]
12/10/06 15:01:01 INFO hadoop.FlowMapper: sinking to: GroupBy(301a1de4-a701-4eeb-b5db-dd40267401eb)[by:[{2}:'?tf-word', '?doc-id']]
12/10/06 15:01:01 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:01 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:01 INFO mapred.Task: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
12/10/06 15:01:03 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/ba43a84e_bddf_4ea7_9fff_9_53353_4AAEA5BF56CAE613DD5791CACC562CC9/part-00000:0+1539
12/10/06 15:01:03 INFO mapred.Task: Task 'attempt_local_0005_m_000000_0' done.
12/10/06 15:01:03 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:03 INFO mapred.LocalJobRunner:
12/10/06 15:01:03 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:03 INFO mapred.Merger: Merging 1 sorted segments
12/10/06 15:01:03 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1321 bytes
12/10/06 15:01:03 INFO mapred.LocalJobRunner:
12/10/06 15:01:03 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:03 INFO hadoop.FlowReducer: sourcing from: GroupBy(301a1de4-a701-4eeb-b5db-dd40267401eb)[by:[{2}:'?tf-word', '?doc-id']]
12/10/06 15:01:03 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?tf-word', '?doc-id', '?tf-count']]"][cfe43fd1-b45f-443b-b2ba-3/58615/]
12/10/06 15:01:03 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:03 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:03 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:03 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:03 INFO mapred.Task: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting
12/10/06 15:01:03 INFO mapred.LocalJobRunner:
12/10/06 15:01:03 INFO mapred.Task: Task attempt_local_0005_r_000000_0 is allowed to commit now
12/10/06 15:01:03 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0005_r_000000_0' to file:/tmp/hadoop-paullam/cfe43fd1_b45f_443b_b2ba_3_58615_8D0A5EBC7F48F8F16E751590E09E66AA
12/10/06 15:01:06 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:06 INFO mapred.Task: Task 'attempt_local_0005_r_000000_0' done.
12/10/06 15:01:07 INFO flow.FlowStep: [] starting step: (3/5)
12/10/06 15:01:07 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:07 INFO flow.FlowStep: [] submitted hadoop job: job_local_0006
12/10/06 15:01:07 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:07 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/ba43a84e_bddf_4ea7_9fff_9_53353_4AAEA5BF56CAE613DD5791CACC562CC9/part-00000
12/10/06 15:01:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:07 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:07 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:07 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:07 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:07 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?doc-id', '?word']]"][ba43a84e-bddf-4ea7-9fff-9/53353/]
12/10/06 15:01:07 INFO hadoop.FlowMapper: sinking to: GroupBy(be787f65-2ff9-4168-b1fd-309371bed742)[by:[{2}:'?__gen32', '?__gen33']]
12/10/06 15:01:07 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:07 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:07 INFO mapred.Task: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting
12/10/06 15:01:10 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/ba43a84e_bddf_4ea7_9fff_9_53353_4AAEA5BF56CAE613DD5791CACC562CC9/part-00000:0+1539
12/10/06 15:01:10 INFO mapred.Task: Task 'attempt_local_0006_m_000000_0' done.
12/10/06 15:01:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:10 INFO mapred.LocalJobRunner:
12/10/06 15:01:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:10 INFO mapred.Merger: Merging 1 sorted segments
12/10/06 15:01:10 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1279 bytes
12/10/06 15:01:10 INFO mapred.LocalJobRunner:
12/10/06 15:01:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:10 INFO hadoop.FlowReducer: sourcing from: GroupBy(be787f65-2ff9-4168-b1fd-309371bed742)[by:[{2}:'?__gen32', '?__gen33']]
12/10/06 15:01:10 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?df-word', '!__gen34']]"][5a06fefb-6a4a-4b3e-b29c-4/29562/]
12/10/06 15:01:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:10 INFO mapred.Task: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting
12/10/06 15:01:10 INFO mapred.LocalJobRunner:
12/10/06 15:01:10 INFO mapred.Task: Task attempt_local_0006_r_000000_0 is allowed to commit now
12/10/06 15:01:10 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0006_r_000000_0' to file:/tmp/hadoop-paullam/5a06fefb_6a4a_4b3e_b29c_4_29562_225FC125CB2B5A52761234AB67ECDF70
12/10/06 15:01:13 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:13 INFO mapred.Task: Task 'attempt_local_0006_r_000000_0' done.
12/10/06 15:01:13 INFO flow.FlowStep: [] starting step: (5/5)
12/10/06 15:01:13 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:13 INFO flow.FlowStep: [] submitted hadoop job: job_local_0007
12/10/06 15:01:13 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:13 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/5a06fefb_6a4a_4b3e_b29c_4_29562_225FC125CB2B5A52761234AB67ECDF70/part-00000
12/10/06 15:01:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:13 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:13 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:13 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:13 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:13 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?df-word', '!__gen34']]"][5a06fefb-6a4a-4b3e-b29c-4/29562/]
12/10/06 15:01:13 INFO hadoop.FlowMapper: sinking to: GroupBy(5a06fefb-6a4a-4b3e-b29c-43cbb205fd7e)[by:[{1}:'?df-word']]
12/10/06 15:01:13 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:13 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:13 INFO mapred.Task: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting
12/10/06 15:01:16 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/5a06fefb_6a4a_4b3e_b29c_4_29562_225FC125CB2B5A52761234AB67ECDF70/part-00000:0+784
12/10/06 15:01:16 INFO mapred.Task: Task 'attempt_local_0007_m_000000_0' done.
12/10/06 15:01:16 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:16 INFO mapred.LocalJobRunner:
12/10/06 15:01:16 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:16 INFO mapred.Merger: Merging 1 sorted segments
12/10/06 15:01:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes
12/10/06 15:01:16 INFO mapred.LocalJobRunner:
12/10/06 15:01:16 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:16 INFO hadoop.FlowReducer: sourcing from: GroupBy(5a06fefb-6a4a-4b3e-b29c-43cbb205fd7e)[by:[{1}:'?df-word']]
12/10/06 15:01:16 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?df-count', '?tf-word']]"][0919cb61-0698-4413-9721-7/99813/]
12/10/06 15:01:16 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:16 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:16 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:16 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:16 INFO mapred.Task: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting
12/10/06 15:01:16 INFO mapred.LocalJobRunner:
12/10/06 15:01:16 INFO mapred.Task: Task attempt_local_0007_r_000000_0 is allowed to commit now
12/10/06 15:01:16 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0007_r_000000_0' to file:/tmp/hadoop-paullam/0919cb61_0698_4413_9721_7_99813_EA3792B7EC6713CD163B24A8AA99F5A8
12/10/06 15:01:19 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:19 INFO mapred.Task: Task 'attempt_local_0007_r_000000_0' done.
12/10/06 15:01:19 INFO flow.FlowStep: [] starting step: (4/5) output/tfidf
12/10/06 15:01:19 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:19 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:19 INFO flow.FlowStep: [] submitted hadoop job: job_local_0008
12/10/06 15:01:19 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:19 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:19 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/cfe43fd1_b45f_443b_b2ba_3_58615_8D0A5EBC7F48F8F16E751590E09E66AA/part-00000
12/10/06 15:01:19 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:19 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:19 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:19 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:19 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:19 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:19 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:19 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?tf-word', '?doc-id', '?tf-count']]"][cfe43fd1-b45f-443b-b2ba-3/58615/]
12/10/06 15:01:19 INFO hadoop.FlowMapper: sinking to: CoGroup(cfe43fd1-b45f-443b-b2ba-3b4a9754fce2*0919cb61-0698-4413-9721-7297e550e045)[by:cfe43fd1-b45f-443b-b2ba-3b4a9754fce2:[{1}:'?tf-word']0919cb61-0698-4413-9721-7297e550e045:[{1}:'?tf-word']]
12/10/06 15:01:19 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:19 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:19 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:19 INFO mapred.Task: Task:attempt_local_0008_m_000000_0 is done. And is in the process of commiting
12/10/06 15:01:22 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/cfe43fd1_b45f_443b_b2ba_3_58615_8D0A5EBC7F48F8F16E751590E09E66AA/part-00000:0+1573
12/10/06 15:01:22 INFO mapred.Task: Task 'attempt_local_0008_m_000000_0' done.
12/10/06 15:01:22 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:22 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:22 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/0919cb61_0698_4413_9721_7_99813_EA3792B7EC6713CD163B24A8AA99F5A8/part-00000
12/10/06 15:01:22 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:22 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:22 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:22 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:22 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:22 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:22 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:22 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?df-count', '?tf-word']]"][0919cb61-0698-4413-9721-7/99813/]
12/10/06 15:01:22 INFO hadoop.FlowMapper: sinking to: CoGroup(cfe43fd1-b45f-443b-b2ba-3b4a9754fce2*0919cb61-0698-4413-9721-7297e550e045)[by:cfe43fd1-b45f-443b-b2ba-3b4a9754fce2:[{1}:'?tf-word']0919cb61-0698-4413-9721-7297e550e045:[{1}:'?tf-word']]
12/10/06 15:01:22 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:22 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:22 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:22 INFO mapred.Task: Task:attempt_local_0008_m_000001_0 is done. And is in the process of commiting
12/10/06 15:01:25 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/0919cb61_0698_4413_9721_7_99813_EA3792B7EC6713CD163B24A8AA99F5A8/part-00000:0+784
12/10/06 15:01:25 INFO mapred.Task: Task 'attempt_local_0008_m_000001_0' done.
12/10/06 15:01:25 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:25 INFO mapred.LocalJobRunner:
12/10/06 15:01:25 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:25 INFO mapred.Merger: Merging 2 sorted segments
12/10/06 15:01:25 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 1990 bytes
12/10/06 15:01:25 INFO mapred.LocalJobRunner:
12/10/06 15:01:25 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:25 INFO hadoop.FlowReducer: sourcing from: CoGroup(cfe43fd1-b45f-443b-b2ba-3b4a9754fce2*0919cb61-0698-4413-9721-7297e550e045)[by:cfe43fd1-b45f-443b-b2ba-3b4a9754fce2:[{1}:'?tf-word']0919cb61-0698-4413-9721-7297e550e045:[{1}:'?tf-word']]
12/10/06 15:01:25 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?tf-idf', '?tf-word']]"]["output/tfidf"]"]
12/10/06 15:01:25 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:25 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:25 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:01:25 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:01:25 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:25 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:25 INFO mapred.Task: Task:attempt_local_0008_r_000000_0 is done. And is in the process of commiting
12/10/06 15:01:25 INFO mapred.LocalJobRunner:
12/10/06 15:01:25 INFO mapred.Task: Task attempt_local_0008_r_000000_0 is allowed to commit now
12/10/06 15:01:25 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0008_r_000000_0' to file:/Users/paullam/Dropbox/Projects/Impatient/part5/output/tfidf
12/10/06 15:01:28 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:28 INFO mapred.Task: Task 'attempt_local_0008_r_000000_0' done.
12/10/06 15:01:28 INFO util.Hadoop18TapUtil: deleting temp path output/tfidf/_temporary
12/10/06 15:01:28 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/10/06 15:01:28 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part5/./target/impatient.jar
12/10/06 15:01:28 INFO flow.Flow: [] starting
12/10/06 15:01:28 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/10/06 15:01:28 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/10/06 15:01:28 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/10/06 15:01:28 INFO flow.Flow: [] parallel execution is enabled: false
12/10/06 15:01:28 INFO flow.Flow: [] starting jobs: 2
12/10/06 15:01:28 INFO flow.Flow: [] allocating threads: 1
12/10/06 15:01:28 INFO flow.FlowStep: [] starting step: (1/2)
12/10/06 15:01:29 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:29 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:29 INFO flow.FlowStep: [] submitted hadoop job: job_local_0009
12/10/06 15:01:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:29 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/en.stop
12/10/06 15:01:29 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:29 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:29 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:29 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:29 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/10/06 15:01:29 INFO hadoop.FlowMapper: sinking to: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:01:29 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:29 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:29 INFO mapred.Task: Task:attempt_local_0009_m_000000_0 is done. And is in the process of commiting
12/10/06 15:01:32 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/en.stop:0+544
12/10/06 15:01:32 INFO mapred.Task: Task 'attempt_local_0009_m_000000_0' done.
12/10/06 15:01:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:32 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/rain.txt
12/10/06 15:01:32 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:32 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:32 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:32 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/10/06 15:01:32 INFO hadoop.FlowMapper: sinking to: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:01:32 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:32 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:32 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:32 INFO mapred.Task: Task:attempt_local_0009_m_000001_0 is done. And is in the process of commiting
12/10/06 15:01:35 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part5/data/rain.txt:0+510
12/10/06 15:01:35 INFO mapred.Task: Task 'attempt_local_0009_m_000001_0' done.
12/10/06 15:01:35 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:35 INFO mapred.LocalJobRunner:
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO mapred.Merger: Merging 2 sorted segments
12/10/06 15:01:35 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 4413 bytes
12/10/06 15:01:35 INFO mapred.LocalJobRunner:
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO hadoop.FlowReducer: sourcing from: CoGroup(6bc36ec3-9880-40cf-a59d-c7049ed73325*b3c27710-7245-46f5-97ce-17f85052390c)[by:6bc36ec3-9880-40cf-a59d-c7049ed73325:[{1}:'?word']b3c27710-7245-46f5-97ce-17f85052390c:[{1}:'?word']]
12/10/06 15:01:35 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?word', '!__gen39']]"][dcd5511e-f52f-4a10-aaa9-a/45172/]
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:01:35 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:35 INFO mapred.Task: Task:attempt_local_0009_r_000000_0 is done. And is in the process of commiting
12/10/06 15:01:35 INFO mapred.LocalJobRunner:
12/10/06 15:01:35 INFO mapred.Task: Task attempt_local_0009_r_000000_0 is allowed to commit now
12/10/06 15:01:35 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0009_r_000000_0' to file:/tmp/hadoop-paullam/dcd5511e_f52f_4a10_aaa9_a_45172_EC9FFCBF11F42C17C9A6A051896109EB
12/10/06 15:01:38 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:38 INFO mapred.Task: Task 'attempt_local_0009_r_000000_0' done.
12/10/06 15:01:38 INFO flow.FlowStep: [] starting step: (2/2) output/wc
12/10/06 15:01:38 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:01:38 INFO flow.FlowStep: [] submitted hadoop job: job_local_0010
12/10/06 15:01:38 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:38 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-paullam/dcd5511e_f52f_4a10_aaa9_a_45172_EC9FFCBF11F42C17C9A6A051896109EB/part-00000
12/10/06 15:01:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:38 INFO mapred.MapTask: numReduceTasks: 1
12/10/06 15:01:38 INFO mapred.MapTask: io.sort.mb = 100
12/10/06 15:01:38 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/06 15:01:38 INFO mapred.MapTask: record buffer = 262144/327680
12/10/06 15:01:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:38 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?word', '!__gen39']]"][dcd5511e-f52f-4a10-aaa9-a/45172/]
12/10/06 15:01:38 INFO hadoop.FlowMapper: sinking to: GroupBy(dcd5511e-f52f-4a10-aaa9-a8be66885194)[by:[{1}:'?word']]
12/10/06 15:01:38 INFO mapred.MapTask: Starting flush of map output
12/10/06 15:01:38 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:38 INFO mapred.MapTask: Finished spill 0
12/10/06 15:01:38 INFO mapred.Task: Task:attempt_local_0010_m_000000_0 is done. And is in the process of commiting
12/10/06 15:01:41 INFO mapred.LocalJobRunner: file:/tmp/hadoop-paullam/dcd5511e_f52f_4a10_aaa9_a_45172_EC9FFCBF11F42C17C9A6A051896109EB/part-00000:0+784
12/10/06 15:01:41 INFO mapred.Task: Task 'attempt_local_0010_m_000000_0' done.
12/10/06 15:01:41 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/06 15:01:41 INFO mapred.LocalJobRunner:
12/10/06 15:01:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:41 INFO mapred.Merger: Merging 1 sorted segments
12/10/06 15:01:41 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes
12/10/06 15:01:41 INFO mapred.LocalJobRunner:
12/10/06 15:01:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:41 INFO hadoop.FlowReducer: sourcing from: GroupBy(dcd5511e-f52f-4a10-aaa9-a8be66885194)[by:[{1}:'?word']]
12/10/06 15:01:41 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/10/06 15:01:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:41 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/10/06 15:01:41 INFO mapred.Task: Task:attempt_local_0010_r_000000_0 is done. And is in the process of commiting
12/10/06 15:01:41 INFO mapred.LocalJobRunner:
12/10/06 15:01:41 INFO mapred.Task: Task attempt_local_0010_r_000000_0 is allowed to commit now
12/10/06 15:01:41 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0010_r_000000_0' to file:/Users/paullam/Dropbox/Projects/Impatient/part5/output/wc
12/10/06 15:01:44 INFO mapred.LocalJobRunner: reduce > reduce
12/10/06 15:01:44 INFO mapred.Task: Task 'attempt_local_0010_r_000000_0' done.
12/10/06 15:01:44 INFO util.Hadoop18TapUtil: deleting temp path output/wc/_temporary
$ cat output/tfidf/part-00000
doc02 0.9162907318741551 air
doc01 0.44628710262841953 area
doc02 0.22314355131420976 area
doc03 0.22314355131420976 area
doc05 0.9162907318741551 australia
doc05 0.9162907318741551 broken
doc04 0.9162907318741551 california's
doc04 0.9162907318741551 cause
doc02 0.9162907318741551 cloudcover
doc04 0.9162907318741551 death
doc04 0.9162907318741551 deserts
doc03 0.9162907318741551 downwind
doc01 0.22314355131420976 dry
doc02 0.22314355131420976 dry
doc03 0.22314355131420976 dry
doc05 0.9162907318741551 dvd
doc04 0.9162907318741551 effect
doc04 0.9162907318741551 known
doc03 0.5108256237659907 land
doc05 0.5108256237659907 land
doc01 0.5108256237659907 lee
doc02 0.5108256237659907 lee
doc04 0.5108256237659907 leeward
doc03 0.5108256237659907 leeward
doc02 0.9162907318741551 less
doc03 0.9162907318741551 lies
doc02 0.22314355131420976 mountain
doc03 0.22314355131420976 mountain
doc04 0.22314355131420976 mountain
doc01 0.9162907318741551 mountainous
doc04 0.9162907318741551 primary
doc02 0.9162907318741551 produces
doc04 0.0 rain
doc01 0.0 rain
doc02 0.0 rain
doc03 0.0 rain
doc04 0.9162907318741551 ranges
doc05 0.9162907318741551 secrets
doc01 0.0 shadow
doc02 0.0 shadow
doc03 0.0 shadow
doc04 0.0 shadow
doc02 0.9162907318741551 sinking
doc04 0.9162907318741551 such
doc04 0.9162907318741551 valley
doc05 0.9162907318741551 women
Paul-Lams-computer:part5 paullam$
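The scores above appear consistent with tf-idf computed as tf × ln(n_docs / (1 + df)), with n_docs = 5 for this corpus: "air" occurs once in one document (ln(5/2) ≈ 0.91629), "area" occurs twice in doc01 and appears in three documents (2 × ln(5/4) ≈ 0.44629), and "rain" appears in four documents (ln(5/5) = 0). A minimal sketch to check this, assuming that formula (the exact idf smoothing used by the Impatient code is inferred from the output, not shown in this log):

```python
import math

def tfidf(tf, df, n_docs):
    """tf-idf as tf * ln(n_docs / (1 + df)).

    tf: occurrences of the word in one document
    df: number of documents containing the word
    n_docs: total documents in the corpus (5 in this example)
    """
    return tf * math.log(n_docs / (1 + df))

# Spot-check against output/tfidf/part-00000:
print(tfidf(1, 1, 5))  # "air" in doc02
print(tfidf(2, 3, 5))  # "area" in doc01
print(tfidf(1, 4, 5))  # "rain" in any doc -> 0.0
```

Running this reproduces 0.9162907318741551, 0.44628710262841953, and 0.0, matching the lines above.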
$ cat output/wc/part-00000
air 1
area 4
australia 1
broken 1
california's 1
cause 1
cloudcover 1
death 1
deserts 1
downwind 1
dry 3
dvd 1
effect 1
known 1
land 2
lee 2
leeward 2
less 1
lies 1
mountain 3
mountainous 1
primary 1
produces 1
rain 5
ranges 1
secrets 1
shadow 4
sinking 1
such 1
valley 1
women 1