

@Chandler
Created April 28, 2014 23:21
Update: I have a test case here: https://gist.github.com/Chandler/7766963
I'm running it with multiple input files and cascading.hadoop.hfs.combine.files=true.
When I run it against stock Elephant Bird, it fails with the original error:
"com.twitter.elephantbird.mapred.input.DeprecatedLzoTextInputFormat cannot be cast to org.apache.hadoop.mapred.FileInputFormat"
https://github.com/kevinweil/elephant-bird/pull/359
When I run it with Dmitriy's Elephant Bird patch, it fails with: "DeprecatedInputFormatWrapper can not support RecordReaders that don't return same key & value objects. current reader class : class com.twitter.elephantbird.mapreduce.input.LzoLineRecordReader"
I addressed the RecordReader issue here: https://github.com/kevinweil/elephant-bird/pull/360
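The error message suggests the old-API wrapper insists that the underlying reader hand back the very same key/value objects the caller holds, so a reader that allocates fresh objects per record is rejected. The toy model below illustrates that contract and one general alternative (copying the reader's current value into the caller's object). It is a sketch under assumptions: all names (`FreshObjectReader`, `next_with_identity_check`, `next_with_copy`, `read_all`) are hypothetical, it is Python rather than Hadoop's Java API, and it is not Elephant Bird's code or necessarily what PR 360 does.

```python
# Toy model only: hypothetical names, not Elephant Bird's actual classes.

class FreshObjectReader:
    """Hypothetical new-API-style reader that allocates a new value per record."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._current = None

    def next_key_value(self):
        try:
            self._current = [next(self._lines)]  # fresh list object every call
        except StopIteration:
            return False
        return True

    def get_current_value(self):
        return self._current


def next_with_identity_check(reader, value):
    """Old-API-style next(): requires the reader to return the caller's object."""
    if not reader.next_key_value():
        return False
    if reader.get_current_value() is not value:
        raise IOError("reader did not return the same value object")
    return True


def next_with_copy(reader, value):
    """Alternative: copy the reader's current value into the caller's object."""
    if not reader.next_key_value():
        return False
    value[:] = reader.get_current_value()  # mutate in place, keeping identity
    return True


def read_all(lines, next_fn):
    """Drive a reader the old-API way: one caller-owned value reused per record."""
    reader = FreshObjectReader(lines)
    value = []
    out = []
    while next_fn(reader, value):
        out.append(value[0])
    return out
```

With `next_with_copy`, `read_all(["hello", "hello", "tiger"], next_with_copy)` yields all three words; with `next_with_identity_check`, the first record raises, mirroring the shape of the reported failure.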
When I run the test with both my patch and Dmitriy's applied, the job succeeds!
But something is off with the combine-files setting: the counts change when it is enabled.
The input is two identical LZO text files, each containing three newline-separated words:
hello
hello
tiger
Output with cascading.hadoop.hfs.combine.files=false:
tiger 2
hello 4
Output with cascading.hadoop.hfs.combine.files=true:
1 hello 3
tiger 2
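As a sanity check on the expected result, here is a plain in-memory word count over the two identical inputs, compared with the combine-files output. This is a sketch, not the Cascading job: `word_count` and `missing_counts` are hypothetical helpers, and reading the line "1 hello 3" as a count of 3 for "hello" is an assumption (the leading "1" is left unexplained and ignored).

```python
from collections import Counter

# Contents of each of the two identical input files, one word per line.
FILE_CONTENTS = ["hello", "hello", "tiger"]


def word_count(files):
    """Count each newline-separated word across all input files."""
    counts = Counter()
    for lines in files:
        for word in lines:
            counts[word] += 1
    return counts


def missing_counts(expected, observed):
    """Return how many occurrences of each word are absent from `observed`."""
    return {w: n - observed.get(w, 0) for w, n in expected.items() if n > observed.get(w, 0)}


expected = word_count([FILE_CONTENTS, FILE_CONTENTS])
# Assumed reading of the combine.files=true output ("1 hello 3", "tiger 2").
observed_combined = {"hello": 3, "tiger": 2}
print(missing_counts(expected, observed_combined))  # {'hello': 1}
```

The expected totals match the combine.files=false run (hello 4, tiger 2), so with combining enabled one "hello" record appears to be lost or mangled.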