Created
April 28, 2014 23:21
Update: I have a test case here: https://gist.github.com/Chandler/7766963
I'm running it with multiple files as the input and cascading.hadoop.hfs.combine.files=true.
When I run it against stock elephant bird, it fails with the original error:
"com.twitter.elephantbird.mapred.input.DeprecatedLzoTextInputFormat cannot be cast to org.apache.hadoop.mapred.FileInputFormat"
https://github.com/kevinweil/elephant-bird/pull/359
When I run it with Dmitriy's elephant bird patch, it fails with "DeprecatedInputFormatWrapper can not support RecordReaders that don't return same key & value objects. current reader class : class com.twitter.elephantbird.mapreduce.input.LzoLineRecordReader"
I addressed the RecordReaders issue here: https://github.com/kevinweil/elephant-bird/pull/360
When I run the test with both my patch and Dmitriy's applied, the job succeeds!
But something is up with the combine files setting: the output differs depending on whether it is enabled.
The input is two identical LZO text files that each contain 3 newline-separated words:
hello
hello
tiger
Output with cascading.hadoop.hfs.combine.files=false:
tiger 2
hello 4
Output with cascading.hadoop.hfs.combine.files=true:
1 hello 3
tiger 2
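For reference, here is a minimal sketch of the counts a word count over this input should produce. It is plain Python, not the Cascading job from the test case, and the names `file_contents`, `inputs`, and `word_count` are made up for illustration. It only models "two identical files, three words each" as described above.

```python
from collections import Counter

# Contents of one input file, as listed above (three newline-separated words).
file_contents = "hello\nhello\ntiger\n"

# The job reads two identical copies of that file.
inputs = [file_contents, file_contents]


def word_count(texts):
    """Count whitespace-separated words across all input texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


expected = word_count(inputs)
for word, n in expected.items():
    print(word, n)
# prints:
# hello 4
# tiger 2
```

These counts match the combine.files=false output, whereas the combine.files=true run reports only 3 for hello.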