@jaydonnell
Created November 19, 2009 22:41
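# Use JRuby to read Hadoop sequence files
require 'java'

# Require every jar in a directory so its classes become visible to JRuby.
def load_libs(libs)
  Dir.glob(File.join(libs, "*.jar")).each do |f|
    require f
  end
end

# Assumes the older Hadoop layout: core jar in $HADOOP_HOME,
# dependencies in $HADOOP_HOME/lib.
hadoop_home = ENV.fetch("HADOOP_HOME") { abort "HADOOP_HOME is not set" }
load_libs hadoop_home
load_libs File.join(hadoop_home, 'lib')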
# Short aliases for the Hadoop (H) and Java (J) classes used below.
module H; end
module J; end
H::Configuration = Java::OrgApacheHadoopConf::Configuration
H::FileSystem = Java::OrgApacheHadoopFs::FileSystem
H::Path = Java::OrgApacheHadoopFs::Path
H::SequenceFile = Java::OrgApacheHadoopIo::SequenceFile
H::Writable = Java::OrgApacheHadoopIo::Writable
H::ReflectionUtils = Java::OrgApacheHadoopUtil::ReflectionUtils
J::URI = Java::JavaNet::URI
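# Path to the sequence file to read.
uri = '/mnt/hadoop/workspace/ddonnell/web_raw_requests.seq'
conf = H::Configuration.new
fs = H::FileSystem.get(J::URI.create(uri), conf)
path = H::Path.new(uri)
reader = H::SequenceFile::Reader.new(fs, path, conf)
begin
  # Create key/value objects of whatever Writable classes the file declares.
  key = H::ReflectionUtils.newInstance(reader.getKeyClass(), conf)
  value = H::ReflectionUtils.newInstance(reader.getValueClass(), conf)
  # next returns false at end of file, so guard against an empty file.
  if reader.next(key, value)
    puts key.to_s
    puts value.to_s
  end
ensure
  reader.close
end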
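The script above prints only the first record. A minimal sketch for walking the whole file is a loop on `reader.next(key, value)`, which fills the existing key and value objects in place and returns false at end of file. `each_record` is a hypothetical helper name, not part of Hadoop; it only assumes an open reader with that `next` behavior and leaves closing the reader to the caller.

```ruby
# Hypothetical helper (not a Hadoop API): yields every remaining
# key/value pair from an open SequenceFile::Reader as strings.
def each_record(reader, key, value)
  # Without a block, return an Enumerator so callers can use to_a, take, etc.
  return enum_for(:each_record, reader, key, value) unless block_given?
  # next(key, value) deserializes into the same two objects each time
  # and returns false once the end of the file is reached.
  while reader.next(key, value)
    # Convert before yielding: key and value are reused by the reader,
    # so keeping the objects themselves would alias later records.
    yield key.to_s, value.to_s
  end
end
```

With the `reader`, `key`, and `value` from the script, something like `each_record(reader, key, value) { |k, v| puts "#{k}\t#{v}" }` in place of the single `reader.next` call would print every record, tab-separated.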