@iconara
Last active October 19, 2015 08:08
Read a sequence file
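# Assumes the Hadoop::* modules are already loaded, e.g. by the
# require 'humboldt' used in the version below.
# key_type and value_type are unused (the reader reports both classes itself),
# so they are optional here.
def read_sequence_file(path, key_type = nil, value_type = nil)
  uri = "file://#{File.expand_path(path)}"
  conf = Hadoop::Conf::Configuration.new
  fs = Hadoop::Fs::FileSystem.get(java.net.URI.create(uri), conf)
  reader = Hadoop::Io::SequenceFile::Reader.new(fs, Hadoop::Fs::Path.new(uri), conf)
  # The same two Writables are reused: each #next call overwrites them in place.
  key = Hadoop::Util::ReflectionUtils.new_instance(reader.key_class, conf)
  value = Hadoop::Util::ReflectionUtils.new_instance(reader.value_class, conf)
  while reader.next(key, value)
    yield key, value
  end
ensure
  # Release the underlying file handle even if the block raises.
  reader.close if reader
end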
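
The version above yields the same key and value objects for every record, because the reader fills the supplied Writables in place on each call to next. A caller that stores the yielded objects therefore ends up with many references to the last record. The sketch below illustrates this with pure-Ruby stand-ins so it runs without Hadoop; FakeWritable and each_fake_record are hypothetical names that are not part of Hadoop or Humboldt, and they only mimic the reuse behaviour.

```ruby
# Hypothetical stand-in for a Hadoop Writable: one mutable holder that is
# overwritten on every read, the way the reader fills the key and value
# objects passed to it.
class FakeWritable
  attr_accessor :content

  def to_s
    content.to_s
  end
end

# Hypothetical stand-in for read_sequence_file: yields the SAME two objects
# for every record and only changes what they contain.
def each_fake_record(records)
  key = FakeWritable.new
  value = FakeWritable.new
  records.each do |k, v|
    key.content = k
    value.content = v
    yield key, value
  end
end

RECORDS = [['a', 1], ['b', 2], ['c', 3]]

# Keeping the yielded objects: every entry is the same object, so all of
# them show the last record once iteration has finished.
kept = []
each_fake_record(RECORDS) { |key, _value| kept << key }
kept_keys = kept.map(&:to_s)

# Copying the contents out inside the block keeps each record distinct.
copied_keys = []
each_fake_record(RECORDS) { |key, _value| copied_keys << key.to_s }
```

With real Writables the equivalent of the copy step would be to extract a Ruby value inside the block (for example calling to_s on a Text key) rather than holding on to the yielded object.
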
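require 'humboldt'

# Same loop as above, but yields plain Ruby values instead of Hadoop Writables.
def read_sequence_file(path)
  uri = "file://#{File.expand_path(path)}"
  conf = Hadoop::Conf::Configuration.new
  fs = Hadoop::Fs::FileSystem.get(java.net.URI.create(uri), conf)
  reader = Hadoop::Io::SequenceFile::Reader.new(fs, Hadoop::Fs::Path.new(uri), conf)
  # Pick the Humboldt converter matching each Writable class stored in the file.
  key = Humboldt::TypeConverter.from_hadoop(reader.key_class.ruby_class).new
  value = Humboldt::TypeConverter.from_hadoop(reader.value_class.ruby_class).new
  # The reader fills the wrapped Writables; #ruby converts them for the caller.
  while reader.next(key.hadoop, value.hadoop)
    yield key.ruby, value.ruby
  end
ensure
  # Release the underlying file handle even if the block raises.
  reader.close if reader
end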