@bcg
Created April 21, 2011 16:22
MapReduce unit tests for Hadoop
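#!/usr/bin/env ruby
# example.rb: builds an inverted index (word => locations) with the rubydoop DSL.
require './rubydoop'

HADOOP_HOME = '/usr/local/Cellar/hadoop/0.21.0/libexec/'

# Map: emit each normalized word together with the location key of its line.
map do |location, line|
  line.split(/\s+/).each do |word|
    next unless word.strip.length > 0
    # Lowercase, then drop a leading "(" or a single trailing non-letter.
    emit word.strip.downcase.gsub(/^\(|[^a-zA-Z]$/, ''), location
  end
end

# Reduce: join every location seen for a word into one comma-separated list.
reduce do |key, values|
  emit key, values.join(",")
end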

# Unit test: runs example.rb through a local map | sort | reduce pipeline.
require 'minitest/unit'
require 'open3'

MiniTest::Unit.autorun

# Pipe input through "script map | sort | script reduce" and return the
# reducer's output lines. The script path is used as given (e.g. './example.rb').
def map_reduce_results(script, input)
  Open3.pipeline_rw("#{script} map", "sort", "#{script} reduce") do |in_io, out_io, _wait_threads|
    in_io.print input
    in_io.close # signal EOF so the pipeline can finish
    out_io.readlines
  end
end

class ExampleTest < MiniTest::Unit::TestCase
  def test_it
    input = ["file@1\t t this is a sentence", "file@2\t t this is another sentence\n"].join("\n")
    result_list = map_reduce_results('./example.rb', input)
    puts result_list.inspect
    assert(result_list.include?("sentence\tfile@1,file@2\n"))
    assert(result_list.include?("this\tfile@1,file@2\n"))
  end
end
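
The test above depends on rubydoop and an executable example.rb. As a rough illustration of what the map | sort | reduce pipeline computes, here is a minimal in-process sketch in plain Ruby. It is not part of the gist: `map_line` and `inverted_index` are hypothetical helper names, and the sketch assumes each input line has the form "location<TAB>text", which is how the test input is shaped.

```ruby
# Hypothetical in-process stand-in for "example.rb map | sort | example.rb reduce".
# It does not use rubydoop or Hadoop; the helper names are illustrative only.

# Map step: turn one "location<TAB>text" line into [word, location] pairs.
# Assumption: the key and the text are separated by the first tab.
def map_line(line)
  location, text = line.chomp.split("\t", 2)
  text.to_s.split(/\s+/).reject(&:empty?).map do |word|
    # Same normalization as the mapper in example.rb.
    [word.strip.downcase.gsub(/^\(|[^a-zA-Z]$/, ''), location]
  end
end

# Sort + reduce step: order the pairs, group them by word, and join the
# locations of each word into one comma-separated output line.
def inverted_index(input)
  pairs = input.each_line.flat_map { |line| map_line(line) }
  pairs.sort.group_by(&:first).map do |word, group|
    "#{word}\t#{group.map(&:last).join(',')}\n"
  end
end

input = "file@1\tthis is a sentence\nfile@2\tthis is another sentence\n"
puts inverted_index(input)
```

Running the logic in one process like this can help separate bugs in the mapper or reducer from problems with the shell pipeline itself. It is not a substitute for the pipeline test, which also exercises the script's command-line interface.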