@mindscratch
Created October 31, 2011 18:41
Hadoop Streaming Map/Reduce with Ruby
Now is definitely the time
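#!/usr/bin/ruby
# mapper.rb: emit "<first character>\t<word length>" for every word on stdin.
while line = gets
  # String#split with no arguments splits on runs of whitespace and drops
  # empty strings, so repeated or leading spaces do not yield zero-length words.
  line.split.each do |word|
    puts "#{word[0, 1]}\t#{word.size}"
  end
end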
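#!/usr/bin/ruby
# reducer.rb: average the word lengths for each first character.
# Assumes lines with the same key arrive consecutively, as they do after
# Hadoop's sort between the map and reduce phases.
curr_key = nil
curr_total = 0
curr_key_count = 0
while line = gets
  character, length = line.split(/\t/)
  length = length.to_f
  if curr_key == character
    curr_total += length
    curr_key_count += 1
  else
    # Key changed: emit the average for the previous key, if there was one.
    unless curr_key.nil?
      avg = curr_total / curr_key_count
      puts "#{curr_key}\t#{avg}"
    end
    curr_key = character
    curr_total = length
    curr_key_count = 1
  end
end
# Emit the average for the final key.
unless curr_key.nil?
  avg = curr_total / curr_key_count
  puts "#{curr_key}\t#{avg}"
end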
$ cat data.txt | ./mapper.rb | ./reducer.rb
N 3.0
i 2.0
d 10.0
t 3.5
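The pipeline above has no sort step. It works for this input only because the two "t" words happen to sit next to each other, whereas Hadoop sorts mapper output by key before the reducer sees it. A minimal in-process sketch of map, sort, and reduce follows. The names map_line, reduce_pairs, and average_word_lengths are hypothetical helpers, not part of the gist or of Hadoop, and sort_by only stands in for Hadoop's shuffle/sort.

```ruby
# Hypothetical in-process version of the mapper/reducer pair, for local checks.

# Map step: one [first_character, word_length] pair per word.
def map_line(line)
  line.split.map { |word| [word[0, 1], word.size] }
end

# Reduce step: average the lengths for each key. Like the streaming reducer,
# it relies on equal keys being adjacent, so the caller must sort first.
def reduce_pairs(sorted_pairs)
  sorted_pairs.chunk_while { |a, b| a[0] == b[0] }.map do |group|
    lengths = group.map { |_, length| length }
    [group[0][0], lengths.sum.to_f / lengths.size]
  end
end

def average_word_lengths(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  # Stand-in for the sort Hadoop performs between the map and reduce phases.
  reduce_pairs(pairs.sort_by { |key, _| key })
end

average_word_lengths(["Now is definitely the time"]).each do |key, avg|
  puts "#{key}\t#{avg}"
end
```

For the sample sentence this prints the same averages as the run above, but in key order (N, d, i, t). Unlike the unsorted pipe, it also handles input where words sharing a first letter are not adjacent.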