@fredrik
Created January 15, 2010 13:08
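# run like so:
# $> ruby normalize.rb --run=local --input data/sizes --output data/normalized_sizes
require 'wukong'

module Normalize
  class Mapper < Wukong::Streamer::LineStreamer
    # Input line: a country followed by tab-separated size counts (the output of sizes.rb).
    # Emits the country and each count as a percentage of the row total.
    def process(line)
      fields = line.strip.split("\t")
      # shift removes the leading country field and leaves the counts in column order
      country = fields.shift
      data = fields.map(&:to_i)
      sum = data.sum.to_f
      normalized = data.map { |x| 100 * x / sum }
      s = normalized.join(",")
      yield [country, s]
    end
  end
end

Wukong::Script.new(Normalize::Mapper, nil).run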
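
To try the normalization arithmetic without installing Wukong, a minimal plain-Ruby sketch might look like the following. The helper name `normalize_line` and the sample line are hypothetical and not part of the gist; the sketch keeps the percentages in input column order and assumes Ruby 2.4+ for `Array#sum`.

```ruby
# Hypothetical helper (not part of the gist): turns one tab-separated line of
# "country, count, count, ..." into the country plus comma-joined percentages.
def normalize_line(line)
  country, *counts = line.strip.split("\t")
  data = counts.map(&:to_i)
  total = data.sum.to_f               # Array#sum needs Ruby 2.4 or later
  [country, data.map { |x| 100 * x / total }.join(",")]
end

# Made-up sample row: three size counts for one country.
p normalize_line("SE\t1\t1\t2\n")     # => ["SE", "25.0,25.0,50.0"]
```

A row whose counts are all zero divides by 0.0 and yields NaN values, so real input may need a guard for that case.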
#
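# run like so:
# $> ruby sizes.rb --run=local --input data/orders.tsv --output data/sizes
require 'wukong'

module JeanSizes
  class Mapper < Wukong::Streamer::LineStreamer
    # Emits the country (field 3) together with the 13 size counts (fields 11..23).
    def process(line)
      fields = line.strip.split("\t")
      country = fields[3]
      # fields[11..23] is nil when the line has fewer than 11 fields
      sizes = fields[11..23]
      yield [country, sizes] if sizes && sizes.length == 13
    end
  end

  class Reducer < Wukong::Streamer::ListReducer
    # values holds the records collected for the current key,
    # each of the form [country, size, size, ...].
    def finalize
      # Drop the key from each record, then add the counts up column by column.
      rows = values.map { |v| v[1..-1].map(&:to_i) }
      sums = rows.transpose.map(&:sum)
      yield [key, sums]
    end
  end
end

Wukong::Script.new(JeanSizes::Mapper, JeanSizes::Reducer).run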
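
The reducer's job is a column-wise sum over all records that share a country. A minimal plain-Ruby sketch of that step, runnable without Wukong, could look like this. The helper name `sum_columns` and the sample records are hypothetical, and the sketch assumes every record has the same number of size fields (`transpose` raises an `IndexError` otherwise).

```ruby
# Hypothetical helper (not part of the gist): each record is
# [country, size, size, ...] with the sizes still as strings.
def sum_columns(records)
  records
    .map { |r| r[1..-1].map(&:to_i) }  # drop the country key, parse the counts
    .transpose                         # rows of counts become columns
    .map(&:sum)                        # Array#sum needs Ruby 2.4 or later
end

# Made-up sample: two order records for the same country.
records = [
  ["SE", "1", "2", "3"],
  ["SE", "4", "5", "6"],
]
p sum_columns(records)                 # => [5, 7, 9]
```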