Skip to content

Instantly share code, notes, and snippets.

@rampion
Created May 17, 2009 00:22
Show Gist options
  • Save rampion/112858 to your computer and use it in GitHub Desktop.
Save rampion/112858 to your computer and use it in GitHub Desktop.
An implementation of an O(n + p) work algorithm for selecting p items with replacment from a weighted set of n items
class WeightedSelection
# input is an Enumerable of (item,frequency) pairs
#
# (takes O(n) time)
def initialize(input)
# set up the alias method for selection
# http://prxq.wordpress.com/2006/04/17/the-alias-method/
@n = input.size
@m = input.inject(0) { |sum,(item,freq)| sum + freq }
# find the items with frequency lesser and greater than average
lessers, greaters = input.map do |item,freq|
# pad the frequency so we can keep it integral
# when subdivided
[ item, freq*@n ]
end.partition do |item,adj_freq|
adj_freq <= @m
end
@partitions = Array.new(@n) do
item, adj_freq = lessers.shift
other_item = if adj_freq < @m
# use part of another item's frequency to pad
# out the partition
other_item, other_adj_freq = greaters.shift
other_adj_freq -= (@m - adj_freq)
(other_adj_freq <= @m ? lessers : greaters) << [ other_item, other_adj_freq ]
other_item
end
[ item, other_item , adj_freq ]
end
end
# pick an item from the original input with probability
# proportional to its relative frequency among all items
# P(word) = frequency / sum_of_frequencies
#
# (takes O(1) time)
def select
# pick a partition at random
item, other_item, adj_freq = @partitions[ rand(@n) ]
# select the first item in the partition with appropriate
# probability
if rand(@m) < adj_freq
item
else
other_item
end
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment