Starting with this email on @lrug mailing list:
http://lists.lrug.org/pipermail/chat-lrug.org/2013-October/009583.html
Eventually, 11 different implementations where proposed (code below). I adapted them to a Rspec spec and ran correctness and (naive) performance tests on them.
➜ spec git:(master) ✗ ls -lrt *.rb
-rw-r--r-- 1 peter_v staff 825 Oct 25 22:31 roland_spec.rb
# originally michael_spec was here, but updated to fix my mistake
-rw-r--r-- 1 peter_v staff 1515 Oct 25 22:33 peter_spec.rb
-rw-r--r-- 1 peter_v staff 825 Oct 25 22:34 tom_spec.rb
-rw-r--r-- 1 peter_v staff 1036 Oct 25 22:43 dominic_spec.rb
-rw-r--r-- 1 peter_v staff 913 Oct 26 08:20 michael_spec.rb
-rw-r--r-- 1 peter_v staff 909 Oct 26 08:41 jason_spec.rb
-rw-r--r-- 1 peter_v staff 2237 Oct 26 08:42 peter2_spec.rb
-rw-r--r-- 1 peter_v staff 1268 Oct 26 09:56 dave_spec.rb
-rw-r--r-- 1 peter_v staff 2166 Oct 26 12:50 peter3_spec.rb
-rw-r--r-- 1 peter_v staff 1010 Oct 28 09:19 dominic_recursion.rb
-rw-r--r-- 1 peter_v staff 966 Oct 28 09:19 dominic_lists.rb
All 11 now produce the expected results (variation allowed in example 0 and example 2).
➜ spec git:(master) ✗ rspec roland_spec.rb
.....
Finished in 0.00101 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec michael_spec.rb
.....
Finished in 0.00111 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec peter_spec.rb
.....
Finished in 0.00101 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec tom_spec.rb
.....
Finished in 0.00118 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec dominic_spec.rb
.....
Finished in 0.00104 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec jason_spec.rb
.....
Finished in 0.00086 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec peter2_spec.rb
...........
Finished in 0.00691 seconds
11 examples, 0 failures
➜ spec git:(master) ✗ rspec dave_spec.rb
.....
Finished in 0.00105 seconds
5 examples, 0 failures
➜ spec git:(master) ✗ rspec peter3_spec.rb
..........
Finished in 0.00185 seconds
10 examples, 0 failures
➜ spec git:(master) ✗ rspec dominic_recursion.rb
......
Finished in 0.00112 seconds
6 examples, 0 failures
➜ spec git:(master) ✗ rspec dominic_lists.rb
.....
Finished in 0.00097 seconds
5 examples, 0 failures
UPDATE: a proper benchmark is published here: https://gist.github.com/textgoeshere/02fd396e56ca8a8d2efe
UPDATE: since I cannot post comments on that gist, an update of fruity test for the 4 fastest contenders is shown here (I print the code below for reference):
➜ fruity git:(master) ✗ # below: multiplier 50, pattern 'lod'
➜ fruity git:(master) ✗ ruby letters-benchmarking.rb
Running each test once. Test will take about 17 seconds.
Peter3 is faster than DominicLists by 30.000000000000004% ± 10.0%
DominicLists is faster than Dave by 30.000000000000004% ± 10.0%
Dave is faster than Jason by 10x ± 1.0
➜ fruity git:(master) ✗ # below : multiplier 25, pattern 'lold'
➜ fruity git:(master) ✗ ruby letters-benchmarking.rb
Running each test once. Test will take about 1 minute.
DominicLists is faster than Peter3 by 19.999999999999996% ± 1.0%
Peter3 is faster than Dave by 10.000000000000009% ± 10.0%
Dave is faster than Jason by 4x ± 0.1
As an extremely naive speed test, changing the
test string to ("hello world" * 100) and running
rspec again on all 10 yields (seconds, lines of code,
and max RSIZE as seen in top -o cpu
on MacBook Pro).
roland_spec.rb => 9.14 s 11 loc (1035 MB max RSIZE)
michael_spec.rb => 8.82 s 13 loc (1044 MB max RSIZE)
peter_spec.rb => 5.91 s 30 loc ( 244 MB max RSIZE)
tom_spec.rb => 20.05 s 9 loc ( 326 MB max RSIZE)
dominic_spec.rb => 395. s 14 loc ( 434 MB max RSIZE)
jason_spec.rb => 16.21 s 19 loc ( 535 MB max RSIZE)
peter2_spec.rb => 5.24 s 42 loc ( 222 MB max RSIZE)
dave_spec.rb => 4.82 s 19 loc ( 323 MB max RSIZE)
peter3_spec.rb => 4.42 s 39 loc ( 239 MB max RSIZE)
dominic_recursion.rb => 64.47 s 10 loc ( 482 MB max RSIZE)
dominic_lists.rb => 4.21 s 10 loc ( 195 MB max RSIZE)
Dominic fixed the tail handling of dominic_spec.rb
into dominic_recursion.rb
.
Of course, the Rspec fails on the ("hello world" * 100)
test case, but the reported results make sense: the last 2
found results are always: [1092, 1093, 1099], [1092, 1096, 1099]
which are indeed the last 2 'lod' combinations in
"hello world" * 100.
➜ spec git:(master) ✗ cat roland_spec.rb
def letters(a, b)
indexes = b.each_char.map do |letter|
a.each_char.each_with_index.select {|l, _| l == letter }.map {|_, i| i }
end
start = indexes.shift
if indexes.empty?
[start]
else
start.product(*indexes).select {|x| x == x.sort }
end
end
# NOTE: these are the tests run against all implementations
# replaced by '...' in prints below.
describe "letters" do
let(:a) { "hello world" * 100 }
specify "example 1" do
letters(a, "e").should == [[1]]
end
specify "example 2" do
letters(a, "l").should == [[2, 3, 9]]
end
specify "example 3" do
letters(a, "el").should == [[1,2], [1,3], [1,9]]
end
specify "example 4" do
letters(a, "lo").should == [[2,4], [2,7], [3,4], [3,7]]
end
specify "example 5" do
letters(a, "lod").should == [ [2,4,10], [2,7,10], [3,4,10], [3,7,10] ]
end
end
➜ spec git:(master) ✗ cat michael_spec.rb
def get_indexes(needle, haystack)
(0...haystack.length).find_all { |i| haystack[i,1] == needle }
end
def map_indexes(needle, haystack)
needle.split('').map do |nee|
get_indexes nee, haystack
end
end
def letters(haystack, needle)
indexes = map_indexes(needle, haystack)
first_index = indexes.shift
first_index.product(*indexes).select{|match| match == match.sort}
end
...
➜ spec git:(master) ✗ cat peter_spec.rb
def letters(text, pattern)
start = 0
result = find_all_from_pattern(text, start, pattern)
end
def find_all_from_pattern(text, from, pattern)
letter = pattern[0]
remainder = pattern[1..-1]
r = find_all_from_letter(text, from, letter)
if remainder.size > 0
r.each_with_object([]) do |previous, results|
one_deeper = find_all_from_pattern(text, (previous+1), remainder)
one_deeper.each do |deeper|
results << ([previous] + deeper)
end
end
else # leaf node, end of iteration
r
end.map{ |e| Array(e) }
end
def find_all_from_letter(text, from, letter)
[].tap do |result|
while(match = find_one_from_letter(text, from, letter))
result << match
from = match + 1
end
end
end
def find_one_from_letter(text, from, letter)
return nil unless letter
index = text[from..-1].index(letter)
index && (index + from)
end
...
➜ spec git:(master) ✗ cat tom_spec.rb
def letters(a, b, starting_at = 0)
if b.empty?
[[]]
else
a.each_char.with_index.drop(starting_at).
select { |c, _| c == b[0] }.map(&:last).
flat_map { |i| letters(a, b[1..-1], i.succ).map(&[i].method(:+)) }
end
end
...
➜ spec git:(master) ✗ cat dominic_spec.rb
def letters(haystack, needles)
# base case: if the letters to match are empty, we've succeeded
return [[]] if needles.empty?
# base case: if the word to match in is empty, we've failed
return false if haystack.empty?
result = []
if needles[0] == haystack[0]
tail = letters(haystack[1..-1], needles[1..-1])
result += tail.map { |t| [0] + t.map { |i| i + 1 } } if tail
end
tail = letters(haystack[1..-1], needles)
result += tail.map { |t| t.map { |i| i + 1 } } if tail
result
end
...
➜ spec git:(master) ✗ cat jason_spec.rb
def get_results(string, s_i, chars, c_i )
finds = []
while i = string.index(chars[c_i], s_i)
if chars[c_i + 1]
if more_finds = get_results(string, i + 1, chars , c_i + 1)
finds += more_finds.map{|a| a.unshift(i)}
else
return nil
end
else
finds << [i]
end
s_i = i + 1
end
return finds
end
def letters(a, b)
return get_results(a, 0, b.split(//), 0 )
end
...
➜ spec git:(master) ✗ cat peter2_spec.rb
def find_all_from_pattern(text_or_rainbow, pattern, from = 0)
rainbow = build_rainbow(text_or_rainbow)
letter = pattern[0]
remainder = pattern[1..-1]
matches = find_all_from_letter(rainbow, from, letter)
if remainder.empty?
matches # leaf node
else
find_remainder(rainbow, matches, remainder)
end.map{ |e| Array(e) }
end
alias :letters :find_all_from_pattern
def find_remainder(rainbow, previous_matches, remainder)
previous_matches.each_with_object([]) do |previous, results|
find_all_from_pattern(rainbow, remainder, (previous+1)).each do |deeper|
results << ([previous] + deeper)
end
end
end
def find_all_from_letter(rainbow, from, letter)
[].tap do |result|
while(match = find_one_from_letter(rainbow, from, letter))
result << match
from = match + 1
end
end
end
def find_one_from_letter(rainbow, from, letter)
index = rainbow[from].index(letter)
index && (index + from)
end
def build_rainbow(shortened)
return shortened if shortened.is_a?(Array)
[].tap do |rainbow|
until (shortened.empty?)
rainbow.push(shortened)
shortened = shortened[1..-1]
end
rainbow.push('')
end
end
...
➜ spec git:(master) ✗ cat dave_spec.rb
def letters(haystack_str, needle_str)
haystack = haystack_str.chars
needle = needle_str.chars
@pos_cache = haystack.each.with_index.with_object(Hash.new { |h, k| h[k] = [] }) { |(c, i), h| h[c] << i if needle.include?(c) }
@match_cache = Hash.new { |h, k| h[k] = {} }
matches(needle)
end
def matches(chars, current_pos = -1)
return [[]] unless chars.any?
char, *rest = *chars
@match_cache[char][current_pos] ||= begin
this_matches = @pos_cache[char].select { |candidate_pos| candidate_pos > current_pos }
this_matches.each_with_object([]) do |this_pos, memo|
matches(rest, this_pos).each do |rest_matches|
memo << [this_pos, *rest_matches]
end
end
end
end
...
➜ spec git:(master) ✗ cat peter3_spec.rb
def find_all_from_pattern(text, pattern, from = 0)
@reverse_index ||= reverse_index(text)
@cache ||= {}
letter = pattern[0]
remainder = pattern[1..-1]
matches = @reverse_index[letter].select{ |i| i >= from }
if remainder.empty?
matches # leaf node
else
find_remainder(matches, remainder)
end.map{ |e| Array(e) }
end
alias :letters :find_all_from_pattern
def find_remainder(previous_matches, remainder)
previous_matches.each_with_object([]) do |previous, results|
find_all_remaining_after_previous(remainder, previous, results)
end
end
def find_all_remaining_after_previous(remainder, previous, results)
with_cache(remainder, previous) do |_remainder, _previous|
find_all_from_pattern(nil, _remainder, (_previous+1))
end.each do |deeper|
results << ([previous] + deeper)
end
end
def reverse_index(haystack)
Hash.new{ |h,k| h[k] = [] }.tap do |ri|
haystack.each_char.with_index do |c, i|
ri[c] << i
end
end
end
MAX_REMAINDER_FOR_CACHE = 1
def with_cache(remainder, previous)
if remainder.size <= MAX_REMAINDER_FOR_CACHE
@cache[remainder.hash ^ previous.hash] ||= yield(remainder, previous)
else
yield(remainder, previous)
end
end
...
➜ spec git:(master) ✗ cat dominic_recursion.rb
def letters(haystack, needles, index = 0)
# base case: if the letters to match are empty, we've succeeded
return [[]] if needles.empty?
# base case: if the word to match in is empty, we've failed
return [] if haystack.empty?
result = []
if needles[0] == haystack[0]
tail = letters(haystack[1..-1], needles[1..-1], index + 1)
result += tail.map { |t| [index] + t }
end
result + letters(haystack[1..-1], needles, index + 1)
end
...
➜ spec git:(master) ✗ cat dominic_lists.rb
def letters(haystack, needles)
indices = Hash[haystack.each_char.with_index.group_by { |(char, _)| char }.map { |(char, values)| [char, values.map { |(_, index)| index }] }]
results = indices[needles[0]].map { |i| [i] }
needles[1..-1].each_char do |char|
results = results.flat_map do |r|
indices[char].drop_while { |i| i < r.last }.map { |i| r + [i] }
end
end
results
end
...
For reference, my version of the code for fruity. Adapted from the original by Dave on https://gist.github.com/textgoeshere/02fd396e56ca8a8d2efe
➜ fruity git:(master) ✗ cat letters-benchmarking.rb
require 'fruity'
module Roland
def self.letters(a, b)
indexes = b.each_char.map do |letter|
a.each_char.each_with_index.select {|l, _| l == letter }.map {|_, i| i }
end
start = indexes.shift
if indexes.empty?
[start]
else
start.product(*indexes).select {|x| x == x.sort }
end
end
end
module Tom
def self.letters(a, b, starting_at = 0)
if b.empty?
[[]]
else
a.each_char.with_index.drop(starting_at).
select { |c, _| c == b[0] }.map(&:last).
flat_map { |i| letters(a, b[1..-1], i.succ).map(&[i].method(:+)) }
end
end
end
module DominicLists
def self.letters(haystack, needles)
indices = Hash[haystack.each_char.with_index.group_by { |(char, _)| char }.map { |(char, values)| [char, values.map { |(_, index)| index }] }]
results = indices[needles[0]].map { |i| [i] }
needles[1..-1].each_char do |char|
results = results.flat_map do |r|
indices[char].drop_while { |i| i < r.last }.map { |i| r + [i] }
end
end
results
end
end
module Dave
extend self
def letters(haystack_str, needle_str)
haystack = haystack_str.chars
needle = needle_str.chars
@pos_cache = haystack.each.with_index.with_object(Hash.new { |h, k| h[k] = [] }) { |(c, i), h| h[c] << i if needle.include?(c) }
@match_cache = Hash.new { |h, k| h[k] = {} }
matches(needle)
end
def matches(chars, current_pos = -1)
return [[]] unless chars.any?
char, *rest = *chars
@match_cache[char][current_pos] ||= begin
this_matches = @pos_cache[char].select { |candidate_pos| candidate_pos > current_pos }
this_matches.each_with_object([]) do |this_pos, memo|
matches(rest, this_pos).each do |rest_matches|
memo << [this_pos, *rest_matches]
end
end
end
end
end
module Peter
extend self
def letters(text, pattern)
start = 0
result = find_all_from_pattern(text, start, pattern)
end
def find_all_from_pattern(text, from, pattern)
letter = pattern[0]
remainder = pattern[1..-1]
r = find_all_from_letter(text, from, letter)
if remainder.size > 0
r.each_with_object([]) do |previous, results|
one_deeper = find_all_from_pattern(text, (previous+1), remainder)
one_deeper.each do |deeper|
results << ([previous] + deeper)
end
end
else # leaf node, end of iteration
r
end.map{ |e| Array(e) }
end
def find_all_from_letter(text, from, letter)
[].tap do |result|
while(match = find_one_from_letter(text, from, letter))
result << match
from = match + 1
end
end
end
def find_one_from_letter(text, from, letter)
return nil unless letter
index = text[from..-1].index(letter)
index && (index + from)
end
end
module Peter3
extend self
def find_all_from_pattern(text, pattern, from = 0)
@reverse_index ||= reverse_index(text)
@cache ||= {}
letter = pattern[0]
remainder = pattern[1..-1]
matches = @reverse_index[letter].select{ |i| i >= from }
if remainder.empty?
matches # leaf node
else
find_remainder(matches, remainder)
end.map{ |e| Array(e) }
end
alias :letters :find_all_from_pattern
def find_remainder(previous_matches, remainder)
previous_matches.each_with_object([]) do |previous, results|
find_all_remaining_after_previous(remainder, previous, results)
end
end
def find_all_remaining_after_previous(remainder, previous, results)
with_cache(remainder, previous) do |_remainder, _previous|
find_all_from_pattern(nil, _remainder, (_previous+1))
end.each do |deeper|
results << ([previous] + deeper)
end
end
def reverse_index(haystack)
Hash.new{ |h,k| h[k] = [] }.tap do |ri|
haystack.each_char.with_index do |c, i|
ri[c] << i
end
end
end
MAX_REMAINDER_FOR_CACHE = 2
def with_cache(remainder, previous)
if remainder.size <= MAX_REMAINDER_FOR_CACHE
@cache[remainder.hash ^ previous.hash] ||= yield(remainder, previous)
else
yield(remainder, previous)
end
end
end
module Jason
extend self
def get_results(string, s_i, chars, c_i )
finds = []
while i = string.index(chars[c_i], s_i)
if chars[c_i + 1]
if more_finds = get_results(string, i + 1, chars , c_i + 1)
finds += more_finds.map{|a| a.unshift(i)}
else
return nil
end
else
finds << [i]
end
s_i = i + 1
end
return finds
end
def letters(a, b)
return get_results(a, 0, b.split(//), 0 )
end
end
haystack = "hello world" * 25
needle = "lold"
contendors = [Dave, Jason, Peter3, DominicLists]
compare(contendors.each_with_object({}) { |c, h| h[c] = -> { c.letters(haystack, needle) } })