
@rajib
Created May 14, 2012 18:41
Takes two company names (strings) as input and returns a similarity index on a scale of 0 to 100.
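The index is the Sørensen–Dice coefficient over adjacent character pairs. Write P(s) for the list of pairs in a string s. In the Strike a Match approach the intersection is counted as a multiset, so a pair that occurs twice in both strings counts twice:

$$
\mathrm{similarity}(s_1, s_2) = \frac{2\,\lvert P(s_1) \cap P(s_2) \rvert}{\lvert P(s_1) \rvert + \lvert P(s_2) \rvert} \times 100
$$

For example, "France" gives {Fr, ra, an, nc, ce} and "French" gives {Fr, re, en, nc, ch}. They share Fr and nc, so the score is 2 × 2 / (5 + 5) × 100 = 40.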
# http://www.catalysoft.com/articles/StrikeAMatch.html
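class Similarity
  def initialize(str1, str2)
    @str1 = str1
    @str2 = str2
    @set1 = build_character_pair(str1)
    @set2 = build_character_pair(str2)
  end

  # Returns 2 * (shared pairs) / (total pairs), scaled to 0..100 and rounded.
  def calculate
    total = @set1.size + @set2.size
    # Strings shorter than two characters yield no pairs; avoid 0/0 (NaN.round raises).
    return 0 if total.zero?

    # Multiset intersection: each pair in @set2 can be matched only once.
    # (Array#& drops duplicates, so identical strings could score below 100.)
    remaining = @set2.dup
    intersection = 0
    @set1.each do |pair|
      index = remaining.index(pair)
      next unless index
      remaining.delete_at(index)
      intersection += 1
    end
    ((2 * intersection).to_f / total * 100).round
  end

  private

  # Adjacent character pairs across the whole string, e.g. "Tata" => ["Ta", "at", "ta"].
  def build_character_pair(str)
    str.chars.each_cons(2).map(&:join)
  end
end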
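The class compares pairs across the whole string, spaces included, and is case-sensitive. The linked article's version, as best understood here, uppercases both strings and collects pairs only within each whitespace-separated word. A minimal sketch of that variant, assuming that reading of the article; the method names `letter_pairs`, `word_letter_pairs`, and `strike_a_match` are illustrative and not part of the gist:

```ruby
# Illustrative names, not from the gist: letter_pairs, word_letter_pairs, strike_a_match.

# Adjacent character pairs within one word, e.g. "TATA" => ["TA", "AT", "TA"].
def letter_pairs(word)
  word.chars.each_cons(2).map(&:join)
end

# Uppercase, split on whitespace, and gather pairs per word,
# so no pair straddles a space (assumed to match the article).
def word_letter_pairs(text)
  text.upcase.split.flat_map { |word| letter_pairs(word) }
end

# Dice coefficient over word letter pairs, scaled to 0..100 and rounded.
def strike_a_match(str1, str2)
  pairs1 = word_letter_pairs(str1)
  pairs2 = word_letter_pairs(str2)
  total = pairs1.size + pairs2.size
  return 0 if total.zero?

  remaining = pairs2.dup
  matches = 0
  pairs1.each do |pair|
    index = remaining.index(pair)
    next unless index
    remaining.delete_at(index) # each pair from str2 may match only once
    matches += 1
  end
  (2.0 * matches / total * 100).round
end

puts strike_a_match("France", "French")           # => 40
puts strike_a_match("Tata Motors", "TATA motors") # => 100
```

Because both inputs are uppercased, "Tata Motors" and "TATA motors" score 100 here, whereas the case-sensitive class scores them lower.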
Example:
similarity = Similarity.new("Tata consultancy services", "Tata motors")
similarity.calculate # returns the score as an integer from 0 to 100
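Since the index is meant for company names, one natural use is picking the closest known name for a query. The sketch below is self-contained: `char_pairs`, `dice_score`, and `best_match` are hypothetical helpers that mirror the class's character-level, case-sensitive pairing, and the candidate list is made up for illustration.

```ruby
# Hypothetical helpers: char_pairs, dice_score, best_match.

# Adjacent character pairs across the whole string (spaces included, case-sensitive).
def char_pairs(str)
  str.chars.each_cons(2).map(&:join)
end

# 2 * shared pairs / total pairs, scaled to 0..100 and rounded.
# Shared pairs are counted as a multiset via a tally of b's pairs.
def dice_score(a, b)
  pairs_a = char_pairs(a)
  pairs_b = char_pairs(b)
  total = pairs_a.size + pairs_b.size
  return 0 if total.zero?

  counts = Hash.new(0)
  pairs_b.each { |pair| counts[pair] += 1 }
  shared = pairs_a.count do |pair|
    next false unless counts[pair] > 0
    counts[pair] -= 1 # consume one occurrence so it cannot match twice
    true
  end
  (2.0 * shared / total * 100).round
end

# Returns [best_candidate, score] for the highest-scoring candidate.
def best_match(query, candidates)
  candidates.map { |name| [name, dice_score(query, name)] }.max_by { |_, score| score }
end

known = ["Tata Motors", "Tata Consultancy Services", "Tata Steel"]
p best_match("Tata Consultancy", known) # => ["Tata Consultancy Services", 77]
```

Tallying pairs in a Hash makes each lookup constant-time instead of scanning an array, which helps when scoring one query against many candidates.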