@MichaelAstreiko
Created September 6, 2012 14:25
Find Algorithm for similarities
// Assumption: the three metric classes come from the SimMetrics library;
// the gist itself does not show its imports.
import uk.ac.shef.wit.simmetrics.similaritymetrics.JaroWinkler
import uk.ac.shef.wit.simmetrics.similaritymetrics.SmithWatermanGotoh
import uk.ac.shef.wit.simmetrics.similaritymetrics.Soundex

/**
 * Tests three word comparison algorithms on venue names:
 * JaroWinkler, SmithWatermanGotoh and Soundex.
 *
 * Based on the test results, Soundex was chosen for the venue title merging algorithm.
 * It is the easiest to threshold: if similarity > 0.98, the titles are treated as the same word.
 */
void testMergeAlgorithms() {
    def cafeName1 = "Antonio Cafe"
    def cafeName2 = "Antonio's Cafe"
    def cafeName3 = "Antonio Kafe"
    def cafeName4 = "Antonio hotel"
    def cafeName5 = "Mary Coffee"

    // JaroWinkler: even the unrelated "Antonio hotel" scores above 0.92
    def algorithm = new JaroWinkler()
    assertTrue algorithm.getSimilarity(cafeName1, cafeName2) == 0.9809524f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName3) == 0.9777778f
    assertTrue algorithm.getSimilarity(cafeName3, cafeName2) == 0.96031743f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName4) == 0.92564106f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName5) == 0.55707073f

    // SmithWatermanGotoh: spelling variants drop to between 0.77 and 0.9
    algorithm = new SmithWatermanGotoh()
    assertTrue algorithm.getSimilarity(cafeName1, cafeName2) == 0.9f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName3) == 0.8666667f
    assertTrue algorithm.getSimilarity(cafeName3, cafeName2) == 0.76666665f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName4) == 0.7f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName5) == 0.3272727f

    // Soundex: all three spelling variants score exactly 1.0, everything else stays below 0.95
    algorithm = new Soundex()
    assertTrue algorithm.getSimilarity(cafeName1, cafeName2) == 1.0f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName3) == 1.0f
    assertTrue algorithm.getSimilarity(cafeName3, cafeName2) == 1.0f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName4) == 0.9444444f
    assertTrue algorithm.getSimilarity(cafeName1, cafeName5) == 0.5555556f
    assertTrue algorithm.getSimilarity(cafeName2, cafeName5) == 0.5555556f
}
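
The "similarity > 0.98 means same word" rule from the comment above could be wrapped in a small helper along these lines. This is only a sketch: `TitleMatcher`, `isSameTitle` and `recordedSoundex` are hypothetical names that do not appear in the gist, and the stand-in metric simply looks up the Soundex scores recorded in the test so that the snippet runs without the similarity library.

```groovy
// Hypothetical helper, not part of the original gist.
class TitleMatcher {
    // Threshold taken from the doc comment of the test: > 0.98 means "same word".
    static final float SAME_WORD_THRESHOLD = 0.98f

    // `similarity` is any closure (String, String) -> Float in the range [0, 1].
    static boolean isSameTitle(String a, String b, Closure<Float> similarity) {
        similarity.call(a, b) > SAME_WORD_THRESHOLD
    }
}

// Soundex scores as asserted in the test; used here in place of a real metric.
def recordedScores = [
    "Antonio Cafe|Antonio's Cafe": 1.0f,
    "Antonio Cafe|Antonio Kafe"  : 1.0f,
    "Antonio Cafe|Antonio hotel" : 0.9444444f,
    "Antonio Cafe|Mary Coffee"   : 0.5555556f
]

// Stand-in metric: looks up a recorded score by "first|second" key.
def recordedSoundex = { String a, String b -> recordedScores[a + '|' + b] }

println TitleMatcher.isSameTitle("Antonio Cafe", "Antonio Kafe", recordedSoundex)   // true
println TitleMatcher.isSameTitle("Antonio Cafe", "Antonio hotel", recordedSoundex)  // false
```

In real use the closure would presumably delegate to the chosen metric, for example `{ a, b -> new Soundex().getSimilarity(a, b) }`. Keeping the metric pluggable makes it cheap to rerun the comparison if another algorithm is evaluated later.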