Created
January 9, 2011 19:42
Revisions
dingsdax revised this gist
Jul 13, 2011 . 1 changed file with 3 additions and 0 deletions.
@@ -0,0 +1,3 @@
def hamming_distance(str1, str2)
  str1.split(//).zip(str2.split(//)).inject(0) { |h, e| e[0]==e[1] ? h+0 : h+1 }
end
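A note on the one-liner above: `Array#zip` is driven by the receiver's length, so when `str1` is longer the missing characters of `str2` are padded with `nil` and counted as mismatches, and when `str2` is longer its extra characters are ignored. Hamming distance is usually defined only for strings of equal length, so a stricter variant might look like the sketch below. The name `strict_hamming_distance` and the choice to raise `ArgumentError` are assumptions for illustration, not part of the gist.

```ruby
# Hypothetical stricter variant; not part of the original gist.
def strict_hamming_distance(str1, str2)
  # Assumption: unequal lengths are treated as a caller error.
  unless str1.length == str2.length
    raise ArgumentError, "strings must have equal length"
  end
  # Count the positions at which the two strings differ.
  str1.chars.zip(str2.chars).count { |a, b| a != b }
end

puts strict_hamming_distance("karolin", "kathrin")  # prints 3
```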
dingsdax revised this gist
Jul 13, 2011 . 5 changed files with 46 additions and 8 deletions.
@@ -0,0 +1,36 @@
# extension for string class
class String
  # return character array of string with indices
  def each_char_with_index
    i = 0
    split(//).each do |c|
      yield i, c
      i += 1
    end
  end
end

def damerau_levenshtein(str1, str2)
  d = Array.new(str1.size+1){Array.new(str2.size+1)}
  for i in (0..str1.size)
    d[i][0] = i
  end
  for j in (0..str2.size)
    d[0][j] = j
  end
  str1.each_char_with_index do |i,c1|
    str2.each_char_with_index do |j,c2|
      c = (c1 == c2 ? 0 : 1)
      d[i+1][j+1] = [
        d[i][j+1] + 1, #deletion
        d[i+1][j] + 1, #insertion
        d[i][j] + c].min #substitution
      if (i>0) and (j>0) and (str1[i]==str2[j-1]) and (str1[i-1]==str2[j])
        d[i+1][j+1] = [
          d[i+1][j+1],
          d[i-1][j-1] + c].min #transposition
      end
    end
  end
  d[str1.size][str2.size]
end

@@ -1,5 +1,4 @@
# extension for string class
class String
  # get ngrams of string
  def ngrams(len = 1)

@@ -1 +0,0 @@

@@ -1,5 +0,0 @@

@@ -0,0 +1,9 @@
def levenshtein(str1, str2)
  return str1.length if 0 == str2.length
  return str2.length if 0 == str1.length
  c = Array.new
  c << (str1[0] == str2[0] ? 0 : 1) + (levenshtein str1[1..-1], str2[1..-1])
  c << 1 + levenshtein(str1[1..-1], str2)
  c << 1 + levenshtein(str1, str2[1..-1])
  return c.min
end
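The recursive `levenshtein` above recomputes the same pairs of suffixes many times, so its running time grows exponentially with the input length. A minimal sketch of an iterative version that keeps only two rows of the distance table is shown below. The name `levenshtein_iterative` is an assumption for illustration and does not appear in the gist.

```ruby
# Hypothetical iterative variant; not part of the original gist.
def levenshtein_iterative(str1, str2)
  a = str1.chars
  b = str2.chars
  # prev[j] is the distance between the first i characters of str1
  # and the first j characters of str2, for the previous row i.
  prev = (0..b.size).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j + 1] + 1,   # deletion
        curr[j] + 1,       # insertion
        prev[j] + cost     # substitution
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_iterative("kitten", "sitting")  # prints 3
```

This does work proportional to the product of the two string lengths and keeps memory proportional to the length of `str2`.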
dingsdax renamed this gist
Jul 13, 2011 . 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
dingsdax revised this gist
Jul 13, 2011 . 1 changed file with 1 addition and 0 deletions.
@@ -0,0 +1 @@
(m/2 - 1) where m = max(d,r)
dingsdax revised this gist
Jan 9, 2011 . 1 changed file with 22 additions and 0 deletions.
@@ -0,0 +1,22 @@
# http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html#dice

class String
  # get ngrams of string
  def ngrams(len = 1)
    ngrams = []
    len = size if len > size
    (0..size - len).each do |n|
      ng = self[n...(n + len)]
      ngrams.push(ng)
    end
    ngrams
  end
end

def dice_coefficient(str1, str2)
  str1_2grams = str1.ngrams(2)
  str2_2grams = str2.ngrams(2)
  intersection = (str1_2grams & str2_2grams).length
  total = str1_2grams.length + str2_2grams.length
  dice = 2.0 * intersection / total
end
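One property of `dice_coefficient` above is worth noting: `&` on arrays is a set intersection that drops duplicates, while `total` counts every bigram including repeats. A string with repeated bigrams, such as "aaaa", therefore scores below 1.0 even against itself. A sketch that treats the bigrams as multisets is shown below. The names `bigram_counts` and `dice_multiset` are assumptions for illustration, and unlike the gist's `ngrams`, this sketch returns 0.0 when neither string has a bigram.

```ruby
# Hypothetical multiset variant; not part of the original gist.
def bigram_counts(str)
  counts = Hash.new(0)
  # each_cons(2) yields every pair of adjacent characters.
  str.chars.each_cons(2) { |pair| counts[pair.join] += 1 }
  counts
end

def dice_multiset(str1, str2)
  c1 = bigram_counts(str1)
  c2 = bigram_counts(str2)
  total = c1.values.sum + c2.values.sum
  # Assumption: with no bigrams on either side, report 0.0.
  return 0.0 if total.zero?
  # A shared bigram counts as many times as it occurs in both strings.
  shared = c1.sum { |gram, n| [n, c2[gram]].min }
  2.0 * shared / total
end

puts dice_multiset("night", "nacht")  # prints 0.25
puts dice_multiset("aaaa", "aaaa")    # prints 1.0
```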
dingsdax revised this gist
Jan 9, 2011 . 1 changed file with 0 additions and 3 deletions.
@@ -1,3 +0,0 @@
dingsdax created this gist
Jan 9, 2011
@@ -0,0 +1,5 @@
# http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html#hamming

def hamming_distance(str1, str2)
  str1.split(//).zip(str2.split(//)).inject(0) { |h, e| e[0]==e[1] ? h+0 : h+1 }
end

@@ -0,0 +1,3 @@
source: http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html
currently implemented:
* hamming distance.