@dingsdax
Created January 9, 2011 19:42

Revisions

  1. dingsdax revised this gist Jul 13, 2011. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions hamming_distance.rb
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,3 @@
    def hamming_distance(str1, str2)
      str1.split(//).zip(str2.split(//)).inject(0) { |h, e| e[0] == e[1] ? h : h + 1 }
    end
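The one-liner above does not check lengths: `zip` pads a shorter `str2` with `nil` (each missing character counts as a mismatch) and silently ignores any extra characters in a longer `str2`. Hamming distance is normally defined only for strings of equal length, so a stricter variant could look like the sketch below. The name `strict_hamming_distance` is hypothetical and not part of the gist.

```ruby
# Hypothetical stricter variant (not from the gist): refuses unequal lengths
# instead of padding with nil or dropping trailing characters.
def strict_hamming_distance(str1, str2)
  unless str1.length == str2.length
    raise ArgumentError, "strings must have equal length"
  end

  # Pair up characters position by position and count the mismatches.
  str1.chars.zip(str2.chars).count { |a, b| a != b }
end

strict_hamming_distance("karolin", "kathrin") # => 3
```

Raising an error is only one possible design; returning `nil` for unequal lengths would work as well.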
  2. dingsdax revised this gist Jul 13, 2011. 5 changed files with 46 additions and 8 deletions.
    36 changes: 36 additions & 0 deletions damerau_levenshtein.rb
    @@ -0,0 +1,36 @@
    # extension for string class
    class String
      # yield each character of the string together with its index
      def each_char_with_index
        i = 0
        split(//).each do |c|
          yield i, c
          i += 1
        end
      end
    end

    def damerau_levenshtein(str1, str2)
      d = Array.new(str1.size + 1) { Array.new(str2.size + 1) }
      for i in (0..str1.size)
        d[i][0] = i
      end
      for j in (0..str2.size)
        d[0][j] = j
      end
      str1.each_char_with_index do |i, c1|
        str2.each_char_with_index do |j, c2|
          c = (c1 == c2 ? 0 : 1)
          d[i + 1][j + 1] = [
            d[i][j + 1] + 1,   # deletion
            d[i + 1][j] + 1,   # insertion
            d[i][j] + c].min   # substitution
          if (i > 0) and (j > 0) and (str1[i] == str2[j - 1]) and (str1[i - 1] == str2[j])
            d[i + 1][j + 1] = [
              d[i + 1][j + 1],
              d[i - 1][j - 1] + c].min # transposition
          end
        end
      end
      d[str1.size][str2.size]
    end
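The recurrence above only allows a transposition of two adjacent characters that are not edited again afterwards, which is the restricted variant usually called optimal string alignment (OSA) distance. As a rough sketch, the same table can be filled without reopening `String`. The name `osa_distance` and the 1-based loop indices are choices made for this example, not taken from the gist.

```ruby
# Hypothetical standalone sketch of the optimal string alignment distance
# (restricted Damerau-Levenshtein); does not monkey-patch String.
def osa_distance(str1, str2)
  a = str1.chars
  b = str2.chars

  # d[i][j] = distance between the first i chars of a and the first j chars of b
  d = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (0..a.size).each { |i| d[i][0] = i }
  (0..b.size).each { |j| d[0][j] = j }

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,        # deletion
        d[i][j - 1] + 1,        # insertion
        d[i - 1][j - 1] + cost  # substitution
      ].min

      # adjacent transposition: "ab" <-> "ba"
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[a.size][b.size]
end

osa_distance("ab", "ba")  # => 1
osa_distance("ca", "abc") # => 3
```

The second call shows the restriction: unrestricted Damerau-Levenshtein would give 2 for `"ca"` and `"abc"`, while the OSA recurrence gives 3.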
    3 changes: 1 addition & 2 deletions dice_coefficient.rb
    @@ -1,5 +1,4 @@
    # http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html#dice

    # extension for string class
    class String
      # get ngrams of string
      def ngrams(len = 1)
    1 change: 0 additions & 1 deletion gistfile1.rb
    @@ -1 +0,0 @@
    (m/2 - 1) where m = max(d,r)
    5 changes: 0 additions & 5 deletions hamming_distance.rb
    @@ -1,5 +0,0 @@
    # http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html#hamming

    def hamming_distance(str1, str2)
      str1.split(//).zip(str2.split(//)).inject(0) { |h, e| e[0] == e[1] ? h : h + 1 }
    end
    9 changes: 9 additions & 0 deletions levenshtein.rb
    @@ -0,0 +1,9 @@
    def levenshtein(str1, str2)
      return str1.length if 0 == str2.length
      return str2.length if 0 == str1.length
      c = Array.new
      c << (str1[0] == str2[0] ? 0 : 1) + levenshtein(str1[1..-1], str2[1..-1])
      c << 1 + levenshtein(str1[1..-1], str2)
      c << 1 + levenshtein(str1, str2[1..-1])
      return c.min
    end
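The recursive definition above makes three recursive calls per step and recomputes the same suffix pairs many times, so its running time grows exponentially with the input length. A common alternative is to fill the table row by row and keep only the previous row. The sketch below assumes that approach; the name `levenshtein_iterative` is hypothetical and not part of the gist.

```ruby
# Hypothetical iterative sketch (not from the gist): two-row dynamic
# programming, O(m * n) time and O(n) extra space.
def levenshtein_iterative(str1, str2)
  a = str1.chars
  b = str2.chars

  # prev[j] = distance between the empty prefix of a and the first j chars of b
  prev = (0..b.size).to_a

  a.each_with_index do |ca, i|
    curr = [i + 1] # distance from the first i + 1 chars of a to the empty string
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j + 1] + 1, # deletion
        curr[j] + 1,     # insertion
        prev[j] + cost   # substitution
      ].min
    end
    prev = curr
  end

  prev.last
end

levenshtein_iterative("kitten", "sitting") # => 3
```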
  3. dingsdax renamed this gist Jul 13, 2011. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  4. dingsdax revised this gist Jul 13, 2011. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions gistfile1.txt
    @@ -0,0 +1 @@
    (m/2 - 1) where m = max(d,r)
  5. dingsdax revised this gist Jan 9, 2011. 1 changed file with 22 additions and 0 deletions.
    22 changes: 22 additions & 0 deletions dice_coefficient.rb
    @@ -0,0 +1,22 @@
    # http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html#dice

    class String
      # get ngrams of string
      def ngrams(len = 1)
        ngrams = []
        len = size if len > size
        (0..size - len).each do |n|
          ng = self[n...(n + len)]
          ngrams.push(ng)
        end
        ngrams
      end
    end

    def dice_coefficient(str1, str2)
      str1_2grams = str1.ngrams(2)
      str2_2grams = str2.ngrams(2)
      intersection = (str1_2grams & str2_2grams).length
      total = str1_2grams.length + str2_2grams.length
      2.0 * intersection / total
    end
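One thing to watch in the version above: `Array#&` returns only unique elements, while `total` counts every bigram including repeats, so two identical strings with repeated bigrams (for example `"aaaa"`) score below 1.0. A possible way around this is to compare sets of bigrams on both sides, as in the sketch below. The names `bigram_set` and `dice_on_sets`, and the choice to return 0.0 when neither string has a bigram, are assumptions made for this example.

```ruby
require "set"

# Hypothetical helper (not from the gist): the set of distinct bigrams of a string.
def bigram_set(str)
  str.chars.each_cons(2).map(&:join).to_set
end

# Hypothetical sketch of the Dice coefficient over bigram sets, so that the
# numerator and the denominator both count distinct bigrams.
def dice_on_sets(str1, str2)
  x = bigram_set(str1)
  y = bigram_set(str2)
  total = x.size + y.size
  return 0.0 if total.zero? # assumed convention for strings shorter than 2 chars

  2.0 * (x & y).size / total
end

dice_on_sets("night", "nacht") # => 0.25
dice_on_sets("aaaa", "aaaa")   # => 1.0
```

Counting bigrams as a multiset would be another reasonable choice; which one fits depends on how repeated substrings should be weighted.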
  6. dingsdax revised this gist Jan 9, 2011. 1 changed file with 0 additions and 3 deletions.
    3 changes: 0 additions & 3 deletions string_similarity_metrics.md
    @@ -1,3 +0,0 @@
    source: http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html
    currently implemented:
    * hamming distance.
  7. dingsdax created this gist Jan 9, 2011.
    5 changes: 5 additions & 0 deletions hamming_distance.rb
    @@ -0,0 +1,5 @@
    # http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html#hamming

    def hamming_distance(str1, str2)
      str1.split(//).zip(str2.split(//)).inject(0) { |h, e| e[0] == e[1] ? h : h + 1 }
    end
    3 changes: 3 additions & 0 deletions string_similarity_metrics.md
    @@ -0,0 +1,3 @@
    source: http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html
    currently implemented:
    * hamming distance.