Skip to content

Instantly share code, notes, and snippets.

@remvee
Created November 2, 2015 09:42
Show Gist options
  • Save remvee/ba61a34561490f46e5dd to your computer and use it in GitHub Desktop.
Save remvee/ba61a34561490f46e5dd to your computer and use it in GitHub Desktop.
(ns remvee.metric.levenshtein
(:use [clojure.test]))
;; From: http://en.wikipedia.org/wiki/Levenshtein_distance
;;
;; int LevenshteinDistance(char s[1..m], char t[1..n])
;; {
;; // for all i and j, d[i,j] will hold the Levenshtein distance between
;; // the first i characters of s and the first j characters of t;
;; // note that d has (m+1)x(n+1) values
;; declare int d[0..m, 0..n]
;;
;; for i from 0 to m
;; d[i, 0] := i // the distance of any first string to an empty second string
;; for j from 0 to n
;; d[0, j] := j // the distance of any second string to an empty first string
;;
;; for j from 1 to n
;; {
;; for i from 1 to m
;; {
;; if s[i] = t[j] then
;; d[i, j] := d[i-1, j-1] // no operation required
;; else
;; d[i, j] := minimum
;; (
;; d[i-1, j] + 1, // a deletion
;; d[i, j-1] + 1, // an insertion
;; d[i-1, j-1] + 1 // a substitution
;; )
;; }
;; }
;;
;; return d[m,n]
;; }
(defn levenshtein
"Calculate levenshtein distance between two sequences."
{:test #(are [d s1 s2] (= d (levenshtein s1 s2))
0 "soup" "soup"
3 "kitten" "sitting"
3 "sitting" "kitten"
0 [1 2 3] [1 2 3]
1 [1 2 3] [1 2 3 4])}
[s1 s2]
(((reduce (fn [d [i j]]
(assoc d (inc j)
(assoc (d (inc j)) (inc i)
(first
(sort
(list (+ (if (= (nth s1 i)
(nth s2 j)) 0 1) ((d j) i))
(inc ((d (inc j)) i))
(inc ((d j) (inc i)))))))))
(merge {0 (let [s (range (inc (count s1)))]
(zipmap s s))}
(let [s (range 1 (inc (count s2)))]
(zipmap s (map (fn [i] {0 i}) s))))
(for [i (range 0 (count s1)) j (range 0 (count s2))] [i j]))
(count s2))
(count s1)))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment