Skip to content

Instantly share code, notes, and snippets.

@laurentpetit
Created February 15, 2011 22:37
Show Gist options
  • Save laurentpetit/828413 to your computer and use it in GitHub Desktop.
Save laurentpetit/828413 to your computer and use it in GitHub Desktop.
Functional Levenshtein distance
;;; The following version works with any seq-able (not only Strings), but hardwires
;;; function = for equality testing of seq values (rather good default IMHO), but also
;;; hardwires the cost of 1 for either element insertion, deletion, or swap.
;;;
;;; It is functional, it does not use arrays, nor a MxN matrix, just the required data
;;; for computing a given "row" of the "virtual matrix" (e.g. the previous row)
;;;
;;; I'm quite sure it can still be improved for better readability and performance
;;; without loosing any of the above mentioned characteristics.
(defn- new-row [prev-row row-elem t]
(reduce
(fn [row [d-1 d e]] (conj row (if (= row-elem e) d-1 (inc (min (peek row) d d-1)))))
[(inc (first prev-row))]
(map vector prev-row (next prev-row) t)))
(defn levenshtein [s t]
(peek (reduce
(fn [prev-row s-elem] (new-row prev-row s-elem t))
(range (inc (count t)))
s)))
; (levenshtein "avada kedavra" "abracadabra")
; => 7
; (time (levenshtein (apply str (repeat 1000 \a)) (apply str (apply str (repeat 100 \b)) (repeat 900 \a))))
; => "Elapsed time: 1036.099432 msecs"
; 100
; (time (levenshtein "abcdefghi" "hijklmnop"))
; => "Elapsed time: 0.53652 msecs"
; 9
; (levenshtein "sitting" "kitten")
; => 3
; (levenshtein "sunday" "saturday")
; => 3
; (levenshtein [1 2 3 4] [1 2 4 4])
; => 1
@cgrand
Copy link

cgrand commented Feb 15, 2011

I think you can write a very cool levenshtein if you lean your head 45° to the left while looking at the matrix.

@gdevanla
Copy link

Just wanted to leave a memoized version.

(defn edit-distance-mem [a b]
  (let [f (fn [fm s1 s2]
            (let [cost? (fn [p q]
                          (if (= p q) 0 1))]
              (cond
               (and (= 0 (count s1)) (= 0 (count s2))) 0
               (zero? (count s1)) (count s2)
               (zero? (count s2)) (count s1)
               :else (min (inc (fm fm s1 (rest s2)))
                          (inc (fm fm (rest s1) s2))
                          (+ (cost? (first s1) (first s2))
                             (fm fm (rest s1) (rest s2)))
                             ))))]
    (f (memoize f) a b)))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment