Created
February 15, 2011 22:37
-
-
Save laurentpetit/828413 to your computer and use it in GitHub Desktop.
Functional Levenshtein distance
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
;;; The following version works with any seq-able (not only Strings), but hardwires | |
;;; function = for equality testing of seq values (rather good default IMHO), but also | |
;;; hardwires the cost of 1 for either element insertion, deletion, or swap. | |
;;; | |
;;; It is functional, it does not use arrays, nor a MxN matrix, just the required data | |
;;; for computing a given "row" of the "virtual matrix" (e.g. the previous row) | |
;;; | |
;;; I'm quite sure it can still be improved for better readability and performance | |
;;; without loosing any of the above mentioned characteristics. | |
(defn- new-row [prev-row row-elem t] | |
(reduce | |
(fn [row [d-1 d e]] (conj row (if (= row-elem e) d-1 (inc (min (peek row) d d-1))))) | |
[(inc (first prev-row))] | |
(map vector prev-row (next prev-row) t))) | |
(defn levenshtein [s t] | |
(peek (reduce | |
(fn [prev-row s-elem] (new-row prev-row s-elem t)) | |
(range (inc (count t))) | |
s))) | |
; (levenshtein "avada kedavra" "abracadabra") | |
; => 7 | |
; (time (levenshtein (apply str (repeat 1000 \a)) (apply str (apply str (repeat 100 \b)) (repeat 900 \a)))) | |
; => "Elapsed time: 1036.099432 msecs" | |
; 100 | |
; (time (levenshtein "abcdefghi" "hijklmnop")) | |
; => "Elapsed time: 0.53652 msecs" | |
; 9 | |
; (levenshtein "sitting" "kitten") | |
; => 3 | |
; (levenshtein "sunday" "saturday") | |
; => 3 | |
; (levenshtein [1 2 3 4] [1 2 4 4]) | |
; => 1 |
Just wanted to leave a memoized version.
(defn edit-distance-mem [a b]
(let [f (fn [fm s1 s2]
(let [cost? (fn [p q]
(if (= p q) 0 1))]
(cond
(and (= 0 (count s1)) (= 0 (count s2))) 0
(zero? (count s1)) (count s2)
(zero? (count s2)) (count s1)
:else (min (inc (fm fm s1 (rest s2)))
(inc (fm fm (rest s1) s2))
(+ (cost? (first s1) (first s2))
(fm fm (rest s1) (rest s2)))
))))]
(f (memoize f) a b)))
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I think you can write a very cool levenshtein if you lean your head 45° to the left while looking at the matrix.