Skip to content

Instantly share code, notes, and snippets.

@PhyrexTsai
Last active April 23, 2016 17:59
Show Gist options
  • Save PhyrexTsai/34f789be238a1f9a6395ca0bb695d341 to your computer and use it in GitHub Desktop.
Save PhyrexTsai/34f789be238a1f9a6395ca0bb695d341 to your computer and use it in GitHub Desktop.
String editing distance.md

String editing distance

Motivation: compute the minimal number of changes needed to transform one string into another. Goal: find “cheapest” sequence of editing steps using operations

  • Change a character
  • Copy a character without change
  • Delete a character
  • Insert a character
  • Kill rest of string (delete to end)
data Edit = Change Char | Copy | Delete | Insert Char 
                                         deriving (Eq, Show)

transform:: String -> String -> [ Edit ]

Example: “Help” is not far from “Hello”

>transform "help" "hello"
   [Copy, Copy, Copy, Insert ’l’, Change ’o’]
>transform "hello" "help"
   [Copy, Copy, Copy, Delete, Change ’p’]

About the cost of an edit operation: Unit price for all operations except copy, which is equivalent to “no change” (i.e., 0 cost).

   cost :: [Edit] -> Int
   cost = length . filter (/=Copy)

transform [] [] = []
transform st [] = replicate (length st) Delete
transform [] st = map Insert st
transform (a:x) (b:y)
    | a==b = Copy : transform x y
    | otherwise = best [ Delete   : ??? ,
                         Insert b : ??? ,
                         Change b : ??? ]
best :: [[Edit]] -> [Edit]
best [x] = x
best (x:xs)
  | cost x <= cost b = x
  | otherwise = b
       where
             b = best xs

More examples:

> transform “abcde” “bbc”
[Change ‘b’, Copy, Copy, Delete, Delete]

> transform "fish" "chips"
[Insert ’c’, Change ’h’, Copy, Insert ’p’, Copy, Delete]

> transform "1234" "4321"
[Delete, Change ’4’, Copy, Insert ’2’, Change ’1’]

> transform "123456" "654321"
[Delete, Change ’6’, Change ’5’, Copy, Insert ’3’, Change ’2’,
Change ’1’]

> transform "12345678" "87654321"
[Delete, Change ’8’, Change ’7’, Change ’6’, Copy, Insert ’4’,
Change ’3’, Change ’2’, Change ’1’]
data Edit = Change Char | Copy | Delete | Insert Char deriving (Eq, Show)
transform :: String -> String -> [Edit]
transform [] [] = []
transform st [] = replicate (length st) Delete
transform [] st = map Insert st
transform (a:x) (b:y)
| a==b = Copy : transform x y
| otherwise = best [Delete : transform x (b:y),
Insert b : transform (a:x) y,
Change b : transform x y]
cost :: [Edit] -> Int
cost = length . filter (/=Copy)
best :: [[Edit]] -> [Edit]
best [x] = x
best (x:xs)
| cost x <= cost b = x
| otherwise = b
where b = best xs
main = do
print $ transform "abcde" "bbc" -- [Change 'b',Copy,Copy,Delete,Delete]
print $ transform "fish" "chips" -- [Insert 'c',Change 'h',Copy,Insert 'p',Copy,Delete]
print $ transform "1234" "4321" -- [Delete,Change '4',Copy,Insert '2',Change '1']
print $ transform "123456" "654321" -- [Delete,Change '6',Change '5',Copy,Insert '3',Change '2',Change '1']
print $ transform "12345678" "87654321" -- [Delete,Change '8',Change '7',Change '6',Copy,Insert '4',Change '3',Change '2',Change '1']
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment