Created
June 19, 2017 20:48
-
-
Save amraks/769e6be3806b27e0ff6d590f23e9e30f to your computer and use it in GitHub Desktop.
edit distance
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
MAX = 999999999 | |
def edit_distance(s1, s2): | |
d = [] | |
for i in range(len(s1) + 1): | |
d.append([i]) | |
# here d --> [[0], [1], [2], [3]] | |
for j in range(1, len(s2) + 1): | |
d[0].append(j) | |
# here d --> [[0, 1, 2, 3], [1], [2], [3]] | |
# Lets call this point INIT (shown below in Explanation) | |
for i in range(1, len(s1) + 1): | |
for j in range(1, len(s2) + 1): | |
d[i].append(MAX) | |
print d | |
for i in range(1, len(s1) + 1): | |
for j in range(1, len(s2) + 1): | |
tmp = d[i-1][j-1] if s1[i-1] == s2[j-1] else d[i-1][j-1] + 1 # case 3 | |
d[i][j] = min(min(d[i-1][j] + 1, d[i][j-1] + 1), tmp) | |
return d[len(s1)][len(s2)] | |
s1 = 'xap' | |
s2 = 'kapb' | |
print '%s -> %s, distance=%s' % (s1, s2, edit_distance(s1, s2)) | |
""" | |
Explanation: | |
Consider eg. kts to ars. | |
INIT | |
s1-- 0 a r s | |
s2 | |
| | |
0 0 1 2 3 | |
k 1 | |
t 2 | |
s 3 | |
Think in terms of last operation. | |
Case 1: | |
Suppose you converted kt to ars. Now you need to delete s from kts. | |
So, C1 = E(kt, ars) + 1 (delete s from kts) | |
Case 2: | |
Suppose you converted kts to ar. Now you have to add s to ar. | |
So, C2 = E(kts, ar) + 1 (add s to ar) | |
Case 3: | |
Suppose you converted kt to ar. Now you have two choices, | |
if last chars are equal (as in this case): | |
C3 = E(kt, ar) + 0 (no cost) | |
else: | |
C3 = E(kt, ar) + 1 (convert last char from first string to whatever last char of second string) | |
E(kts, ars) = min(C1, C2, C3) | |
C1 = E(i - 1, j) + 1 | |
C2 = E(i, j - 1) + 1 | |
C3 = E(i - 1, j - 1) + (0 if s1[i] == s2[j] else 1) | |
PS: You have to preserve destination string. So you will always perform operation on SOURCE string, | |
1. add char to SOURCE string | |
2. delete char from SOURCE string | |
3. substitue char from SOURCE string | |
""" |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment