Skip to content

Instantly share code, notes, and snippets.

@FanchenBao
Created March 26, 2020 17:27
Show Gist options
  • Save FanchenBao/8c0a353194d4bbe28d226127da284273 to your computer and use it in GitHub Desktop.
Save FanchenBao/8c0a353194d4bbe28d226127da284273 to your computer and use it in GitHub Desktop.
Levenshtein word distance algorithm
def levenshtein(token1, token2):
mtx = [[0] * (len(token2) + 1) for _ in range(len(token1) + 1)]
# fill the first column and row
for j in range(len(token2) + 1):
mtx[0][j] = j
for i in range(len(token1) + 1):
mtx[i][0] = i
# DP
for i, t1 in enumerate(token1):
for j, t2 in enumerate(token2):
if t1 == t2: # most trivial case
mtx[i + 1][j + 1] = mtx[i][j]
else:
mtx[i + 1][j + 1] = min(
mtx[i][j + 1], # delete token1 letter at position i
mtx[i + 1][j], # delete token2 letter at position j
mtx[i][j], # replacement
) + 1
# Note that insertion is not considered,
# because it is equivalent to deletion.
return mtx[len(token1)][len(token2)]
print(levenshtein('helo', 'hello')) # output 1
print(levenshtein('abcde', 'abbdeg')) # output 2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment