Created July 22, 2024 12:31
Levenshtein (edit) distance, weighted by string length.
import Levenshtein


def levenshtein(s1: str, s2: str) -> float:
    '''
    Edit distance between two names, divided by the longer name's length.

    0.0 means perfect match
    0.1 very possible match
    0.2 less possible match
    0.3 likely not a match
    '''
    def processor(_string: str) -> str:
        # Uppercase and abbreviate so 'Limited' and 'LTD' compare as equal.
        _string = _string.upper()
        _string = _string.replace('LIMITED', 'LTD')
        return _string

    # Normalize both inputs first so the distance and the lengths agree.
    s1, s2 = processor(s1), processor(s2)
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 0.0  # two empty strings match trivially
    distance = Levenshtein.distance(s1, s2)
    return round(distance / longest, 2)
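The scoring idea above can be tried without installing the Levenshtein package. The following is only a sketch under assumptions: `edit_distance`, `normalize`, and `normalized_score` are hypothetical names introduced for illustration, and the hand-written dynamic-programming routine stands in for `Levenshtein.distance` with unit costs. It is not the library's implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance with unit costs."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Same normalization as the gist: uppercase, then LIMITED -> LTD."""
    return name.upper().replace('LIMITED', 'LTD')


def normalized_score(s1: str, s2: str) -> float:
    """Distance divided by the longer normalized length, rounded to 2 places."""
    a, b = normalize(s1), normalize(s2)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return round(edit_distance(a, b) / longest, 2)


if __name__ == '__main__':
    # Names that differ only in case and the LIMITED/LTD spelling.
    print(normalized_score('Acme Limited', 'ACME LTD'))
    # Unrelated words, for contrast with the thresholds in the docstring.
    print(normalized_score('kitten', 'sitting'))
```

Dividing by the longer normalized length keeps the score between 0.0 and 1.0, which is what makes the 0.1 / 0.2 / 0.3 thresholds in the docstring comparable across names of different lengths.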