@lfoppiano
Last active March 23, 2021 06:27
How to match elements in two lists of strings using soft matching
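from difflib import SequenceMatcher


def group_by_with_soft_matching(input_list, threshold):
    """Group similar strings: each key maps to the strings whose similarity ratio exceeds threshold."""
    matching = {}
    # Deduplicate, then sort in reverse order so the iteration order is deterministic.
    input_list_sorted = sorted(set(input_list), reverse=True)
    for index_x, x in enumerate(input_list_sorted):
        # Strings already attached to an earlier key do not start a group of their own.
        unpacked = [y for key in matching for y in matching[key]]
        if x in matching or x in unpacked:
            continue
        matching[x] = []
        # Compare x only with the strings that come after it in the sorted list.
        for y in input_list_sorted[index_x + 1:]:
            # Case-insensitive similarity ratio in [0, 1]; must be strictly above the threshold.
            if SequenceMatcher(None, x.lower(), y.lower()).ratio() > threshold:
                matching[x].append(y)
    return matching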
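
The grouping above depends entirely on the similarity test, so it can help to try that test on its own before picking a threshold. The following is a minimal sketch, not part of the original gist: the helper names `similarity` and `is_soft_match` and the sample strings are hypothetical and only serve the illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Hypothetical helper: the same case-insensitive ratio used in the gist.
    # ratio() returns 2 * M / T, where M is the number of matched characters
    # and T is the total length of both strings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_soft_match(a, b, threshold):
    # Hypothetical helper: like the gist, the ratio must be strictly greater
    # than the threshold, so a threshold of 1.0 never matches anything.
    return similarity(a, b) > threshold


if __name__ == "__main__":
    print(similarity("Apple", "apple"))    # 1.0, the strings differ only in case
    print(similarity("apple", "apples"))   # 10 / 11, roughly 0.909
    print(is_soft_match("apple", "apples", 0.8))
    print(is_soft_match("apple", "banana", 0.8))
```

A higher threshold produces smaller, tighter groups; because the comparison is strict, values close to 1.0 leave most keys with an empty list.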