@ddbs
Created January 7, 2023 23:21
from difflib import SequenceMatcher


def get_similar_records(df, column, input_string, num_records):
    """Return the num_records values of df[column] most similar to input_string.

    Each result is a (record, similarity) tuple, with similarity in [0, 1].
    """
    # Create an empty list to store the similar records
    similar_records = []
    # Loop through the records in the column
    for record in df[column]:
        # Calculate the similarity between the input string and the record.
        # str() guards against non-string values (e.g. NaN), which
        # SequenceMatcher cannot compare.
        similarity = SequenceMatcher(None, input_string, str(record)).ratio()
        # Add the record and its similarity to the list
        similar_records.append((record, similarity))
    # Sort the list of records by similarity in descending order
    similar_records.sort(key=lambda x: x[1], reverse=True)
    # Return the top num_records records
    return similar_records[:num_records]
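
A minimal usage sketch follows. It repeats the function so the block runs on its own, and it uses a plain dict of lists as a stand-in for a pandas DataFrame, which works here only because the function does nothing more than index by column name and iterate. The `cities` data and the query string are hypothetical sample values, not part of the original gist.

```python
from difflib import SequenceMatcher


def get_similar_records(df, column, input_string, num_records):
    # Score every value in the column against the input string
    similar_records = []
    for record in df[column]:
        similarity = SequenceMatcher(None, input_string, str(record)).ratio()
        similar_records.append((record, similarity))
    # Highest similarity first, then keep the top num_records
    similar_records.sort(key=lambda x: x[1], reverse=True)
    return similar_records[:num_records]


# Hypothetical sample data: a dict of lists standing in for a DataFrame
cities = {"city": ["London", "Londonderry", "Paris", "Lisbon"]}

# Look up the two closest matches for a misspelled query
top = get_similar_records(cities, "city", "Londn", 2)
for record, score in top:
    print(f"{record}: {score:.2f}")
```

Note that the comparison is case-sensitive, so lowercasing both the query and the records beforehand may give better matches on mixed-case data.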