@dwsmart
Last active September 11, 2023 17:10
def Jaccard_Similarity(doc1, doc2):
    # Build the set of unique lowercase words in each document
    words_doc1 = set(doc1.lower().split())
    words_doc2 = set(doc2.lower().split())
    # Words that appear in both documents
    intersection = words_doc1.intersection(words_doc2)
    # Words that appear in either document
    union = words_doc1.union(words_doc2)
    # Two empty documents have an empty union; return 0.0 instead of dividing by zero
    if not union:
        return 0.0
    # Jaccard similarity: size of the intersection divided by size of the union
    return float(len(intersection)) / len(union)


doc_1 = ''' page one text here'''
doc_2 = ''' page two text here'''

sim = Jaccard_Similarity(doc_1, doc_2)
print("Jaccard similarity between doc_1 & doc_2 is", sim)
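The function above splits on whitespace only, so punctuation stays attached to words and "here." is counted as a different word from "here". A minimal sketch of one way around this, assuming a simple regex tokenizer is acceptable for the page text being compared; the names `tokenize` and `jaccard_similarity_tokens` are hypothetical and not part of the original gist:

```python
import re


def tokenize(text):
    # Lowercase, then keep runs of ASCII letters, digits, and apostrophes.
    # Assumption: this is good enough for English page text; other
    # languages would need a different pattern.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity_tokens(doc1, doc2):
    words_doc1 = tokenize(doc1)
    words_doc2 = tokenize(doc2)
    union = words_doc1 | words_doc2
    # Two documents with no tokens at all: treat as 0.0 rather than divide by zero
    if not union:
        return 0.0
    return len(words_doc1 & words_doc2) / len(union)


print(jaccard_similarity_tokens("Page one, text here.", "page two text here"))
```

With punctuation stripped, the two sample strings share three of five distinct words. Splitting on whitespace alone would score them lower, because "one," and "here." would not match their unpunctuated forms.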