@kawa-kokosowa
Last active November 25, 2017 23:42
Jaccard Similarity Function in Python 3 (builtins only)
"""Jaccard similarity of two sets using builtin Python 3 only."""
import doctest


def jaccard_similarity(x: set, y: set) -> float:
    """Get the Jaccard similarity of two sets.

    Example:
        >>> jaccard_similarity({1,2,3,4}, {2,3,5,7})
        0.3333333333333333
        >>> jaccard_similarity({1,2,3,4}, {2,4,6})
        0.4
        >>> jaccard_similarity({2,3,5,7}, {2,4,6})
        0.16666666666666666

    """
    intersection_cardinality = len(x.intersection(y))
    union_cardinality = len(x) + len(y) - intersection_cardinality
    return intersection_cardinality / union_cardinality


if __name__ == "__main__":
    doctest.testmod()
@kawa-kokosowa
Author

Line #20, the one that computes union_cardinality, is tricky. The formula, while counter-intuitive, is precisely cardinality(x) + cardinality(y) - cardinality(x intersect y). Adding len(x) and len(y) counts every element the two sets share twice, once for each set. The intersection holds exactly those shared elements, each appearing once, so subtracting its cardinality removes the duplicates and leaves the cardinality of the union, without ever building the union set.
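
To make the counting argument concrete, here is a minimal sketch that checks the formula against the size of the union that Python builds directly. The helper name `union_cardinality` is only for illustration and is not part of the gist; it mirrors the arithmetic on that line.

```python
def union_cardinality(x: set, y: set) -> int:
    """Size of the union via inclusion-exclusion, without building the union.

    Hypothetical helper for illustration only; it is not part of the gist.
    """
    return len(x) + len(y) - len(x & y)


x, y = {1, 2, 3, 4}, {2, 3, 5, 7}

# len(x) + len(y) == 8 counts the shared elements 2 and 3 twice.
# Subtracting len(x & y) == 2 leaves 6, the size of {1, 2, 3, 4, 5, 7}.
assert union_cardinality(x, y) == len(x | y) == 6
```

The same check should hold for any pair of sets, since `x | y` is the builtin union operator.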
