@ramhiser
Last active November 4, 2021 08:41
Jaccard cluster similarity in Python
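import itertools


def jaccard(labels1, labels2):
    """
    Computes the Jaccard similarity between two sets of clustering labels.

    The value returned is between 0 and 1, inclusive. A value of 1 indicates
    perfect agreement between two clustering algorithms, whereas a value of 0
    indicates no agreement. Agreement is measured over pairs of points: n11
    counts pairs placed in the same cluster by both labelings, while n10 and
    n01 count pairs placed together by only labels1 or only labels2.
    For details on the Jaccard index, see:
    http://en.wikipedia.org/wiki/Jaccard_index

    Example:
    labels1 = [1, 2, 2, 3]
    labels2 = [3, 4, 4, 4]
    print(jaccard(labels1, labels2))  # prints 0.3333333333333333

    @param labels1 iterable of cluster labels
    @param labels2 iterable of cluster labels
    @return the Jaccard similarity value
    @raise ValueError if labels1 and labels2 have different lengths
    """
    # Materialize the inputs so any iterable (not only sequences) can be indexed.
    labels1, labels2 = list(labels1), list(labels2)
    n = len(labels1)
    if n != len(labels2):
        raise ValueError("labels1 and labels2 must have the same length")

    n11 = n10 = n01 = 0
    # Visit every unordered pair of points (i, j) exactly once.
    for i, j in itertools.combinations(range(n), 2):
        comembership1 = labels1[i] == labels1[j]
        comembership2 = labels2[i] == labels2[j]
        if comembership1 and comembership2:
            n11 += 1
        elif comembership1:
            n10 += 1
        elif comembership2:
            n01 += 1

    denom = n11 + n10 + n01
    if denom == 0:
        # Convention: no pair is co-clustered in either labeling (e.g. every
        # point is a singleton, or there are fewer than two points), so the
        # labelings agree trivially; return 1.0 instead of dividing by zero.
        return 1.0
    return n11 / denom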
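The pairwise loop in `jaccard` visits all n(n-1)/2 pairs of points, which gets slow for large n. Below is a minimal sketch of a standard alternative, assuming the same pair-counting definition. A group of k points that share a label contributes k(k-1)/2 co-clustered pairs, so n11 is the sum of that quantity over the cells of the contingency table of (labels1, labels2). Summing it over the clusters of labels1 gives n11 + n10, and summing over the clusters of labels2 gives n11 + n01, so J = n11 / (pairs1 + pairs2 - n11). The names `jaccard_contingency` and `comb2` are hypothetical and not part of the original gist. Returning 1.0 when no pair is co-clustered in either labeling is an assumed convention.

```python
from collections import Counter


def comb2(k):
    """Number of unordered pairs among k items, i.e. k choose 2."""
    return k * (k - 1) // 2


def jaccard_contingency(labels1, labels2):
    """Pair-counting Jaccard index computed from cluster sizes.

    Hypothetical helper: counts co-clustered pairs from the contingency table
    of the two labelings instead of enumerating all n*(n-1)/2 pairs.
    """
    labels1, labels2 = list(labels1), list(labels2)
    if len(labels1) != len(labels2):
        raise ValueError("labels1 and labels2 must have the same length")

    # Pairs together in both labelings: sum of C(n_ij, 2) over contingency cells.
    n11 = sum(comb2(c) for c in Counter(zip(labels1, labels2)).values())
    # Pairs together in labels1 (= n11 + n10) and in labels2 (= n11 + n01).
    pairs1 = sum(comb2(c) for c in Counter(labels1).values())
    pairs2 = sum(comb2(c) for c in Counter(labels2).values())

    denom = pairs1 + pairs2 - n11  # equals n11 + n10 + n01
    if denom == 0:
        # Assumed convention: no co-clustered pairs anywhere means trivial agreement.
        return 1.0
    return n11 / denom
```

On the docstring example, `jaccard_contingency([1, 2, 2, 3], [3, 4, 4, 4])` gives 1/3, matching the pairwise version. Only the points at indices 1 and 2 share a cluster in both labelings (n11 = 1), while the pairs (1, 3) and (2, 3) share a cluster only in labels2 (n01 = 2). The work is one counting pass over the n points plus a pass over the contingency cells, of which there are at most n.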