Last active
April 20, 2024 01:38
Gist JDWarner/6730886
Jaccard coefficient between two boolean NumPy arrays or array-like data. This is commonly used as a set similarity measure; its complement, the Jaccard distance (1 minus the coefficient), is a true metric. The dimensionality of the input is arbitrary, but `im1.shape` and `im2.shape` must be equal. This Gist is licensed under the modified BSD license, also known as the 3-clause BSD license.
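For reference, the ratio the code computes can be written as below. Here $A$ and $B$ stand for the sets of `True` positions in `im1` and `im2`; these symbols are notation introduced for illustration and do not appear in the Gist itself.

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$

The value is undefined when both sets are empty, since the denominator is then zero.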
"""
_jaccard.py : Jaccard coefficient for comparing set similarity.

"""

import numpy as np


def jaccard(im1, im2):
    """
    Computes the Jaccard coefficient, a measure of set similarity.

    Parameters
    ----------
    im1 : array-like, bool
        Any array of arbitrary size. If not boolean, will be converted.
    im2 : array-like, bool
        Any other array of identical size. If not boolean, will be converted.

    Returns
    -------
    jaccard : float
        Jaccard coefficient as a float in the range [0, 1].
            Maximum similarity = 1
            No similarity = 0

    Notes
    -----
    The order of inputs for `jaccard` is irrelevant. The result will be
    identical if `im1` and `im2` are switched.

    """
    # Use the builtin `bool`; the `np.bool` alias was removed in NumPy 1.24.
    im1 = np.asarray(im1).astype(bool)
    im2 = np.asarray(im2).astype(bool)

    if im1.shape != im2.shape:
        raise ValueError("Shape mismatch: im1 and im2 must have the same shape.")

    # Element-wise AND marks the intersection, element-wise OR the union.
    intersection = np.logical_and(im1, im2)
    union = np.logical_or(im1, im2)

    # Note: if both inputs are all False, the union is empty and the
    # ratio below is undefined (0 / 0).
    return intersection.sum() / float(union.sum())
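A short usage sketch may help show the ratio on concrete masks. The wrapper `jaccard_safe` and its `empty_value` parameter are hypothetical names introduced here, not part of the Gist. They assume that two all-False inputs should count as identical, which is one common convention rather than the only reasonable choice.

```python
import numpy as np


def jaccard_safe(im1, im2, empty_value=1.0):
    """Hypothetical variant of `jaccard` with an explicit empty-union case."""
    a = np.asarray(im1).astype(bool)
    b = np.asarray(im2).astype(bool)

    if a.shape != b.shape:
        raise ValueError("Shape mismatch: im1 and im2 must have the same shape.")

    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two empty sets are treated as identical.
        return float(empty_value)

    return float(np.logical_and(a, b).sum() / union)


# Two small 2-D masks; non-boolean input is converted to bool.
mask_a = np.array([[1, 1, 0],
                   [0, 1, 0]])
mask_b = np.array([[1, 0, 0],
                   [0, 1, 1]])

# The masks share 2 True positions out of 4 in their union.
score = jaccard_safe(mask_a, mask_b)
print(score)
```

Swapping the arguments gives the same score, matching the note in the docstring that input order is irrelevant.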