@berdosi
Last active May 23, 2018 15:52
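Calculate the Jaccard similarity index between two strings: the number of words the strings share divided by the number of distinct words across both. Strings are split on whitespace and treated as sets of words, so duplicate words are counted once. (https://en.wikipedia.org/wiki/Jaccard_index)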
function getJaccardSimilarity(item, otherItem) {
  // Split on whitespace and drop empty tokens; a Set removes duplicate words.
  // (The earlier reduce-based dedup returned a bare string for one-word input,
  // which broke the later .filter and .length calls.)
  const toWordSet = (text) => new Set(text.split(/\s+/).filter(Boolean));
  const words = toWordSet(item);
  const otherWords = toWordSet(otherItem);
  const union = new Set([...words, ...otherWords]);
  const intersection = [...words].filter((word) => otherWords.has(word));
  // Two empty inputs give an empty union; treat them as identical (a convention
  // chosen here) rather than returning 0 / 0.
  return union.size === 0 ? 1 : intersection.length / union.size;
}
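A worked example may make the ratio concrete. The sketch below is self-contained and uses a hypothetical helper, `jaccardOfSets`, which is not part of the gist; it assumes the inputs are already `Set`s of words and, as an assumption of this sketch, scores two empty sets as 1.

```javascript
// Hypothetical helper (not part of the gist): Jaccard index of two word sets.
function jaccardOfSets(a, b) {
  // Words present in both sets.
  const intersectionSize = [...a].filter((word) => b.has(word)).length;
  // Distinct words across both sets.
  const unionSize = new Set([...a, ...b]).size;
  // Assumption: two empty sets count as identical, avoiding 0 / 0.
  return unionSize === 0 ? 1 : intersectionSize / unionSize;
}

const first = new Set("the quick brown fox".split(/\s+/));
const second = new Set("the lazy brown dog".split(/\s+/));

// Shared words: "the" and "brown" (2).
// Distinct words overall: the, quick, brown, fox, lazy, dog (6).
const similarity = jaccardOfSets(first, second);
console.log(similarity); // 2 / 6
```

The comparison is case-sensitive and does not strip punctuation, so "Fox" and "fox," would count as different words; normalizing the input first is left to the caller.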