Skip to content

Instantly share code, notes, and snippets.

@gavinmh
Created October 5, 2013 00:16
Show Gist options
  • Select an option

  • Save gavinmh/6834935 to your computer and use it in GitHub Desktop.

Select an option

Save gavinmh/6834935 to your computer and use it in GitHub Desktop.
Node-based Semantic Similarity Measures

Node-Based Semantic Similarity Measures

The following node-based semantic similarity measures are based on the information content of the lowest common subsumer of two words.

  • Information content is the probability of finding a term in a given corpus.
  • The lowest common subsumer is the most specific ancestor of two words in a lexical taxonomy such as WordNet. For example, the words "cat" and "dog" might have the common ancestors "animal" and "mammal". "Mammal" is their lowest common subsumer, because its distance to "cat" and "dog" is shorter. Note that if you conceive of WordNet as a tree, with "entity" as the root node, "lowest common subsumer" will be a misnomer; the lowest common subsumer is actually the ancestor that is farthest from the root.

These measures are dependent on the specific corpora used to generate the information content.

Resnik Similarity

Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node). Note that for any similarity measure that uses information content, the result is dependent on the corpus used to generate the information content and the specifics of how the information content was created.

Jiang-Conrath Similarity

Lin Similarity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment