Created
February 7, 2018 21:00
Log-likelihood is a statistical technique that helps identify significant words in a given body of text when compared with a wider corpus. More information at: https://github.com/pete-rai/words-of-our-culture#log-likelihood
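Written out as a formula, the calculation performed by the code in this gist can be sketched as follows. The symbols are labels chosen here, not taken from the gist: N_i is the total word count of corpus i ($n1, $n2), O_i is the observed count of the word in corpus i ($o1, $o2), and E_i is the expected count ($e1, $e2).

```latex
E_i = N_i \cdot \frac{O_1 + O_2}{N_1 + N_2}, \qquad i \in \{1, 2\}

LL = 2 \left( O_1 \ln\frac{O_1}{E_1} + O_2 \ln\frac{O_2}{E_2} \right)
```

When a word has the same relative frequency in both corpora, each O_i equals its E_i, both logarithms are zero, and LL is 0.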
<?php
// for more info see : http://ucrel.lancs.ac.uk/llwizard.html
// $n1 = total words in corpus 1 (usually the normative corpus)
// $n2 = total words in corpus 2
// $o1 = observed count for the word in corpus 1 (usually the normative corpus)
// $o2 = observed count for the word in corpus 2
function logLikelihood ($n1, $o1, $n2, $o2)
{
    $ll = 0;
    if ($o1 && $o2)
    {
        // calculate expected values
        $e1 = $n1 * ($o1 + $o2) / ($n1 + $n2); // expected counts in corpus 1
        $e2 = $n2 * ($o1 + $o2) / ($n1 + $n2); // expected counts in corpus 2
        // calculate log likelihood
        $ll = (2 * (($o1 * log ($o1 / $e1)) + ($o2 * log ($o2 / $e2))));
    }
    return $ll;
}
?>
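A minimal usage sketch may help show how the function is called. The function body is repeated so the snippet runs on its own. All counts and the variable names ($normativeTotal, $textCount and so on) are invented for illustration and do not come from any real corpus.

```php
<?php
// Copy of the gist's function, repeated so this snippet runs on its own.
function logLikelihood ($n1, $o1, $n2, $o2)
{
    $ll = 0;
    if ($o1 && $o2)
    {
        $e1 = $n1 * ($o1 + $o2) / ($n1 + $n2); // expected counts in corpus 1
        $e2 = $n2 * ($o1 + $o2) / ($n1 + $n2); // expected counts in corpus 2
        $ll = 2 * (($o1 * log ($o1 / $e1)) + ($o2 * log ($o2 / $e2)));
    }
    return $ll;
}

// Hypothetical counts, invented for illustration only.
$normativeTotal = 1000000; // $n1: total words in the normative corpus
$normativeCount = 100;     // $o1: occurrences of the word in the normative corpus
$textTotal      = 10000;   // $n2: total words in the text under study
$textCount      = 50;      // $o2: occurrences of the word in that text

$ll = logLikelihood ($normativeTotal, $normativeCount, $textTotal, $textCount);
printf ("LL = %.2f\n", $ll);

// Same relative frequency (1%) in both corpora: observed equals expected,
// so both log terms vanish and the score is 0.
$same = logLikelihood (1000, 10, 2000, 20);
printf ("LL = %.2f\n", $same);
```

The score does not indicate direction. To tell whether the word is overused or underused relative to the normative corpus, compare $o2 / $n2 with $o1 / $n1. Because the score is compared against the chi-squared distribution with one degree of freedom, a value above 3.84 is commonly read as significant at p < 0.05. Note also that the function as written returns 0 whenever the word is absent from either corpus.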
You can see Log-likelihood in action in my project Words of our Culture. Click here for a demo.