Created November 30, 2013 15:01 by MichaelAquilina (gist MichaelAquilina/7720122)
```csharp
using System;

public static class TfIdfCalculator
{
    // Calculate the TF-IDF value for the given input parameters.
    public static double TFIDF(int TermFrequency, int DocumentFrequency, int NormalizationValue, int NumberOfDocuments)
    {
        // Cast to double before dividing: the parameters are integers, so otherwise the compiler would perform integer division and truncate the result.
        return ((double)TermFrequency / NormalizationValue) * Math.Log((double)NumberOfDocuments / DocumentFrequency, 2);
    }
}
```
Definitions of the parameters in the function above:
- TermFrequency = number of times the term occurs in the document
- NormalizationValue = usually the length of the document (its total number of terms)
- NumberOfDocuments = number of documents in the corpus
- DocumentFrequency = number of documents in the corpus that contain the term
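Written out as a formula, the function computes the following, with $\mathrm{tf}_{t,d}$ = TermFrequency, $|d|$ = NormalizationValue, $N$ = NumberOfDocuments and $\mathrm{df}_t$ = DocumentFrequency. The numbers in the second line are a made-up illustration, not data from the gist:

```latex
\mathrm{tfidf}(t, d) = \frac{\mathrm{tf}_{t,d}}{|d|} \cdot \log_2\!\left(\frac{N}{\mathrm{df}_t}\right)

% Illustration: t occurs 3 times in a 100-term document and appears in 10 of 1000 documents.
\mathrm{tfidf} = \frac{3}{100} \cdot \log_2\!\left(\frac{1000}{10}\right) = 0.03 \cdot \log_2 100 \approx 0.03 \cdot 6.644 \approx 0.199
```

A term that appears in every document has $N / \mathrm{df}_t = 1$, so its TF-IDF is 0 however often it occurs.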
The code is C#, but it should be easy to convert to Java since the syntax is very similar.
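A minimal sketch of such a Java port follows. One difference matters: C# has a `Math.Log(value, newBase)` overload, but Java's `Math.log` takes a single argument and returns the natural logarithm, so the base-2 logarithm is computed with the change-of-base formula. The class name `TfIdf`, the helper `tfidfInCorpus`, and the toy tokenized corpus are illustrative assumptions, not part of the original gist; the helper also assumes the normalization value is the document length in tokens.

```java
import java.util.Arrays;
import java.util.List;

class TfIdf {
    // Java's Math.log is the natural logarithm, so convert to base 2 via change of base.
    public static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Port of the C# TFIDF function; parameter meanings are unchanged.
    public static double tfidf(int termFrequency, int documentFrequency, int normalizationValue, int numberOfDocuments) {
        // As in C#, cast to double first so Java does not perform integer division.
        return ((double) termFrequency / normalizationValue)
                * log2((double) numberOfDocuments / documentFrequency);
    }

    // Hypothetical helper (not in the original): TF-IDF of `term` in corpus.get(docIndex),
    // where each document is a pre-tokenized list of words and the normalization value
    // is the document length in tokens. Assumes the term occurs somewhere in the corpus.
    public static double tfidfInCorpus(String term, int docIndex, List<List<String>> corpus) {
        List<String> doc = corpus.get(docIndex);
        int termFrequency = 0;
        for (String word : doc) {
            if (word.equals(term)) termFrequency++;
        }
        int documentFrequency = 0;
        for (List<String> d : corpus) {
            if (d.contains(term)) documentFrequency++;
        }
        return tfidf(termFrequency, documentFrequency, doc.size(), corpus.size());
    }

    public static void main(String[] args) {
        // 3 occurrences in a 100-term document; the term appears in 10 of 1000 documents.
        System.out.println(tfidf(3, 10, 100, 1000)); // roughly 0.1993

        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "sat"),
                Arrays.asList("the", "cat", "ran", "away"));
        System.out.println(tfidfInCorpus("dog", 1, corpus)); // roughly 0.528
    }
}
```

In the toy corpus, "dog" occurs once in a 3-word document and in 1 of 3 documents, giving (1/3) * log2(3), roughly 0.528.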
Term Frequency, Inverse Document Frequency Equation