Created
December 4, 2009 21:50
Yes, sorry.
Basically, what we are trying to do is constrain the effect of the raw frequency (saturate the frequency).
In Lucene this is done by taking the square root of the frequency; another classical approach
is to use the log. With both approaches we avoid giving a linear 'importance' to the frequency.
BM25 is a bit trickier: it parametrises the 'saturation' of the frequency with a parameter k1, using the
equation weight(t)/(weight(t)+k1). Usually k1 is fixed to 2, but it can be set per collection.
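
To make the difference concrete, here is a rough Python sketch of the three saturation functions mentioned above. The function names are made up for illustration, the log variant (1 + ln(freq)) is just one common form and an assumption on my part, and the BM25 function only covers the saturation term as written above, not the full BM25 score.

```python
import math


def sqrt_saturation(freq):
    """Square-root damping of the raw frequency."""
    return math.sqrt(freq)


def log_saturation(freq):
    """Log damping; the 1 + ln(freq) form is an assumed variant."""
    return 1.0 + math.log(freq) if freq > 0 else 0.0


def bm25_saturation(weight, k1=2.0):
    """Saturation term weight / (weight + k1); approaches 1 as weight grows."""
    return weight / (weight + k1)


# Compare how each function grows as the raw frequency increases.
for freq in (1, 2, 10, 100):
    print(
        freq,
        round(sqrt_saturation(freq), 3),
        round(log_saturation(freq), 3),
        round(bm25_saturation(freq), 3),
    )
```

Note that the square root and the log keep growing (slowly) with the frequency, while the BM25 term is bounded by 1, and k1 controls how quickly it gets there.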