
@jaydonnell
Created December 4, 2009 21:50
Yes, sorry.
Basically, what we are trying to do is constrain the effect of the raw frequency (saturate the frequency).
In Lucene this is done with the square root of the frequency; another classical approach
is to use the logarithm. With both approaches we avoid giving the frequency a linear 'importance'.
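A minimal Python sketch comparing the two saturation functions above with an unsaturated, linear weight. The function names are hypothetical, for illustration only. The square-root version mirrors Lucene's classic tf (square root of the raw frequency). For the log version I assume the common 1 + ln(freq) form; other log variants exist.

```python
import math


def linear_tf(freq: int) -> float:
    """No saturation: the weight grows linearly with the raw frequency."""
    return float(freq)


def sqrt_tf(freq: int) -> float:
    """Square-root saturation, as in Lucene's classic tf."""
    return math.sqrt(freq)


def log_tf(freq: int) -> float:
    """Log saturation: 1 + ln(freq) for freq > 0, else 0 (one common variant)."""
    return 1.0 + math.log(freq) if freq > 0 else 0.0


if __name__ == "__main__":
    # Print how each weighting grows as the raw frequency increases.
    for freq in (1, 2, 4, 16, 100):
        print(f"{freq:>4}  linear={linear_tf(freq):7.2f}  "
              f"sqrt={sqrt_tf(freq):6.2f}  log={log_tf(freq):5.2f}")
```

Going from 1 to 100 occurrences multiplies the linear weight by 100, the square-root weight by 10, and the log weight by only about 5.6. That is the "non-linear importance" the comment describes.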
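BM25 is a bit trickier: it parametrises the saturation of the frequency with a parameter k1, using
weight(t)/(weight(t)+k1), where weight(t) is the raw frequency of term t (this is the core of BM25's term-frequency component, leaving out document-length normalisation). k1 is usually set to 2 (values between 1.2 and 2.0 are common), but it can be tuned per collection.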
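A minimal Python sketch of the saturation term weight(t)/(weight(t)+k1). It assumes weight(t) is the raw term frequency and leaves out BM25's document-length normalisation and its (k1+1) scaling factor. `bm25_saturation` is a hypothetical helper name, not a library API.

```python
def bm25_saturation(freq: float, k1: float = 2.0) -> float:
    """Saturated term-frequency weight freq / (freq + k1).

    Assumes freq is the raw frequency of the term in the document and k1 > 0.
    Document-length normalisation (BM25's b parameter) is deliberately omitted.
    """
    if k1 <= 0:
        raise ValueError("k1 must be positive")
    if freq < 0:
        raise ValueError("freq must be non-negative")
    return freq / (freq + k1)


if __name__ == "__main__":
    # Compare a commonly used k1 (1.2) with the k1 = 2 mentioned in the comment.
    for k1 in (1.2, 2.0):
        row = ", ".join(f"tf={f}: {bm25_saturation(f, k1):.3f}"
                        for f in (1, 2, 4, 16, 100))
        print(f"k1={k1}: {row}")
```

With k1 = 2, one occurrence scores 1/3 and two occurrences score 1/2. The weight never reaches 1, however large the frequency. A smaller k1 saturates sooner, while a larger k1 keeps the weight closer to linear for longer. For reference, the full BM25 term-frequency component is tf·(k1+1)/(tf + k1·(1 - b + b·dl/avgdl)). With b = 0 it reduces to (k1+1) times the value computed here.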