@TarasMartynyuk
Created April 26, 2018 21:56
Precision ("exactness"): what fraction of the returned results are relevant to the information need?
tp / len(chosen) = tp / (tp + fp)
Recall ("completeness"): what fraction of the relevant documents in the collection were returned by the system?
== sensitivity:
tp / (tp + fn)
Accuracy ("correctness"): what fraction of all relevant / non-relevant decisions are correct?
(tp + tn) / (tp + tn + fp + fn)
F measure == balanced F measure (F1): harmonic mean of precision p and recall r:
2pr / (p + r), in [0, 1]
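A small sketch of the four measures above, computed from confusion-matrix counts. The function names and the choice to return 0.0 on an empty denominator are assumptions made for this example, not part of the notes.

```python
def precision(tp, fp):
    # fraction of returned docs that are relevant
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # fraction of relevant docs that were returned (== sensitivity)
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(tp, tn, fp, fn):
    # fraction of all relevant / non-relevant decisions that are correct
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def f1(p, r):
    # balanced F measure: harmonic mean of precision and recall
    return 2 * p * r / (p + r) if p + r else 0.0

# made-up counts: 10 docs returned, 8 of them relevant, 4 relevant docs missed
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=4)
print(p, r, f1(p, r), accuracy(tp=8, tn=86, fp=2, fn=4))
```

With a large collection tn dominates, so accuracy stays high even for a poor system; that is why precision and recall are usually preferred for retrieval.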
rocchio:
user point of view:
query
user marks relevant documents
query is expanded (new words are added)
new results are returned
internally:
the query vector q is moved toward the centroid of the relevant docs and away from the centroid of the non-relevant docs
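A minimal sketch of the internal Rocchio update, assuming documents and the query are dense term-weight vectors of equal length. The weights alpha, beta, gamma and their default values are assumptions for illustration (commonly quoted starting points, not values from these notes); negative term weights are clipped to 0.

```python
def centroid(docs, dim):
    # mean vector of a list of document vectors; zero vector if the list is empty
    if not docs:
        return [0.0] * dim
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    # q' = alpha * q + beta * centroid(relevant) - gamma * centroid(non_relevant)
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    return [
        max(0.0, alpha * q + beta * r - gamma * n)  # clip negative weights to 0
        for q, r, n in zip(query, rel_c, non_c)
    ]

# the user marked two docs relevant; one doc is treated as non-relevant
new_q = rocchio(
    query=[1.0, 0.0, 0.0],
    relevant=[[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    non_relevant=[[0.0, 0.0, 1.0]],
)
print(new_q)  # second term gets a positive weight: the query was expanded
```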
Pseudo relevance feedback:
assume top k retrieved documents are relevant
use rocchio
Indirect relevance feedback:
treat user clicks, etc. as evidence of relevance
champion lists:
for each term t, precompute the r docs with the highest weight for t
>> so r is chosen in advance, at index construction time
r can be larger for rare terms
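A rough sketch of building champion lists, assuming the index is a plain dict mapping each term to a {doc_id: weight} dict. The name `build_champion_lists` and this data layout are hypothetical, chosen only for the example; a single fixed r is used for all terms.

```python
import heapq

def build_champion_lists(postings, r):
    # postings: term -> {doc_id: weight}; keep only the r highest-weighted docs per term
    champions = {}
    for term, docs in postings.items():
        top = heapq.nlargest(r, docs.items(), key=lambda item: item[1])
        champions[term] = [doc_id for doc_id, _ in top]
    return champions

# made-up weights for illustration
postings = {
    "cat": {1: 3, 2: 7, 3: 1, 4: 5},
    "dog": {2: 2},
}
print(build_champion_lists(postings, r=2))
```

At query time only the union of the champion lists of the query terms is scored, which is why r has to be fixed before any query is seen.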
high and low lists:
two postings lists per term t, holding disjoint sets of docs
both sorted by static quality
high: m docs with highest tf for t
low: all other docs containing t
0    1    2    3     4      5        // number
0    10   110  1110  11110  111110   // unary code
-    0    100  101   11000  11001    // gamma code (not defined for 0)
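A short sketch of the two codes above, using strings of '0'/'1' instead of real bits to keep it readable. The gamma code of n is the unary code of the offset length followed by the offset, where the offset is n in binary with its leading 1 removed. Function names are assumptions for this example.

```python
def unary(n):
    # n ones followed by a terminating zero
    return "1" * n + "0"

def gamma(n):
    if n < 1:
        raise ValueError("gamma code is only defined for n >= 1")
    offset = bin(n)[3:]  # bin(n) is '0b1...'; drop '0b' and the leading 1
    return unary(len(offset)) + offset

def gamma_decode(bits):
    # decode a concatenation of gamma codes back into a list of numbers
    numbers = []
    i = 0
    while i < len(bits):
        length = 0
        while bits[i] == "1":  # read the unary-coded offset length
            length += 1
            i += 1
        i += 1  # skip the terminating 0
        numbers.append(int("1" + bits[i:i + length], 2))  # restore the leading 1
        i += length
    return numbers

print([gamma(n) for n in range(1, 6)])
print(gamma_decode(gamma(1) + gamma(2) + gamma(5)))
```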
permuterm index:
hello
hello$
ello$h
llo$he
lo$hel
o$hell
$hello
query he*o:
he*o$ // append $
o$he* // rotate so that * is at the end, then look up the prefix o$he
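A toy sketch of the permuterm lookup above, assuming a query with exactly one `*`. The index here is a plain dict scanned linearly, which is a simplification for illustration; a real permuterm index would keep the rotations in a structure that supports prefix search (e.g. a B-tree). All names are hypothetical.

```python
def rotations(term):
    # all rotations of term + '$', e.g. hello -> hello$, ello$h, ..., $hello
    s = term + "$"
    return [s[i:] + s[:i] for i in range(len(s))]

def build_permuterm(vocabulary):
    # map every rotation back to the original term
    index = {}
    for term in vocabulary:
        for rotation in rotations(term):
            index[rotation] = term
    return index

def wildcard_lookup(index, query):
    # X*Y  ->  append $ and rotate so that * is last  ->  prefix Y$X
    head, tail = query.split("*")  # raises ValueError unless there is exactly one *
    prefix = tail + "$" + head
    return sorted({term for rotation, term in index.items() if rotation.startswith(prefix)})

index = build_permuterm(["hello", "help", "halo", "hero"])
print(rotations("hello"))
print(wildcard_lookup(index, "he*o"))
```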
SPIMI:
uses terms directly, so no map from terms to termIDs is needed (BSBI needs one)
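A minimal in-memory sketch of inverting one SPIMI block, under the usual textbook description where the block dictionary is keyed by the term string itself rather than by a termID. Writing blocks to disk and merging them are left out; the function name and the (term, doc_id) token format are assumptions for this example.

```python
def spimi_invert(token_stream):
    # token_stream: iterable of (term, doc_id) pairs, with doc_ids in non-decreasing order
    dictionary = {}
    for term, doc_id in token_stream:
        postings = dictionary.setdefault(term, [])  # new terms are added on the fly
        if not postings or postings[-1] != doc_id:  # skip repeats within the same doc
            postings.append(doc_id)
    # terms are sorted only once, when the block would be written out
    return sorted(dictionary.items())

tokens = [("b", 1), ("a", 1), ("b", 1), ("a", 2)]
print(spimi_invert(tokens))
```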
MapReduce:
master, parsers, inverters
master identifies idle machines and assigns each of them a role (parser or inverter)
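A single-process sketch of the parser / inverter split, only to show the data flow. There is no real cluster or scheduling here: the "master" is a plain loop, and the two term partitions (a-m, n-z) are an assumption for illustration. All function names are hypothetical.

```python
from collections import defaultdict

def parse(doc_id, text):
    # parser (map phase): emit (term, doc_id) pairs for one document
    return [(term, doc_id) for term in text.lower().split()]

def invert(pairs):
    # inverter (reduce phase): collect sorted, duplicate-free postings per term
    postings = defaultdict(list)
    for term, doc_id in sorted(pairs):
        if not postings[term] or postings[term][-1] != doc_id:
            postings[term].append(doc_id)
    return dict(postings)

def run_index_job(docs):
    # stand-in for the master: hand every doc to a parser, route pairs by term partition
    partitions = [[], []]  # partition 0: terms starting below 'n', partition 1: the rest
    for doc_id, text in docs.items():
        for term, d in parse(doc_id, text):
            partitions[0 if term[0] < "n" else 1].append((term, d))
    index = {}
    for part in partitions:  # one inverter per term partition
        index.update(invert(part))
    return index

print(run_index_job({1: "new home sales", 2: "home sales rise"}))
```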
???: kappa statistic:
document                 relevant    non-relevant
term present, x_t = 1    p_t         u_t
term absent,  x_t = 0    1 - p_t     1 - u_t
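Zipf: approximate number of occurrences of each term in the collection (the collection frequency of the i-th most common term is proportional to 1/i)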