For some general background, read the Introduction of Büttcher et al.'s IR textbook, in particular Sections 1.1, 1.2, and 1.4.
- 1.4 Test Collections
- 1.4.1 TREC Tasks - TREC (Text REtrieval Conference), a series of experimental evaluation efforts conducted annually. TREC has included tracks devoted to enterprise search, genomic information retrieval, legal discovery, e-mail spam filtering, and blog search. It provides reusable test collections for validating retrieval improvements.
- IR applications:
  1) Web search, desktop search or intranet search, site search
  2) Text clustering & categorization
  3) Summarization
  4) Text extraction
  5) Topic detection
  6) Expert search systems (identify the members of an organization who are experts on a given topic)
  7) Question answering
  8) Multimedia IR (video, image, music, speech)
- IR System Architecture
- Performance Evaluation
- Efficiency: 1) latency 2) throughput 3) space (a measurement sketch follows below)
- Effectiveness: 1) relevance
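A minimal sketch of how the first two efficiency metrics are typically measured (the `search` callable and the query list are placeholders, not anything from the textbook): latency is per-query response time, reported here as percentiles, and throughput is queries completed per second.

```python
import time

def measure_efficiency(search, queries):
    """Run each query once; report latency percentiles and throughput (QPS)."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search(q)  # placeholder: any callable that executes one query
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    p50 = latencies[len(latencies) // 2]          # median latency
    p99 = latencies[int(len(latencies) * 0.99)]   # tail latency
    qps = len(queries) / elapsed                  # throughput
    return p50, p99, qps
```

Space, the third efficiency metric, is measured as the index's on-disk or in-memory footprint rather than at query time.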
- Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini - Jimmy Lin et al.
- MS MARCO Document Ranking task, using MRR (mean reciprocal rank) as the metric (an MRR sketch follows this list)
- BM25 baseline
- Elasticsearch cross-field search, using skopt (scikit-optimize) and Bayesian optimization to select the optimal field boost values. Uses the multi_match query type from Elasticsearch 7.10 with the popular best_fields option, which scores each field with BM25 and takes the maximum as the overall query score (a query-and-tuning sketch follows this list).
- Added a separate field for doc2query-T5 expansions, bigram fields for each original field, and per-field BM25 parameter tuning (an index-settings sketch follows this list).
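As noted above, the task is scored with MRR: for each query, take the reciprocal of the rank of the first relevant document (0 if none is retrieved in the top k), then average over all queries. A minimal sketch, assuming the MS MARCO run format (`qid docid rank`, whitespace-separated) and a TREC-style qrels file with `qid 0 docid rel` lines; the file paths and the k=100 cutoff are illustrative.

```python
from collections import defaultdict

def mrr_at_k(qrels_path, run_path, k=100):
    # Relevant docids per query, from the qrels file.
    relevant = defaultdict(set)
    with open(qrels_path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            if int(rel) > 0:
                relevant[qid].add(docid)

    # Ranked results per query, from the run file.
    ranked = defaultdict(list)
    with open(run_path) as f:
        for line in f:
            qid, docid, rank = line.split()
            ranked[qid].append((int(rank), docid))

    # Reciprocal rank of the first relevant hit in the top k, per query.
    reciprocal_ranks = []
    for qid, docs in ranked.items():
        docs.sort()
        rr = 0.0
        for rank, docid in docs[:k]:
            if docid in relevant[qid]:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)

    return sum(reciprocal_ranks) / len(reciprocal_ranks)

print(mrr_at_k("msmarco-docdev-qrels.tsv", "run.tsv"))
```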
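A sketch of the cross-field run described above, assuming the Python Elasticsearch client and the 7.x query DSL: a best_fields multi_match query whose per-field boosts are tuned with scikit-optimize's Gaussian-process minimizer. The index name, the boost search ranges, and the `evaluate_mrr` helper are hypothetical placeholders; the paper's actual tuned boost values are not reproduced here.

```python
from elasticsearch import Elasticsearch
from skopt import gp_minimize
from skopt.space import Real

es = Elasticsearch("http://localhost:9200")

def search(query_text, boosts, index="msmarco-doc", size=100):
    """best_fields: BM25-score each field separately, take the max as the query score."""
    title_b, url_b, body_b = boosts
    return es.search(
        index=index,
        body={
            "query": {
                "multi_match": {
                    "query": query_text,
                    "type": "best_fields",
                    "fields": [f"title^{title_b}", f"url^{url_b}", f"body^{body_b}"],
                }
            },
            "size": size,
        },
    )

def objective(boosts):
    # evaluate_mrr is a hypothetical helper: it runs all dev queries through
    # search() with these boosts and returns MRR@100. Negated because
    # gp_minimize minimizes its objective.
    return -evaluate_mrr(search, boosts)

space = [Real(0.0, 5.0, name="title"),
         Real(0.0, 5.0, name="url"),
         Real(0.0, 5.0, name="body")]

result = gp_minimize(objective, space, n_calls=50, random_state=0)
print("best boosts:", result.x, "best dev MRR:", -result.fun)
```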
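The three additions in the last bullet can all be expressed as index configuration. A sketch of plausible Elasticsearch index settings (field names, analyzer choices, and the k1/b values are illustrative assumptions, not the paper's tuned settings): custom BM25 similarities assigned per field, shingle-based bigram sub-fields for each original field, and a separate text field holding the doc2query-T5 expansions.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Emit word bigrams ("shingles") instead of unigrams.
                "bigrams": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "bigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigrams"],
                }
            },
        },
        "index": {
            # One BM25 similarity per field so k1/b can be tuned independently.
            "similarity": {
                "bm25_title": {"type": "BM25", "k1": 0.9, "b": 0.4},
                "bm25_body": {"type": "BM25", "k1": 1.2, "b": 0.75},
            }
        },
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "similarity": "bm25_title",
                "fields": {"bigrams": {"type": "text", "analyzer": "bigram_analyzer"}},
            },
            "body": {
                "type": "text",
                "similarity": "bm25_body",
                "fields": {"bigrams": {"type": "text", "analyzer": "bigram_analyzer"}},
            },
            # doc2query-T5 output is indexed here as ordinary text.
            "expansions": {"type": "text"},
        }
    },
}

es.indices.create(index="msmarco-doc-expanded", body=index_body)
```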