Introduction

For some general background, read the Introduction of Büttcher et al.'s IR textbook, in particular Sections 1.1, 1.2, and 1.4.

  • 1.4 Test Collections
    • 1.4.1 TREC Tasks - TREC (Text REtrieval Conference), a series of experimental evaluation efforts conducted annually. TREC has included tracks devoted to enterprise search, genomic information retrieval, legal discovery, e-mail spam filtering, and blog search, and it provides reusable test collections for validating improvements.
  • IR applications:
    • Web search, desktop search or intranet search, site search
    • Text clustering and categorization
    • Summarization
    • Text extraction
    • Topic detection
    • Expert search systems, which identify the members who are experts on a topic
    • Question answering
    • Multimedia IR: video, image, music, speech
  • IR System Architecture
  • Performance Evaluation
    • Efficiency: 1) Latency 2) Throughput 3) Space
    • Effectiveness: 1) Relevance (a precision@k sketch follows this list)
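
To make the effectiveness side concrete, here is a minimal sketch that scores a single query's ranking with precision@k against a TREC-style set of relevance judgments. The document IDs and judgments are invented for illustration.

```python
# Minimal precision@k over one query's ranking; the ranking and the
# relevance judgments (qrels) below are invented for illustration.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

ranking = ["d3", "d7", "d1", "d9", "d4"]    # system output for one query
qrels = {"d1", "d3", "d5"}                  # judged-relevant documents

print(precision_at_k(ranking, qrels, k=5))  # 2 relevant in top 5 -> 0.4
```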

Relevance Tuning

Papers

  • Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini - Jimmy Lin et al.
    • MS MARCO Document Ranking task, using MRR (mean reciprocal rank) as the metric (a sketch of the computation follows this list)
    • BM25 baseline (a minimal Pyserini example follows this list)
    • Elasticsearch cross-field search, using skopt and Bayesian optimization to select the optimal field boost values. Uses the multi_match query type from Elasticsearch 7.10 with the popular best_fields option, which scores each field with BM25 and takes the maximum as the overall query score (see the query and tuning sketch after this list).
    • Added a separate field for doc2query-T5 expansions, bigram fields for each original field, and per-field BM25 parameter tuning (an index-settings sketch follows this list)
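
Below is a minimal sketch of MRR, the metric used for the MS MARCO Document Ranking task: for each query, take the reciprocal of the rank of the first relevant document, then average over queries. The runs and qrels here are invented for illustration.

```python
# Mean reciprocal rank (MRR): average over queries of 1/rank of the first
# relevant document; runs and qrels below are invented for illustration.

def mean_reciprocal_rank(runs, qrels, cutoff=100):
    total = 0.0
    for qid, ranked_ids in runs.items():
        relevant = qrels.get(qid, set())
        for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)

runs = {"q1": ["d2", "d5", "d1"], "q2": ["d9", "d4"]}
qrels = {"q1": {"d5"}, "q2": {"d9"}}
print(mean_reciprocal_rank(runs, qrels))  # (1/2 + 1/1) / 2 = 0.75
```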
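A minimal BM25 baseline with Pyserini might look like the sketch below. The prebuilt index name and the BM25 parameters are assumptions for illustration, not the paper's exact settings.

```python
# BM25 retrieval with Pyserini over a prebuilt index; the index name
# 'msmarco-doc' and the k1/b values are assumptions, not the paper's.
from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index("msmarco-doc")
searcher.set_bm25(k1=0.9, b=0.4)  # illustrative BM25 parameters

hits = searcher.search("what is information retrieval", k=10)
for rank, hit in enumerate(hits, start=1):
    print(f"{rank:2} {hit.docid} {hit.score:.4f}")
```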
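The cross-field run could be reproduced along the lines of the sketch below: a multi_match query of type best_fields, with the field boosts chosen by scikit-optimize's Bayesian optimizer against dev-set MRR. The index name, field names, boost ranges, and dev data are assumptions for illustration.

```python
# Sketch of best_fields cross-field search with Bayesian-optimized boosts.
# Index name, field names, boost ranges, and dev data are assumptions.
from elasticsearch import Elasticsearch
from skopt import gp_minimize
from skopt.space import Real

es = Elasticsearch()  # assumes a local instance on the default port

# Hypothetical dev set: query text and its single relevant document id.
dev_queries = {"q1": ("what is information retrieval", "d42")}

def run_query(text, title_boost, body_boost, k=100):
    body = {
        "size": k,
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",  # max of the per-field BM25 scores
                "fields": [f"title^{title_boost}", f"body^{body_boost}"],
            }
        },
    }
    hits = es.search(index="msmarco-doc", body=body)["hits"]["hits"]
    return [h["_id"] for h in hits]

def negative_mrr(boosts):
    """Objective for the optimizer; skopt minimizes, so negate MRR."""
    title_boost, body_boost = boosts
    total = 0.0
    for text, relevant_id in dev_queries.values():
        ranked = run_query(text, title_boost, body_boost)
        if relevant_id in ranked:
            total += 1.0 / (ranked.index(relevant_id) + 1)
    return -total / len(dev_queries)

space = [Real(0.1, 5.0, name="title"), Real(0.1, 5.0, name="body")]
result = gp_minimize(negative_mrr, space, n_calls=30, random_state=0)
print(result.x)  # tuned [title_boost, body_boost]
```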
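The later additions (bigram fields, per-field BM25 parameters, and a doc2query-T5 expansion field) all live in the index settings. The sketch below shows one way to express them; the field names, analyzer wiring, and k1/b values are assumptions rather than the paper's tuned configuration.

```python
# Sketch of index settings for the expanded runs: a shingle analyzer that
# produces bigram fields, custom per-field BM25 similarities (so k1/b can
# be tuned per field), and a text field for doc2query-T5 expansions.
# Field names and parameter values are assumptions, not the paper's.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "bigram_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "bigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigram_filter"],
                }
            },
        },
        "similarity": {
            "bm25_title": {"type": "BM25", "k1": 0.9, "b": 0.4},
            "bm25_body": {"type": "BM25", "k1": 1.2, "b": 0.75},
        },
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "similarity": "bm25_title"},
            "title_bigrams": {"type": "text", "analyzer": "bigram_analyzer"},
            "body": {"type": "text", "similarity": "bm25_body"},
            "body_bigrams": {"type": "text", "analyzer": "bigram_analyzer"},
            # Expansion terms predicted by doc2query-T5, indexed as text.
            "doc2query_expansions": {"type": "text"},
        }
    },
}
# es.indices.create(index="msmarco-doc", body=index_body)  # assumed client
```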
