Introduction

For some general background, read the Introduction of Büttcher et al.'s IR textbook, in particular Sections 1.1, 1.2, and 1.4.

  • 1.4 Test Collections
    • 1.4.1 TREC Tasks - TREC (Text REtrieval Conference), a series of experimental evaluation efforts conducted annually. TREC has included tracks devoted to enterprise search, genomic information retrieval, legal discovery, e-mail spam filtering, and blog search, and it provides reusable test collections for validating improvements.
  • IR applications:
    • Web search, desktop search or intranet search, site search
    • Text clustering and categorization
    • Summarization
    • Text extraction
    • Topic detection
    • Expert search systems, which identify the members who are experts on a topic
    • Question answering
    • Multimedia IR: video, image, music, speech
  • IR System Architecture
  • Performance Evaluation
    • Efficiency: 1) Latency 2) Throughput 3) Space
    • Effectiveness: 1) Relevance (a precision@k sketch follows this list)
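
To make the effectiveness side concrete, here is a minimal sketch that scores a single query's ranking with precision@k against a TREC-style set of relevance judgments. The document IDs and judgments are invented for illustration.

```python
# Minimal precision@k over one query's ranking; the ranking and the
# relevance judgments (qrels) below are invented for illustration.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

ranking = ["d3", "d7", "d1", "d9", "d4"]    # system output for one query
qrels = {"d1", "d3", "d5"}                  # judged-relevant documents

print(precision_at_k(ranking, qrels, k=5))  # 2 relevant in top 5 -> 0.4
```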

Relevance Tuning

Papers

  • Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini - Jimmy Lin et al.
    • MS MARCO Document Ranking task, using MRR (mean reciprocal rank) as the metric (a sketch of the computation follows this list)
    • BM25 baseline (a minimal Pyserini example follows this list)
    • Elasticsearch cross-field search, using skopt and Bayesian optimization to select the optimal field boost values. Uses the multi_match query type from Elasticsearch 7.10 with the popular best_fields option, which scores each field with BM25 and takes the maximum as the overall query score (see the query and tuning sketch after this list).
    • Added a separate field for doc2query-T5 expansions, bigram fields for each original field, and per-field BM25 parameter tuning (an index-settings sketch follows this list)
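
Below is a minimal sketch of MRR, the metric used for the MS MARCO Document Ranking task: for each query, take the reciprocal of the rank of the first relevant document, then average over queries. The runs and qrels here are invented for illustration.

```python
# Mean reciprocal rank (MRR): average over queries of 1/rank of the first
# relevant document; runs and qrels below are invented for illustration.

def mean_reciprocal_rank(runs, qrels, cutoff=100):
    total = 0.0
    for qid, ranked_ids in runs.items():
        relevant = qrels.get(qid, set())
        for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)

runs = {"q1": ["d2", "d5", "d1"], "q2": ["d9", "d4"]}
qrels = {"q1": {"d5"}, "q2": {"d9"}}
print(mean_reciprocal_rank(runs, qrels))  # (1/2 + 1/1) / 2 = 0.75
```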
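A minimal BM25 baseline with Pyserini might look like the sketch below. The prebuilt index name and the BM25 parameters are assumptions for illustration, not the paper's exact settings.

```python
# BM25 retrieval with Pyserini over a prebuilt index; the index name
# 'msmarco-doc' and the k1/b values are assumptions, not the paper's.
from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index("msmarco-doc")
searcher.set_bm25(k1=0.9, b=0.4)  # illustrative BM25 parameters

hits = searcher.search("what is information retrieval", k=10)
for rank, hit in enumerate(hits, start=1):
    print(f"{rank:2} {hit.docid} {hit.score:.4f}")
```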
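The cross-field run could be reproduced along the lines of the sketch below: a multi_match query of type best_fields, with the field boosts chosen by scikit-optimize's Bayesian optimizer against dev-set MRR. The index name, field names, boost ranges, and dev data are assumptions for illustration.

```python
# Sketch of best_fields cross-field search with Bayesian-optimized boosts.
# Index name, field names, boost ranges, and dev data are assumptions.
from elasticsearch import Elasticsearch
from skopt import gp_minimize
from skopt.space import Real

es = Elasticsearch()  # assumes a local instance on the default port

# Hypothetical dev set: query text and its single relevant document id.
dev_queries = {"q1": ("what is information retrieval", "d42")}

def run_query(text, title_boost, body_boost, k=100):
    body = {
        "size": k,
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",  # max of the per-field BM25 scores
                "fields": [f"title^{title_boost}", f"body^{body_boost}"],
            }
        },
    }
    hits = es.search(index="msmarco-doc", body=body)["hits"]["hits"]
    return [h["_id"] for h in hits]

def negative_mrr(boosts):
    """Objective for the optimizer; skopt minimizes, so negate MRR."""
    title_boost, body_boost = boosts
    total = 0.0
    for text, relevant_id in dev_queries.values():
        ranked = run_query(text, title_boost, body_boost)
        if relevant_id in ranked:
            total += 1.0 / (ranked.index(relevant_id) + 1)
    return -total / len(dev_queries)

space = [Real(0.1, 5.0, name="title"), Real(0.1, 5.0, name="body")]
result = gp_minimize(negative_mrr, space, n_calls=30, random_state=0)
print(result.x)  # tuned [title_boost, body_boost]
```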
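The later additions (bigram fields, per-field BM25 parameters, and a doc2query-T5 expansion field) all live in the index settings. The sketch below shows one way to express them; the field names, analyzer wiring, and k1/b values are assumptions rather than the paper's tuned configuration.

```python
# Sketch of index settings for the expanded runs: a shingle analyzer that
# produces bigram fields, custom per-field BM25 similarities (so k1/b can
# be tuned per field), and a text field for doc2query-T5 expansions.
# Field names and parameter values are assumptions, not the paper's.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "bigram_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "bigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigram_filter"],
                }
            },
        },
        "similarity": {
            "bm25_title": {"type": "BM25", "k1": 0.9, "b": 0.4},
            "bm25_body": {"type": "BM25", "k1": 1.2, "b": 0.75},
        },
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "similarity": "bm25_title"},
            "title_bigrams": {"type": "text", "analyzer": "bigram_analyzer"},
            "body": {"type": "text", "similarity": "bm25_body"},
            "body_bigrams": {"type": "text", "analyzer": "bigram_analyzer"},
            # Expansion terms predicted by doc2query-T5, indexed as text.
            "doc2query_expansions": {"type": "text"},
        }
    },
}
# es.indices.create(index="msmarco-doc", body=index_body)  # assumed client
```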
