This document explains the newly introduced files, how to use them, and how to reproduce my benchmarks.
Unfortunately, the current state of the code has an additional dependency on pandas, a module for handling .csv, .tsv and similar tabular data. I use it to group the datapoints by document id (a short sketch of this grouping is shown below). It can be done without pandas, and that change will be pushed soon.
Until then, you will have to install pandas first by running:
pip install pandas
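For reference, here is a minimal sketch of the kind of grouping pandas is currently used for (the column names below are hypothetical, not the actual ones in the code):

```python
import pandas as pd

# Hypothetical WikiQA-style frame: one row per (question, candidate answer) pair.
df = pd.DataFrame({
    "QuestionID": ["Q1", "Q1", "Q2"],
    "Sentence": ["answer a", "answer b", "answer c"],
    "Label": [0, 1, 0],
})

# Group the candidate answers and labels by the question/document id,
# so that each query's candidates can be ranked together.
for question_id, group in df.groupby("QuestionID"):
    candidates = group["Sentence"].tolist()
    labels = group["Label"].tolist()
    print(question_id, candidates, labels)
```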
The new files are laid out as follows:
├── sl_vocab.py
├── drmm_tks.py
├── drmm_tks_example.py
├── dssm.py
├── dssm_example.py
├── evaluation_scripts
│   ├── evaluate_models.py
│   ├── mz_results
│   │   ├── predict.test.anmm.wikiqa.txt
│   │   ├── predict.test.arci.wikiqa.txt
│   │   ├── predict.test.cdssm.wikiqa.txt
│   │   ├── predict.test.conv_knrm_ranking.wikiqa.txt
│   │   ├── predict.test.drmm_tks.wikiqa.txt
│   │   ├── predict.test.drmm.wikiqa.txt
│   │   ├── predict.test.dssm.wikiqa.txt
│   │   ├── predict.test.duet.wikiqa.txt
│   │   ├── predict.test.knrm_ranking.wikiqa.txt
│   │   ├── predict.test.matchpyramid.wikiqa.txt
│   │   └── predict.test.mvlstm.wikiqa.txt
├── HowToReproduceMyBenchmark.md
├── README.md
└── data
    └── get_data.py
To reproduce the benchmarks only, you can ignore everything except the contents of the folders "evaluation_scripts" and "data".
Before we can run the evaluation script, we need to download the dataset. This can be done with the get_data.py script in gensim/similarity_learning/data/.
It is a utility script to download and unzip the datasets for Similarity Learning. It currently supports:
- WikiQA
- Quora Duplicate Question Pairs
To get WikiQA:
$ python get_data.py --datafile wikiqa
To get QuoraQP:
$ python get_data.py --datafile quoraqp
Note:
- You will also have to download the Stanford GloVe embeddings from here. This will be incorporated into the script soon. (TODO)
- The evaluation scripts don't use QuoraQP yet.
The script for running evaluations is evaluate_models.py, which can be found in gensim/similarity_learning/evaluation_scripts/. Run it to evaluate either one model at a time or all of them at once.
When benchmarking against MatchZoo, we need to pass in the output files MatchZoo produces when predicting on the test dataset. MatchZoo writes a file named in the format predict.test.wikiqa.txt. In this case, I have collected my outputs, put them in gensim/similarity_learning/evaluation_scripts/mz_results/, and renamed each one to include the name of the model that generated it; so predict.test.wikiqa.txt becomes predict.test.model_name.wikiqa.txt.
Unfortunately, you will have to run MatchZoo and get the outputs yourself. For now, you can trust the results I have uploaded.
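For reference, these prediction files are TREC-style run files. Below is a minimal sketch of how one could be read, assuming the standard TREC run columns (query id, "Q0", document id, rank, score, run name); the actual MatchZoo output may carry extra columns, so treat this only as an illustration:

```python
from collections import defaultdict

def read_trec_run(path):
    """Read a TREC-style run file into {query_id: {doc_id: score}}."""
    scores = defaultdict(dict)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip blank or malformed lines
            qid, _, doc_id, _, score = parts[:5]
            scores[qid][doc_id] = float(score)
    return scores

run = read_trec_run("mz_results/predict.test.drmm.wikiqa.txt")
```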
The script has several parameters, which are best understood through its --help output:
usage: evaluate_models.py [-h] [--model MODEL] [--datapath DATAPATH]
                          [--word_embedding_path WORD_EMBEDDING_PATH]
                          [--mz_result_file MZ_RESULT_FILE]
                          [--result_save_path RESULT_SAVE_PATH]
                          [--mz_result_folder MZ_RESULT_FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         runs the evaluation of doc2vec
  --datapath DATAPATH   path to the folder with WikiQACorpus. Path should
                        include WikiQACorpus. Make sure you have run
                        get_data.py in gensim/similarity_learning/data/
  --word_embedding_path WORD_EMBEDDING_PATH
                        path to the Glove word embedding file
  --mz_result_file MZ_RESULT_FILE
                        path to the prediction output file made by mz
  --result_save_path RESULT_SAVE_PATH
                        path to save the results to as a csv
  --mz_result_folder MZ_RESULT_FOLDER
                        path to mz folder with many test prediction outputs
For evaluating doc2vec on the WikiQA corpus
$ python evaluate_models.py --model doc2vec --datapath ../data/WikiQACorpus/
For evaluating word2vec averaging on the WikiQA corpus
$ python evaluate_models.py --model word2vec --datapath ../data/WikiQACorpus/ --word_embedding_path ../evaluation_scripts/glove.6B.50d.txt
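For context, the word2vec/GloVe averaging baseline scores a query-candidate pair roughly as in the sketch below (the function names are illustrative, not the ones used in evaluate_models.py):

```python
import numpy as np

def avg_vector(tokens, embeddings, dim=50):
    # Average the vectors of the in-vocabulary tokens; fall back to zeros otherwise.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# embeddings: dict mapping word -> 50-d vector loaded from glove.6B.50d.txt
# score = cosine(avg_vector(query_tokens, embeddings), avg_vector(answer_tokens, embeddings))
```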
For evaluating the TREC format file produced by MatchZoo:
$ python evaluate_models.py --model mz --mz_result_file predict.test.wikiqa.txt
Note: here "predict.test.wikiqa.txt" is the file output by MZ. It has been provided in this repo as an example.
For evaluating all models:
- with one mz output file:
$ python evaluate_models.py --model all --mz_result_file predict.test.wikiqa.txt --result_save_path results --word_embedding_path ../evaluation_scripts/glove.6B.50d.txt --datapath ../data/WikiQACorpus/
- with an mz folder filled with result files:
$ python evaluate_models.py --model all --mz_result_folder mz_results/ --result_save_path results_all --datapath ../data/WikiQACorpus/ --word_embedding_path ../evaluation_scripts/glove.6B.50d.txt