First, you need to create a conda or venv environment. Save the following as conda.yml (the file name the create command below expects); it lists the required packages:
name: QAFactEval
channels:
  - conda-forge
dependencies:
  - python=3.8.18=hd12c33a_0_cpython
  - spacy=2.2.4
  - spacy-model-en_core_web_sm=2.2.5
  - gdown=4.7.1
  - pysocks=1.7.1
  - pip:
    - qafacteval==0.10
Micromamba is a fast, self-contained C++ conda runtime. You can create the environment and set up QAFactEval with these commands:
micromamba env create -f conda.yml -y
micromamba activate QAFactEval
git clone https://github.com/salesforce/QAFactEval.git
cd QAFactEval
./download_models.sh
Ready for some fact evaluations!
echo '{"document": {"text": "This is a source document"}, "claim": "This is a summary"}' > input.jsonl
python run.py --model_folder models --fname input.jsonl --outfname out.jsonl --cuda_device -1
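If you have more than one document/claim pair, writing the input file by hand gets tedious. Here is a minimal Python sketch that produces the same JSON Lines layout as the echo command above. The helper name write_inputs is hypothetical (it is not part of QAFactEval), and the only schema it assumes is the one shown in the example line.

```python
import json


def write_inputs(pairs, path="input.jsonl"):
    """Write (document_text, claim) pairs as one JSON object per line.

    The record layout mirrors the example input line:
    {"document": {"text": ...}, "claim": ...}
    """
    with open(path, "w", encoding="utf-8") as f:
        for document_text, claim in pairs:
            record = {"document": {"text": document_text}, "claim": claim}
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Calling write_inputs([("This is a source document", "This is a summary")]) writes the same record as the echo command; add more tuples to the list to score several summaries in one run.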
You can try to use your GPU by removing the "--cuda_device -1" argument, but modern GPUs won't work, since this code is tied to torch 1.6 (which lacks CUDA kernels for modern hardware architectures). We can hack around this by installing a modern PyTorch. Even though both qafacteval and allennlp declare that they need torch<1.7, it actually runs fine on a GPU if you install a modern PyTorch inside the environment:
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
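Before pointing run.py at the GPU, it can help to check whether the PyTorch build inside the environment can really launch a kernel on your card. The sketch below is one way to do that, not something QAFactEval ships: the function name gpu_usable is hypothetical, and it relies only on standard PyTorch calls (torch.cuda.is_available and a tiny tensor operation on the "cuda" device). A visible GPU is not enough, because an old build can see the device and still fail on the first real computation.

```python
def gpu_usable():
    """Return True only if PyTorch can run a small computation on the GPU."""
    try:
        import torch
    except Exception:
        # torch is missing or broken in this environment
        return False
    if not torch.cuda.is_available():
        # no driver or no visible CUDA device
        return False
    try:
        # Force a real kernel launch; builds without kernels for this
        # GPU architecture typically raise a RuntimeError here.
        (torch.ones(1, device="cuda") + 1).item()
        return True
    except RuntimeError:
        return False


print("GPU usable:", gpu_usable())
```

If this prints False, keep "--cuda_device -1" and run on the CPU.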
Anyway, the models are quite small, so GPU support is probably not very important. The run.py command above creates an out.jsonl file that looks like:
{"document": {"text": "This is a source document"}, "claim": "This is a summary", "metrics": {"qa-eval": {"f1": 0.0, "is_answered": 1.0, "em": 0.0}}, "qa_pairs": [[{"question": {"question_id": "dc50bcdddb09fd6e2772d349a1e8dd58", "question": "What is this?", "answer": "a summary", "sent_start": 0, "sent_end": 17, "answer_start": 8, "answer_end": 17}, "prediction": {"prediction_id": "cfcd208495d565ef66e7dff9f98764da", "prediction": "a source document", "probability": 0.735170068387259, "null_probability": 2.769950940990251e-05, "start": 8, "end": 25, "f1": 0, "is_answered": 1.0, "em": 0}}]], "qa_pairs_nonfiltered": [[{"question_id": "dc50bcdddb09fd6e2772d349a1e8dd58", "question": "What is this?", "answer": "a summary", "sent_start": 0, "sent_end": 17, "answer_start": 8, "answer_end": 17}]], "qa_summary": [[{"prediction_id": "cfcd208495d565ef66e7dff9f98764da", "prediction": "a summary", "probability": 0.6332314891401608, "null_probability": 2.6775376205867233e-05, "start": 8, "end": 17}]]}
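The output line is dense, so here is a small Python sketch for pulling out the scores. It assumes only the field names visible in the sample above ("claim", "metrics", "qa-eval", "f1", "em", "is_answered", "qa_pairs"); the helper name summarize is hypothetical, and other QAFactEval versions may lay out the file differently.

```python
import json


def summarize(path="out.jsonl"):
    """Collect the qa-eval metrics for each record in a JSON Lines file."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            metrics = record["metrics"]["qa-eval"]
            rows.append({
                "claim": record["claim"],
                "f1": metrics["f1"],
                "em": metrics["em"],
                "is_answered": metrics["is_answered"],
                # qa_pairs is a list of lists; count every question/prediction pair
                "num_qa_pairs": sum(len(group) for group in record["qa_pairs"]),
            })
    return rows
```

For the sample output, the single row has f1 and em of 0.0: the question "What is this?" was answered from the document ("a source document"), but that answer does not match the one found in the summary ("a summary").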