| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| LLAMA_Harsha_8_B_ORDP_10k | 35.54 | 71.15 | 55.39 | 37.96 | 50.01 |
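
The Average column appears to be the unweighted mean of the four suite scores. This is an assumption inferred from the numbers, since the source does not state how the column is computed. A minimal sketch that reproduces the value from the row above (the dictionary name `suite_scores` is illustrative only):

```python
# Suite-level scores copied from the summary row for LLAMA_Harsha_8_B_ORDP_10k.
suite_scores = {
    "AGIEval": 35.54,
    "GPT4All": 71.15,
    "TruthfulQA": 55.39,
    "Bigbench": 37.96,
}

# Assumption: Average is the plain (unweighted) mean, rounded to two decimals.
average = round(sum(suite_scores.values()) / len(suite_scores), 2)
print(average)  # 50.01
```

The result matches the reported Average of 50.01, which is consistent with an unweighted mean across the four suites.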

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---|
| agieval_aqua_rat | 0 | acc | 26.77 | ± 2.78 |
| | | acc_norm | 27.17 | ± 2.80 |
| agieval_logiqa_en | 0 | acc | 31.34 | ± 1.82 |