CultriX CultriX-Github

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| dolphin-2.2.1-mistral-7b | 38.64 | 72.24 | 54.09 | 39.22 | 51.05 |
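
The Average column appears to be the unweighted mean of the four suite scores. That is an inference from the reported numbers, not something the page states. A minimal Python sketch that checks it for the row above (the function name `suite_average` is hypothetical):

```python
def suite_average(scores):
    """Unweighted mean of the suite scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


# Scores copied from the dolphin-2.2.1-mistral-7b row, in column order:
# AGIEval, GPT4All, TruthfulQA, Bigbench.
dolphin = [38.64, 72.24, 54.09, 39.22]

print(suite_average(dolphin))  # 51.05, matching the reported Average
```

The same check can be repeated for any other summary row by swapping in its four scores.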

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.23 | ± 2.65 |
| | | acc_norm | 21.26 | ± 2.57 |
| agieval_logiqa_en | 0 | acc | 35.48 | ± 1.88 |
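
To read the Stderr column, one common approach is to turn a value and its standard error into an approximate 95% interval with a normal approximation (value ± 1.96 × stderr). The sketch below assumes that Value and Stderr are both in percentage points and that the normal approximation is acceptable for these task sizes; neither is stated on the page, and `approx_interval` is a hypothetical helper.

```python
def approx_interval(value, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation.

    z=1.96 corresponds to roughly 95% coverage.
    """
    margin = z * stderr
    return value - margin, value + margin


# agieval_aqua_rat acc for dolphin-2.2.1-mistral-7b: 23.23 ± 2.65
low, high = approx_interval(23.23, 2.65)
print(f"{low:.2f} to {high:.2f}")  # 18.04 to 28.42
```

Intervals this wide mean that differences of a point or two between models on a single task may not be meaningful.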
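
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| zephyr-7b-alpha | 38 | 72.24 | 56.06 | 40.57 | 51.72 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 20.47 | ± 2.54 |
| | | acc_norm | 19.69 | ± 2.50 |
| agieval_logiqa_en | 0 | acc | 31.49 | ± 1.82 |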
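
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| MergeCeption-7B-v3 | 45.16 | 76.86 | 79.27 | 49.86 | 62.79 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± 2.82 |
| | | acc_norm | 25.59 | ± 2.74 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |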
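
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| AlphaMonarch-dora | 45.42 | 76.93 | 78.48 | 50.18 | 62.75 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.35 | ± 2.83 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± 1.91 |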
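
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| NeuralCeptrix-7B-SLERPv2 | 45.28 | 77.03 | 78.84 | 49.75 | 62.73 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 25.98 | ± 2.76 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± 1.90 |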
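
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| M7-7b | 44.84 | 77.01 | 78.4 | 49.1 | 62.34 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.56 | ± 2.81 |
| | | acc_norm | 25.20 | ± 2.73 |
| agieval_logiqa_en | 0 | acc | 39.78 | ± 1.92 |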
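
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| T3Q-Mistral-Orca-Math-DPO | 44.41 | 76.83 | 78.78 | 49.43 | 62.36 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 26.38 | ± 2.77 |
| | | acc_norm | 23.62 | ± 2.67 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± 1.92 |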
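
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| NeuralCeptrix-7B-SLERPv3 | 45.28 | 77.03 | 78.84 | 49.75 | 62.73 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 25.98 | ± 2.76 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± 1.90 |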
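
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Monatrix-v4-dpo | 45.4 | 76.33 | 78.44 | 49.59 | 62.44 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 29.13 | ± 2.86 |
| | | acc_norm | 27.17 | ± 2.80 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± 1.91 |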
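
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| CodeNinja-1.0-OpenChat-7B | 39.98 | 71.77 | 48.73 | 40.92 | 50.35 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± 2.80 |
| | | acc_norm | 26.38 | ± 2.77 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± 1.90 |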
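
Because each model's summary sits in its own table, comparing them takes some scrolling. As a convenience, the sketch below collects the reported Average values from the summary rows above and sorts them from highest to lowest. The numbers are copied from this page; the variable names are illustrative only.

```python
# (model, reported Average) pairs, in the order they appear on the page.
averages = [
    ("dolphin-2.2.1-mistral-7b", 51.05),
    ("zephyr-7b-alpha", 51.72),
    ("MergeCeption-7B-v3", 62.79),
    ("AlphaMonarch-dora", 62.75),
    ("NeuralCeptrix-7B-SLERPv2", 62.73),
    ("M7-7b", 62.34),
    ("T3Q-Mistral-Orca-Math-DPO", 62.36),
    ("NeuralCeptrix-7B-SLERPv3", 62.73),
    ("Monatrix-v4-dpo", 62.44),
    ("CodeNinja-1.0-OpenChat-7B", 50.35),
]

# sorted() is stable, so models with equal averages keep their page order.
ranked = sorted(averages, key=lambda row: row[1], reverse=True)

for position, (model, average) in enumerate(ranked, start=1):
    print(f"{position:2d}. {model:<28} {average:.2f}")
```

The top seven models fall within about half a point of each other, so this ordering is best read as a rough grouping rather than a strict ranking.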