| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| dolphin-2.2.1-mistral-7b | 38.64 | 72.24 | 54.09 | 39.22 | 51.05 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.23 | ± | 2.65 |
|  |  | acc_norm | 21.26 | ± | 2.57 |
| agieval_logiqa_en | 0 | acc | 35.48 | ± | 1.88 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| zephyr-7b-alpha | 38 | 72.24 | 56.06 | 40.57 | 51.72 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 20.47 | ± | 2.54 |
|  |  | acc_norm | 19.69 | ± | 2.50 |
| agieval_logiqa_en | 0 | acc | 31.49 | ± | 1.82 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| MergeCeption-7B-v3 | 45.16 | 76.86 | 79.27 | 49.86 | 62.79 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
|  |  | acc_norm | 25.59 | ± | 2.74 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| AlphaMonarch-dora | 45.42 | 76.93 | 78.48 | 50.18 | 62.75 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.35 | ± | 2.83 |
|  |  | acc_norm | 26.38 | ± | 2.77 |
| agieval_logiqa_en | 0 | acc | 38.71 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| NeuralCeptrix-7B-SLERPv2 | 45.28 | 77.03 | 78.84 | 49.75 | 62.73 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± | 2.80 |
|  |  | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± | 1.90 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| M7-7b | 44.84 | 77.01 | 78.4 | 49.1 | 62.34 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.56 | ± | 2.81 |
|  |  | acc_norm | 25.20 | ± | 2.73 |
| agieval_logiqa_en | 0 | acc | 39.78 | ± | 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| T3Q-Mistral-Orca-Math-DPO | 44.41 | 76.83 | 78.78 | 49.43 | 62.36 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 26.38 | ± | 2.77 |
|  |  | acc_norm | 23.62 | ± | 2.67 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± | 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| NeuralCeptrix-7B-SLERPv3 | 45.28 | 77.03 | 78.84 | 49.75 | 62.73 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± | 2.80 |
|  |  | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± | 1.90 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Monatrix-v4-dpo | 45.4 | 76.33 | 78.44 | 49.59 | 62.44 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 29.13 | ± | 2.86 |
|  |  | acc_norm | 27.17 | ± | 2.80 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| CodeNinja-1.0-OpenChat-7B | 39.98 | 71.77 | 48.73 | 40.92 | 50.35 |
| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± | 2.80 |
|  |  | acc_norm | 26.38 | ± | 2.77 |
| agieval_logiqa_en | 0 | acc | 38.10 | ± | 1.90 |
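The Average column in each summary table is the arithmetic mean of the four benchmark scores (AGIEval, GPT4All, TruthfulQA, Bigbench), rounded to two decimal places. A minimal sanity-check sketch, with a few rows transcribed from the tables above (a small tolerance absorbs the rounding):

```python
# Check that each reported Average equals the mean of the four
# benchmark scores, within rounding tolerance (±0.005).
scores = {
    # model: ([AGIEval, GPT4All, TruthfulQA, Bigbench], reported Average)
    "dolphin-2.2.1-mistral-7b": ([38.64, 72.24, 54.09, 39.22], 51.05),
    "MergeCeption-7B-v3": ([45.16, 76.86, 79.27, 49.86], 62.79),
    "AlphaMonarch-dora": ([45.42, 76.93, 78.48, 50.18], 62.75),
}

for model, (values, reported) in scores.items():
    mean = sum(values) / len(values)
    assert abs(mean - reported) < 0.005, (
        f"{model}: computed {mean:.4f}, reported {reported}"
    )
    print(f"{model}: mean {mean:.4f} ~ reported {reported}")
```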