| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Starling-LM-7B-alpha | 42.06 | 72.72 | 47.33 | 42.53 | 51.16 |
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.80 | ± | 2.72 |
| | | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 38.25 | ± | 1.91 |
ACTION = build
AD_HOC_CODE_SIGNING_ALLOWED = NO
ALTERNATE_GROUP = staff
ALTERNATE_MODE = u+w,go-w,a+rX
ALTERNATE_OWNER = grantdavis
ALWAYS_SEARCH_USER_PATHS = NO
ALWAYS_USE_SEPARATE_HEADERMAPS = YES
APPLE_INTERNAL_DEVELOPER_DIR = /AppleInternal/Developer
APPLE_INTERNAL_DIR = /AppleInternal
APPLE_INTERNAL_DOCUMENTATION_DIR = /AppleInternal/Documentation
// Promises are started in parallel.
// Resolves with the first resolved value in time.
// If there's no winner, it rejects with the last rejection.
Promise.any = function (promises) {
  return new Promise((resolve, reject) => {
    var rejectedCount = 0;
    function onMemberResolved(value) {
      resolve(value);
    }
    function onMemberRejected(reason) {
      rejectedCount += 1;
      if (rejectedCount === promises.length) reject(reason);
    }
    promises.forEach((member) =>
      Promise.resolve(member).then(onMemberResolved, onMemberRejected)
    );
  });
};
// Promises are started in parallel.
// Resolves with the first resolved value in array order.
// If there's no winner, it rejects with the last rejection.
Promise.preferred = function (promisesOrdered) {
  return new Promise((resolve, reject) => {
    var resolvedValues = new WeakMap();
    var resolvables = promisesOrdered.slice(); // copy
    function onMemberResolved(value, member) {
      resolvedValues.set(member, value);
      if (member === resolvables[0]) resolve(value);
    }
    function onMemberRejected(reason, member) {
      resolvables.splice(resolvables.indexOf(member), 1);
      if (resolvables.length === 0) reject(reason);
      else if (resolvedValues.has(resolvables[0])) resolve(resolvedValues.get(resolvables[0]));
    }
    promisesOrdered.forEach((m) =>
      Promise.resolve(m).then((v) => onMemberResolved(v, m), (r) => onMemberRejected(r, m))
    );
  });
};
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| mistral-ft-optimized-1218 | 44.74 | 75.6 | 59.89 | 47.17 | 56.85 |
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 25.20 | ± | 2.73 |
| | | acc_norm | 24.02 | ± | 2.69 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± | 1.92 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| go-bruins-v2.1.1 | 44.83 | 76.76 | 70.31 | 47.36 | 59.81 |
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.35 | ± | 2.83 |
| | | acc_norm | 27.56 | ± | 2.81 |
| agieval_logiqa_en | 0 | acc | 38.40 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| openchat_3.5 | 42.67 | 72.92 | 47.27 | 42.51 | 51.34 |
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.02 | ± | 2.69 |
| | | acc_norm | 24.80 | ± | 2.72 |
| agieval_logiqa_en | 0 | acc | 38.86 | ± | 1.91 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| zephyr-7b-beta | 37.33 | 71.83 | 55.1 | 39.7 | 50.99 |
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.26 | ± | 2.57 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 33.33 | ± | 1.85 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| zephyr-7b-alpha | 38 | 72.24 | 56.06 | 40.57 | 51.72 |
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 20.47 | ± | 2.54 |
| | | acc_norm | 19.69 | ± | 2.50 |
| agieval_logiqa_en | 0 | acc | 31.49 | ± | 1.82 |
2024-01-09T14:51:49.894270414Z     return fn(*args, **kwargs)
2024-01-09T14:51:49.894273580Z   File "/lm-evaluation-harness/lm_eval/evaluator.py", line 69, in simple_evaluate
2024-01-09T14:51:49.894279732Z     lm = lm_eval.models.get_model(model).create_from_arg_string(
2024-01-09T14:51:49.894283779Z   File "/lm-evaluation-harness/lm_eval/base.py", line 115, in create_from_arg_string
2024-01-09T14:51:49.894316350Z     return cls(**args, **args2)
2024-01-09T14:51:49.894323294Z   File "/lm-evaluation-harness/lm_eval/models/gpt2.py", line 67, in __init__
2024-01-09T14:51:49.894355253Z     self.tokenizer = transformers.AutoTokenizer.from_pretrained(
2024-01-09T14:51:49.894361435Z   File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 787, in from_pretrained
2024-01-09T14:51:49.894470349Z     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
2024-01-09T14:51:49.894475349Z   File "/usr/local/lib/python3.10/dist-packages/transformer