Created August 5, 2024 19:40
Gist: joecummings/3b0a6848fef19f0823027e60dda3be0a
(eleuther) [[email protected] ~/projects/lm-evaluation-harness (multimodal-prototyping)]$ lm_eval --model hf-multimodal --tasks mmmu --batch_size 1 --model_args pretrained=google/paligemma-3b-pt-224
2024-08-05:12:30:21,563 INFO [__main__.py:272] Verbosity set to INFO
2024-08-05:12:30:21,743 INFO [__init__.py:406] `group` and `group_alias` keys in tasks' configs will no longer be used in the next release of lm-eval. `tag` will be used to allow to call a collection of tasks just like `group`. `group` will be removed in order to not cause confusion with the new ConfigurableGroup which will be the offical way to create groups with addition of group-wide configuations.
2024-08-05:12:30:27,238 INFO [__main__.py:369] Selected Tasks: ['mmmu']
2024-08-05:12:30:27,240 INFO [evaluator.py:158] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
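The evaluator line above reports three separate seeds. As a rough sketch, assuming the standard Python, NumPy and PyTorch seeding calls (this is not lm-eval's own code, and `seed_everything` is a hypothetical helper), seeding with the logged values looks like this:

```python
import random


def seed_everything(python_seed: int = 0, numpy_seed: int = 1234, torch_seed: int = 1234) -> None:
    """Seed the RNGs named in the log line; NumPy and PyTorch are optional in this sketch."""
    random.seed(python_seed)
    try:
        import numpy as np
        np.random.seed(numpy_seed)  # legacy global NumPy RNG
    except ImportError:
        pass  # NumPy not installed: skip
    try:
        import torch
        torch.manual_seed(torch_seed)  # seeds the CPU (and CUDA) generators
    except ImportError:
        pass  # PyTorch not installed: skip


seed_everything()
first = [random.random() for _ in range(3)]
seed_everything()
second = [random.random() for _ in range(3)]
print(first == second)
```

Using distinct seeds per library mirrors the log line; re-seeding before each run is what makes the sampled draws repeat.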
2024-08-05:12:30:27,240 INFO [evaluator.py:195] Initializing hf-multimodal model, with arguments: {'pretrained': 'google/paligemma-3b-pt-224'}
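The `--model_args` string from the command becomes the arguments dict logged here. A minimal sketch of that comma-separated `key=value` parsing follows. `parse_model_args` is a hypothetical helper, and the harness's real parser may also coerce value types (for example booleans or numbers), which this version does not do:

```python
def parse_model_args(arg_string: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}, keeping values as strings."""
    args: dict[str, str] = {}
    for item in arg_string.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, value = item.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        args[key.strip()] = value.strip()
    return args


print(parse_model_args("pretrained=google/paligemma-3b-pt-224"))
# {'pretrained': 'google/paligemma-3b-pt-224'}
```

Splitting on the first `=` only keeps values that themselves contain `=` intact.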
2024-08-05:12:30:28,809 INFO [huggingface.py:169] Using device 'cuda'
/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
warnings.warn(
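The FutureWarning asks callers to read the vocabulary size from the nested text config instead of the top-level attribute. Here is a small sketch. It assumes a PaliGemma-style config object that may or may not expose `text_config`. `text_vocab_size` is a hypothetical helper, `SimpleNamespace` stands in for a real `transformers` config, and the sizes are placeholders:

```python
from types import SimpleNamespace


def text_vocab_size(config) -> int:
    """Prefer `text_config.vocab_size` (as the warning asks); fall back to the old top-level attribute."""
    text_config = getattr(config, "text_config", None)
    if text_config is not None and getattr(text_config, "vocab_size", None) is not None:
        return text_config.vocab_size
    return config.vocab_size


# Stand-in config objects; the sizes are placeholders, not PaliGemma's real vocabulary size.
new_style = SimpleNamespace(text_config=SimpleNamespace(vocab_size=1000))
old_style = SimpleNamespace(vocab_size=500)
print(text_vocab_size(new_style), text_vocab_size(old_style))
```

The helper only touches the deprecated attribute when no nested text config exists, so it should not trigger this warning on configs that already provide `text_config`.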
tokenizer_config.json: 100%|████████████████████████████████████████| 40.0k/40.0k [00:00<00:00, 3.10MB/s]
tokenizer.model: 100%|████████████████████████████████████████| 4.26M/4.26M [00:00<00:00, 32.1MB/s]
tokenizer.json: 100%|████████████████████████████████████████| 17.5M/17.5M [00:00<00:00, 202MB/s]
added_tokens.json: 100%|████████████████████████████████████████| 24.0/24.0 [00:00<00:00, 133kB/s]
special_tokens_map.json: 100%|████████████████████████████████████████| 607/607 [00:00<00:00, 3.84MB/s]
model.safetensors.index.json: 100%|████████████████████████████████████████| 62.6k/62.6k [00:00<00:00, 10.1MB/s]
model-00001-of-00003.safetensors: 100%|████████████████████████████████████████| 4.95G/4.95G [00:50<00:00, 97.3MB/s]
model-00002-of-00003.safetensors: 100%|████████████████████████████████████████| 5.00G/5.00G [00:31<00:00, 157MB/s]
model-00003-of-00003.safetensors: 100%|████████████████████████████████████████| 1.74G/1.74G [00:09<00:00, 179MB/s]
Downloading shards: 100%|████████████████████████████████████████| 3/3 [01:32<00:00, 30.99s/it]
`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.
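The notice above says Gemma's activation will be `gelu_pytorch_tanh`. That is the tanh approximation of GELU, the same formula PyTorch uses for `gelu(..., approximate="tanh")`. Below is a scalar sketch, for illustration only, that compares it with the exact erf-based GELU using the standard library:

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f}  exact={gelu_exact(x):.6f}  tanh={gelu_tanh(x):.6f}")
```

The two curves differ by well under 1e-3 over typical activation ranges, which is why checkpoints trained with one variant usually tolerate the other. The warning is still worth heeding, because the choice changes numerics slightly.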
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:08<00:00, 2.74s/it]
generation_config.json: 100%|████████████████████████████████████████| 137/137 [00:00<00:00, 921kB/s]
Downloading readme: 100%|████████████████████████████████████████| 42.3k/42.3k [00:00<00:00, 1.40MB/s]
Downloading data: 100%|████████████████████████████████████████| 164k/164k [00:00<00:00, 825kB/s]
Downloading data: 100%|████████████████████████████████████████| 877k/877k [00:00<00:00, 4.79MB/s]
Downloading data: 100%|████████████████████████████████████████| 15.0M/15.0M [00:00<00:00, 39.0MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 402.09 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 3962.62 examples/s]
Generating test split: 100%|████████████████████████████████████████| 429/429 [00:00<00:00, 5485.83 examples/s]
Downloading data: 100%|████████████████████████████████████████| 250k/250k [00:00<00:00, 1.11MB/s]
Downloading data: 100%|████████████████████████████████████████| 2.31M/2.31M [00:00<00:00, 8.16MB/s]
Downloading data: 100%|████████████████████████████████████████| 24.8M/24.8M [00:00<00:00, 42.6MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 767.06 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2215.11 examples/s]
Generating test split: 100%|████████████████████████████████████████| 458/458 [00:00<00:00, 3515.50 examples/s]
Downloading data: 100%|████████████████████████████████████████| 114k/114k [00:00<00:00, 595kB/s]
Downloading data: 100%|████████████████████████████████████████| 1.65M/1.65M [00:00<00:00, 8.08MB/s]
Downloading data: 100%|████████████████████████████████████████| 14.7M/14.7M [00:00<00:00, 36.1MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 764.30 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2389.87 examples/s]
Generating test split: 100%|████████████████████████████████████████| 432/432 [00:00<00:00, 4810.90 examples/s]
Downloading data: 100%|████████████████████████████████████████| 134k/134k [00:00<00:00, 561kB/s]
Downloading data: 100%|████████████████████████████████████████| 645k/645k [00:00<00:00, 3.37MB/s]
Downloading data: 100%|████████████████████████████████████████| 5.02M/5.02M [00:00<00:00, 11.0MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 906.13 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 4601.54 examples/s]
Generating test split: 100%|████████████████████████████████████████| 256/256 [00:00<00:00, 7736.23 examples/s]
Downloading data: 100%|████████████████████████████████████████| 446k/446k [00:00<00:00, 2.41MB/s]
Downloading data: 100%|████████████████████████████████████████| 2.08M/2.08M [00:00<00:00, 8.66MB/s]
Downloading data: 100%|████████████████████████████████████████| 24.8M/24.8M [00:00<00:00, 32.7MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 818.69 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2329.56 examples/s]
Generating test split: 100%|████████████████████████████████████████| 371/371 [00:00<00:00, 2655.06 examples/s]
Downloading data: 100%|████████████████████████████████████████| 149k/149k [00:00<00:00, 861kB/s]
Downloading data: 100%|████████████████████████████████████████| 727k/727k [00:00<00:00, 3.94MB/s]
Downloading data: 100%|████████████████████████████████████████| 16.1M/16.1M [00:00<00:00, 28.9MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 920.65 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 3888.18 examples/s]
Generating test split: 100%|████████████████████████████████████████| 551/551 [00:00<00:00, 6439.78 examples/s]
Downloading data: 100%|████████████████████████████████████████| 22.1M/22.1M [00:00<00:00, 48.0MB/s]
Downloading data: 100%|████████████████████████████████████████| 119M/119M [00:00<00:00, 129MB/s]
Downloading data: 100%|████████████████████████████████████████| 496M/496M [00:03<00:00, 156MB/s]
Downloading data: 100%|████████████████████████████████████████| 489M/489M [00:02<00:00, 163MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 29.63 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 30.38 examples/s]
Generating test split: 100%|████████████████████████████████████████| 287/287 [00:06<00:00, 46.95 examples/s]
Downloading data: 100%|████████████████████████████████████████| 241k/241k [00:00<00:00, 1.38MB/s]
Downloading data: 100%|████████████████████████████████████████| 1.12M/1.12M [00:00<00:00, 5.56MB/s]
Downloading data: 100%|████████████████████████████████████████| 15.8M/15.8M [00:00<00:00, 43.0MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 740.47 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2789.20 examples/s]
Generating test split: 100%|████████████████████████████████████████| 408/408 [00:00<00:00, 4549.80 examples/s]
Downloading data: 100%|████████████████████████████████████████| 192k/192k [00:00<00:00, 1.11MB/s]
Downloading data: 100%|████████████████████████████████████████| 1.45M/1.45M [00:00<00:00, 6.21MB/s]
Downloading data: 100%|████████████████████████████████████████| 26.6M/26.6M [00:00<00:00, 41.9MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 898.21 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2877.48 examples/s]
Generating test split: 100%|████████████████████████████████████████| 505/505 [00:00<00:00, 3935.66 examples/s]
Downloading data: 100%|████████████████████████████████████████| 1.50M/1.50M [00:00<00:00, 6.36MB/s]
Downloading data: 100%|████████████████████████████████████████| 6.68M/6.68M [00:00<00:00, 18.1MB/s]
Downloading data: 100%|████████████████████████████████████████| 135M/135M [00:01<00:00, 85.0MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 393.25 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 978.46 examples/s]
Generating test split: 100%|████████████████████████████████████████| 565/565 [00:00<00:00, 829.30 examples/s]
Downloading data: 100%|████████████████████████████████████████| 272k/272k [00:00<00:00, 679kB/s]
Downloading data: 100%|████████████████████████████████████████| 1.52M/1.52M [00:00<00:00, 4.64MB/s]
Downloading data: 100%|████████████████████████████████████████| 34.8M/34.8M [00:00<00:00, 41.7MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 735.97 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2569.15 examples/s]
Generating test split: 100%|████████████████████████████████████████| 603/603 [00:00<00:00, 3817.33 examples/s]
Downloading data: 100%|████████████████████████████████████████| 584k/584k [00:00<00:00, 1.16MB/s]
Downloading data: 100%|████████████████████████████████████████| 8.49M/8.49M [00:00<00:00, 12.7MB/s]
Downloading data: 100%|████████████████████████████████████████| 124M/124M [00:01<00:00, 112MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 667.97 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 616.30 examples/s]
Generating test split: 100%|████████████████████████████████████████| 345/345 [00:00<00:00, 473.77 examples/s]
Downloading data: 100%|████████████████████████████████████████| 615k/615k [00:00<00:00, 2.91MB/s]
Downloading data: 100%|████████████████████████████████████████| 4.31M/4.31M [00:00<00:00, 15.3MB/s]
Downloading data: 100%|████████████████████████████████████████| 41.7M/41.7M [00:01<00:00, 31.5MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 581.57 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 727.37 examples/s]
Generating test split: 100%|████████████████████████████████████████| 305/305 [00:00<00:00, 633.14 examples/s]
Downloading data: 100%|████████████████████████████████████████| 3.78M/3.78M [00:00<00:00, 14.2MB/s]
Downloading data: 100%|████████████████████████████████████████| 18.5M/18.5M [00:00<00:00, 31.8MB/s]
Downloading data: 100%|████████████████████████████████████████| 138M/138M [00:01<00:00, 104MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 121.62 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 161.52 examples/s]
Generating test split: 100%|████████████████████████████████████████| 252/252 [00:01<00:00, 177.33 examples/s]
Downloading data: 100%|████████████████████████████████████████| 2.46M/2.46M [00:00<00:00, 9.54MB/s]
Downloading data: 100%|████████████████████████████████████████| 14.2M/14.2M [00:00<00:00, 28.9MB/s]
Downloading data: 100%|████████████████████████████████████████| 49.4M/49.4M [00:00<00:00, 55.5MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 312.44 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 390.08 examples/s]
Generating test split: 100%|████████████████████████████████████████| 112/112 [00:00<00:00, 378.57 examples/s]
Downloading data: 100%|████████████████████████████████████████| 1.46M/1.46M [00:00<00:00, 4.78MB/s]
Downloading data: 100%|████████████████████████████████████████| 8.43M/8.43M [00:00<00:00, 17.9MB/s]
Downloading data: 100%|████████████████████████████████████████| 107M/107M [00:01<00:00, 69.7MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 340.27 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 597.78 examples/s]
Generating test split: 100%|████████████████████████████████████████| 278/278 [00:00<00:00, 410.07 examples/s]
Downloading data: 100%|████████████████████████████████████████| 244k/244k [00:00<00:00, 1.27MB/s]
Downloading data: 100%|████████████████████████████████████████| 1.51M/1.51M [00:00<00:00, 4.04MB/s]
Downloading data: 100%|████████████████████████████████████████| 21.7M/21.7M [00:00<00:00, 33.5MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 696.27 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2318.49 examples/s]
Generating test split: 100%|████████████████████████████████████████| 509/509 [00:00<00:00, 3413.62 examples/s]
Downloading data: 100%|████████████████████████████████████████| 218k/218k [00:00<00:00, 1.11MB/s]
Downloading data: 100%|████████████████████████████████████████| 1.55M/1.55M [00:00<00:00, 7.35MB/s]
Downloading data: 100%|████████████████████████████████████████| 28.7M/28.7M [00:00<00:00, 43.8MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 846.38 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2381.73 examples/s]
Generating test split: 100%|████████████████████████████████████████| 430/430 [00:00<00:00, 2823.83 examples/s]
Downloading data: 100%|████████████████████████████████████████| 2.07M/2.07M [00:00<00:00, 5.03MB/s]
Downloading data: 100%|████████████████████████████████████████| 37.1M/37.1M [00:00<00:00, 43.2MB/s]
Downloading data: 100%|████████████████████████████████████████| 156M/156M [00:01<00:00, 85.8MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 345.36 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 138.94 examples/s]
Generating test split: 100%|████████████████████████████████████████| 162/162 [00:00<00:00, 227.70 examples/s]
Downloading data: 100%|████████████████████████████████████████| 1.48M/1.48M [00:00<00:00, 5.61MB/s]
Downloading data: 100%|████████████████████████████████████████| 10.9M/10.9M [00:00<00:00, 22.3MB/s]
Downloading data: 100%|████████████████████████████████████████| 96.6M/96.6M [00:00<00:00, 106MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 463.31 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 488.63 examples/s]
Generating test split: 100%|████████████████████████████████████████| 325/325 [00:00<00:00, 666.03 examples/s]
Downloading data: 100%|████████████████████████████████████████| 826k/826k [00:00<00:00, 4.05MB/s]
Downloading data: 100%|████████████████████████████████████████| 4.13M/4.13M [00:00<00:00, 16.5MB/s]
Downloading data: 100%|████████████████████████████████████████| 47.7M/47.7M [00:00<00:00, 53.3MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 687.57 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 1509.05 examples/s]
Generating test split: 100%|████████████████████████████████████████| 326/326 [00:00<00:00, 1639.71 examples/s]
Downloading data: 100%|████████████████████████████████████████| 117k/117k [00:00<00:00, 661kB/s]
Downloading data: 100%|████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 5.76MB/s]
Downloading data: 100%|████████████████████████████████████████| 4.98M/4.98M [00:00<00:00, 16.0MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 714.95 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2177.65 examples/s]
Generating test split: 100%|████████████████████████████████████████| 181/181 [00:00<00:00, 4361.49 examples/s]
Downloading data: 100%|████████████████████████████████████████| 459k/459k [00:00<00:00, 2.11MB/s]
Downloading data: 100%|████████████████████████████████████████| 3.14M/3.14M [00:00<00:00, 11.0MB/s]
Downloading data: 100%|████████████████████████████████████████| 23.2M/23.2M [00:00<00:00, 25.6MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 596.75 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 1651.15 examples/s]
Generating test split: 100%|████████████████████████████████████████| 245/245 [00:00<00:00, 1857.06 examples/s]
Downloading data: 100%|████████████████████████████████████████| 306k/306k [00:00<00:00, 1.44MB/s]
Downloading data: 100%|████████████████████████████████████████| 1.00M/1.00M [00:00<00:00, 3.57MB/s]
Downloading data: 100%|████████████████████████████████████████| 9.52M/9.52M [00:00<00:00, 28.3MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 759.78 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2621.33 examples/s]
Generating test split: 100%|████████████████████████████████████████| 355/355 [00:00<00:00, 5745.33 examples/s]
Downloading data: 100%|████████████████████████████████████████| 174k/174k [00:00<00:00, 719kB/s]
Downloading data: 100%|████████████████████████████████████████| 1.42M/1.42M [00:00<00:00, 4.71MB/s]
Downloading data: 100%|████████████████████████████████████████| 8.78M/8.78M [00:00<00:00, 23.6MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 784.74 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 2341.44 examples/s]
Generating test split: 100%|████████████████████████████████████████| 267/267 [00:00<00:00, 4850.29 examples/s]
Downloading data: 100%|████████████████████████████████████████| 273k/273k [00:00<00:00, 1.86MB/s]
Downloading data: 100%|████████████████████████████████████████| 1.54M/1.54M [00:00<00:00, 9.36MB/s]
Downloading data: 100%|████████████████████████████████████████| 17.8M/17.8M [00:00<00:00, 45.0MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 789.65 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 1961.45 examples/s]
Generating test split: 100%|████████████████████████████████████████| 380/380 [00:00<00:00, 3267.03 examples/s]
Downloading data: 100%|████████████████████████████████████████| 1.43M/1.43M [00:00<00:00, 6.81MB/s]
Downloading data: 100%|████████████████████████████████████████| 9.36M/9.36M [00:00<00:00, 23.9MB/s]
Downloading data: 100%|████████████████████████████████████████| 81.8M/81.8M [00:00<00:00, 95.4MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 398.91 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 590.35 examples/s]
Generating test split: 100%|████████████████████████████████████████| 334/334 [00:00<00:00, 664.17 examples/s]
Downloading data: 100%|████████████████████████████████████████| 2.27M/2.27M [00:00<00:00, 8.28MB/s]
Downloading data: 100%|████████████████████████████████████████| 16.2M/16.2M [00:00<00:00, 24.3MB/s]
Downloading data: 100%|████████████████████████████████████████| 61.9M/61.9M [00:00<00:00, 68.6MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 287.19 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 319.84 examples/s]
Generating test split: 100%|████████████████████████████████████████| 169/169 [00:00<00:00, 444.78 examples/s]
Downloading data: 100%|████████████████████████████████████████| 6.39M/6.39M [00:00<00:00, 20.8MB/s]
Downloading data: 100%|████████████████████████████████████████| 29.8M/29.8M [00:00<00:00, 48.8MB/s]
Downloading data: 100%|████████████████████████████████████████| 206M/206M [00:01<00:00, 126MB/s]
Downloading data: 100%|████████████████████████████████████████| 199M/199M [00:01<00:00, 109MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 139.40 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 284.58 examples/s]
Generating test split: 100%|████████████████████████████████████████| 429/429 [00:02<00:00, 214.33 examples/s]
Downloading data: 100%|████████████████████████████████████████| 6.25M/6.25M [00:00<00:00, 13.5MB/s]
Downloading data: 100%|████████████████████████████████████████| 29.9M/29.9M [00:00<00:00, 52.0MB/s]
Downloading data: 100%|████████████████████████████████████████| 237M/237M [00:02<00:00, 113MB/s]
Generating dev split: 100%|████████████████████████████████████████| 5/5 [00:00<00:00, 175.46 examples/s]
Generating validation split: 100%|████████████████████████████████████████| 30/30 [00:00<00:00, 263.91 examples/s]
Generating test split: 100%|████████████████████████████████████████| 231/231 [00:01<00:00, 142.39 examples/s]
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,839 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,840 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:37:18,865 INFO [task.py:428] Building contexts for mmmu_art on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 236.24it/s]
2024-08-05:12:37:20,081 INFO [task.py:428] Building contexts for mmmu_art_theory on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 172.61it/s]
2024-08-05:12:37:21,555 INFO [task.py:428] Building contexts for mmmu_design on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 237.54it/s]
2024-08-05:12:37:22,363 INFO [task.py:428] Building contexts for mmmu_music on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 149.42it/s]
2024-08-05:12:37:23,209 INFO [task.py:428] Building contexts for mmmu_accounting on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 573.88it/s]
2024-08-05:12:37:23,418 INFO [task.py:428] Building contexts for mmmu_economics on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 335.70it/s]
2024-08-05:12:37:23,730 INFO [task.py:428] Building contexts for mmmu_finance on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 704.15it/s]
2024-08-05:12:37:23,902 INFO [task.py:428] Building contexts for mmmu_manage on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 329.68it/s]
2024-08-05:12:37:24,280 INFO [task.py:428] Building contexts for mmmu_marketing on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 212.53it/s]
2024-08-05:12:37:24,765 INFO [task.py:428] Building contexts for mmmu_basic_medical_science on rank 0...
0%| | 0/30 [00:00<?, ?it/s]
/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/PIL/Image.py:1056: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  warnings.warn(
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 752.65it/s]
2024-08-05:12:37:25,009 INFO [task.py:428] Building contexts for mmmu_clinical_medicine on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 550.09it/s]
2024-08-05:12:37:25,493 INFO [task.py:428] Building contexts for mmmu_diagnostics_and_laboratory_medicine on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 77.98it/s]
2024-08-05:12:37:27,451 INFO [task.py:428] Building contexts for mmmu_pharmacy on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 850.39it/s]
2024-08-05:12:37:27,654 INFO [task.py:428] Building contexts for mmmu_public_health on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 516.41it/s]
2024-08-05:12:37:27,876 INFO [task.py:428] Building contexts for mmmu_history on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 157.13it/s]
2024-08-05:12:37:28,722 INFO [task.py:428] Building contexts for mmmu_literature on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 290.02it/s]
2024-08-05:12:37:29,420 INFO [task.py:428] Building contexts for mmmu_sociology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 131.57it/s]
2024-08-05:12:37:30,635 INFO [task.py:428] Building contexts for mmmu_psychology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 245.45it/s]
2024-08-05:12:37:31,295 INFO [task.py:428] Building contexts for mmmu_biology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 158.79it/s]
2024-08-05:12:37:32,048 INFO [task.py:428] Building contexts for mmmu_chemistry on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1337.64it/s]
2024-08-05:12:37:32,187 INFO [task.py:428] Building contexts for mmmu_geography on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 445.12it/s]
2024-08-05:12:37:32,528 INFO [task.py:428] Building contexts for mmmu_math on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 402.99it/s]
2024-08-05:12:37:32,793 INFO [task.py:428] Building contexts for mmmu_physics on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1024.33it/s]
2024-08-05:12:37:32,968 INFO [task.py:428] Building contexts for mmmu_agriculture on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 64.50it/s]
2024-08-05:12:37:37,805 INFO [task.py:428] Building contexts for mmmu_architecture_and_engineering on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 994.26it/s]
2024-08-05:12:37:37,919 INFO [task.py:428] Building contexts for mmmu_computer_science on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 408.06it/s]
2024-08-05:12:37:38,261 INFO [task.py:428] Building contexts for mmmu_electronics on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1278.65it/s]
2024-08-05:12:37:38,322 INFO [task.py:428] Building contexts for mmmu_energy_and_power on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 472.27it/s]
2024-08-05:12:37:38,546 INFO [task.py:428] Building contexts for mmmu_materials on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 650.98it/s]
2024-08-05:12:37:38,782 INFO [task.py:428] Building contexts for mmmu_mechanical_engineering on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 515.72it/s]
2024-08-05:12:37:39,005 INFO [evaluator.py:457] Running generate_until requests
Running generate_until requests with text+image input: 0%| | 0/900 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/jrcummings/.conda/envs/eleuther/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/__main__.py", line 375, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/models/hf_vlms.py", line 272, in generate_until
    inputs = self.tok_batch_encode(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jrcummings/projects/lm-evaluation-harness/lm_eval/models/hf_vlms.py", line 126, in tok_batch_encode
    encoding = self.processor(
               ^^^^^^^^^^^^^^^
  File "/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/transformers/models/paligemma/processing_paligemma.py", line 223, in __call__
    raise ValueError("`images` are expected as arguments to a `PaliGemmaProcessor` instance.")
ValueError: `images` are expected as arguments to a `PaliGemmaProcessor` instance.
Running generate_until requests with text+image input: 0%| | 0/900 [00:00<?, ?it/s]
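The traceback ends in `PaliGemmaProcessor.__call__` raising `ValueError` because no `images` reached it on the first text+image request. Below is a minimal, hypothetical pre-flight check, assuming each request's images arrive as a list aligned index-by-index with its prompt. The helpers `find_imageless_requests` and `validate_batch` are illustrative only; they are not part of lm-eval or transformers. The check only turns the failure into an error that names the offending requests; it does not establish why the images were missing in this run.

```python
from typing import Any, List, Optional, Sequence


def find_imageless_requests(visuals: Sequence[Optional[Sequence[Any]]]) -> List[int]:
    """Return the indices of requests whose image list is None or empty."""
    return [i for i, imgs in enumerate(visuals) if imgs is None or len(imgs) == 0]


def validate_batch(
    contexts: Sequence[str], visuals: Sequence[Optional[Sequence[Any]]]
) -> None:
    """Hypothetical guard to run before calling an image-requiring processor.

    Assumes contexts[i] and visuals[i] describe the same request.
    """
    if len(contexts) != len(visuals):
        raise ValueError(
            f"got {len(contexts)} prompts but {len(visuals)} image lists; "
            "batch is misaligned"
        )
    missing = find_imageless_requests(visuals)
    if missing:
        raise ValueError(
            f"requests {missing} have no images, but PaliGemmaProcessor "
            "expects `images` on every call"
        )


if __name__ == "__main__":
    # Placeholder strings stand in for PIL images in this illustration.
    validate_batch(["prompt 0"], [["img-0"]])  # passes silently
    try:
        validate_batch(["prompt 0", "prompt 1"], [["img-0"], []])
    except ValueError as err:
        print(err)
```

Raising early with the request indices makes it easier to tell whether images were dropped somewhere upstream or were never present for that request; skipping or rerouting such requests instead would be an equally reasonable design choice.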