Created August 5, 2024 20:47
(eleuther) [[email protected] ~/projects/lm-evaluation-harness (multimodal-prototyping)]$ lm_eval --model hf-multimodal --tasks mmmu --batch_size 8 --model_args pretrained=facebook/chameleon-7b
2024-08-05:12:52:28,270 INFO [__main__.py:272] Verbosity set to INFO
2024-08-05:12:52:28,451 INFO [__init__.py:406] `group` and `group_alias` keys in tasks' configs will no longer be used in the next release of lm-eval. `tag` will be used to allow to call a collection of tasks just like `group`. `group` will be removed in order to not cause confusion with the new ConfigurableGroup which will be the offical way to create groups with addition of group-wide configuations.
2024-08-05:12:52:33,895 INFO [__main__.py:369] Selected Tasks: ['mmmu']
2024-08-05:12:52:33,897 INFO [evaluator.py:158] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-08-05:12:52:33,897 INFO [evaluator.py:195] Initializing hf-multimodal model, with arguments: {'pretrained': 'facebook/chameleon-7b'}
2024-08-05:12:52:35,449 INFO [huggingface.py:169] Using device 'cuda'
Some kwargs in processor config are unused and will not have any effect: image_seq_length, image_token.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.58s/it]
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,741 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,742 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,742 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234 | |
2024-08-05:12:55:12,767 INFO [task.py:428] Building contexts for mmmu_art on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 239.07it/s]
2024-08-05:12:55:13,973 INFO [task.py:428] Building contexts for mmmu_art_theory on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 186.85it/s]
2024-08-05:12:55:15,417 INFO [task.py:428] Building contexts for mmmu_design on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 391.16it/s]
2024-08-05:12:55:16,129 INFO [task.py:428] Building contexts for mmmu_music on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 248.00it/s]
2024-08-05:12:55:16,809 INFO [task.py:428] Building contexts for mmmu_accounting on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 889.04it/s]
2024-08-05:12:55:16,970 INFO [task.py:428] Building contexts for mmmu_economics on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 541.28it/s]
2024-08-05:12:55:17,216 INFO [task.py:428] Building contexts for mmmu_finance on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1363.29it/s]
2024-08-05:12:55:17,340 INFO [task.py:428] Building contexts for mmmu_manage on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 553.36it/s]
2024-08-05:12:55:17,635 INFO [task.py:428] Building contexts for mmmu_marketing on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 322.31it/s]
2024-08-05:12:55:18,008 INFO [task.py:428] Building contexts for mmmu_basic_medical_science on rank 0...
0%| | 0/30 [00:00<?, ?it/s]/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/PIL/Image.py:1056: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
warnings.warn(
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 843.76it/s]
2024-08-05:12:55:18,223 INFO [task.py:428] Building contexts for mmmu_clinical_medicine on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 595.44it/s]
2024-08-05:12:55:18,673 INFO [task.py:428] Building contexts for mmmu_diagnostics_and_laboratory_medicine on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 128.36it/s]
2024-08-05:12:55:20,432 INFO [task.py:428] Building contexts for mmmu_pharmacy on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 835.76it/s]
2024-08-05:12:55:20,612 INFO [task.py:428] Building contexts for mmmu_public_health on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 837.39it/s]
2024-08-05:12:55:20,793 INFO [task.py:428] Building contexts for mmmu_history on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 251.92it/s]
2024-08-05:12:55:21,459 INFO [task.py:428] Building contexts for mmmu_literature on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 595.60it/s]
2024-08-05:12:55:22,042 INFO [task.py:428] Building contexts for mmmu_sociology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 214.70it/s]
2024-08-05:12:55:23,078 INFO [task.py:428] Building contexts for mmmu_psychology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 255.05it/s]
2024-08-05:12:55:23,669 INFO [task.py:428] Building contexts for mmmu_biology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 264.29it/s]
2024-08-05:12:55:24,337 INFO [task.py:428] Building contexts for mmmu_chemistry on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1447.38it/s]
2024-08-05:12:55:24,473 INFO [task.py:428] Building contexts for mmmu_geography on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 786.17it/s]
2024-08-05:12:55:24,781 INFO [task.py:428] Building contexts for mmmu_math on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 737.05it/s]
2024-08-05:12:55:24,969 INFO [task.py:428] Building contexts for mmmu_physics on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1449.61it/s]
2024-08-05:12:55:25,111 INFO [task.py:428] Building contexts for mmmu_agriculture on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 100.85it/s]
2024-08-05:12:55:29,312 INFO [task.py:428] Building contexts for mmmu_architecture_and_engineering on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1007.08it/s]
2024-08-05:12:55:29,423 INFO [task.py:428] Building contexts for mmmu_computer_science on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 417.74it/s]
2024-08-05:12:55:29,762 INFO [task.py:428] Building contexts for mmmu_electronics on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1915.47it/s]
2024-08-05:12:55:29,815 INFO [task.py:428] Building contexts for mmmu_energy_and_power on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 799.13it/s]
2024-08-05:12:55:30,004 INFO [task.py:428] Building contexts for mmmu_materials on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 952.73it/s]
2024-08-05:12:55:30,208 INFO [task.py:428] Building contexts for mmmu_mechanical_engineering on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 755.83it/s]
2024-08-05:12:55:30,387 INFO [evaluator.py:457] Running generate_until requests
Running generate_until requests with text+image input: 0%| | 0/900 [00:00<?, ?it/s]/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
Running generate_until requests with text+image input: 100%|████████████████████████████████████████████████████████| 900/900 [50:52<00:00, 3.39s/it]
2024-08-05:13:46:41,761 INFO [evaluation_tracker.py:240] Output path not provided, skipping saving results aggregated
hf-multimodal (pretrained=facebook/chameleon-7b), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
|                 Tasks                 |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------------------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|mmmu                                   |    N/A|none  |      |mmmu_acc|↑  |0.2489|±  |0.0144|
| - Art and Design                      |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0403|
|  - Art                                |Yaml   |none  |     0|mmmu_acc|↑  |0.1333|±  |0.0631|
|  - Art Theory                         |Yaml   |none  |     0|mmmu_acc|↑  |0.3667|±  |0.0895|
|  - Design                             |Yaml   |none  |     0|mmmu_acc|↑  |0.2667|±  |0.0821|
|  - Music                              |Yaml   |none  |     0|mmmu_acc|↑  |0.3000|±  |0.0851|
| - Business                            |    N/A|none  |      |mmmu_acc|↑  |0.1733|±  |0.0309|
|  - Accounting                         |Yaml   |none  |     0|mmmu_acc|↑  |0.1333|±  |0.0631|
|  - Economics                          |Yaml   |none  |     0|mmmu_acc|↑  |0.3000|±  |0.0851|
|  - Finance                            |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Manage                             |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Marketing                          |Yaml   |none  |     0|mmmu_acc|↑  |0.1000|±  |0.0557|
| - Health and Medicine                 |    N/A|none  |      |mmmu_acc|↑  |0.3333|±  |0.0385|
|  - Basic Medical Science              |Yaml   |none  |     0|mmmu_acc|↑  |0.3667|±  |0.0895|
|  - Clinical Medicine                  |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Diagnostics and Laboratory Medicine|Yaml   |none  |     0|mmmu_acc|↑  |0.3333|±  |0.0875|
|  - Pharmacy                           |Yaml   |none  |     0|mmmu_acc|↑  |0.4000|±  |0.0910|
|  - Public Health                      |Yaml   |none  |     0|mmmu_acc|↑  |0.4000|±  |0.0910|
| - Humanities and Social Science       |    N/A|none  |      |mmmu_acc|↑  |0.2083|±  |0.0375|
|  - History                            |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Literature                         |Yaml   |none  |     0|mmmu_acc|↑  |0.1333|±  |0.0631|
|  - Psychology                         |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Sociology                          |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
| - Science                             |    N/A|none  |      |mmmu_acc|↑  |0.2333|±  |0.0351|
|  - Biology                            |Yaml   |none  |     0|mmmu_acc|↑  |0.2000|±  |0.0743|
|  - Chemistry                          |Yaml   |none  |     0|mmmu_acc|↑  |0.2667|±  |0.0821|
|  - Geography                          |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Math                               |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Physics                            |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
| - Tech and Engineering                |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0304|
|  - Agriculture                        |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Architecture and Engineering       |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Computer Science                   |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Electronics                        |Yaml   |none  |     0|mmmu_acc|↑  |0.2000|±  |0.0743|
|  - Energy and Power                   |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Materials                          |Yaml   |none  |     0|mmmu_acc|↑  |0.3667|±  |0.0895|
|  - Mechanical Engineering             |Yaml   |none  |     0|mmmu_acc|↑  |0.4333|±  |0.0920|

|             Groups             |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------------------------------|-------|------|------|--------|---|-----:|---|-----:|
|mmmu                            |    N/A|none  |      |mmmu_acc|↑  |0.2489|±  |0.0144|
| - Art and Design               |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0403|
| - Business                     |    N/A|none  |      |mmmu_acc|↑  |0.1733|±  |0.0309|
| - Health and Medicine          |    N/A|none  |      |mmmu_acc|↑  |0.3333|±  |0.0385|
| - Humanities and Social Science|    N/A|none  |      |mmmu_acc|↑  |0.2083|±  |0.0375|
| - Science                      |    N/A|none  |      |mmmu_acc|↑  |0.2333|±  |0.0351|
| - Tech and Engineering         |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0304|
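
A rough way to sanity-check the per-subject rows above: with 30 questions per subject, the reported Stderr values look consistent with the standard error of the mean of 0/1 scores using the sample (n - 1) variance. The sketch below is an assumption about how those numbers arise, not a confirmed description of the harness internals. The helper names `sample_stderr` and `group_mean` are hypothetical, and the correct-answer counts are inferred from Value × 30 rather than taken from the log. The group Stderr column is not reproduced here.

```python
import math

N_PER_SUBJECT = 30  # each subject progress bar above shows 30/30 items


def sample_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary scores with mean `acc`,
    using the sample (n - 1) variance. Assumed formula, see lead-in."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def group_mean(subject_accs: list[float]) -> float:
    """Unweighted mean of subject accuracies. This equals the pooled
    accuracy only because every subject has the same number of items."""
    return sum(subject_accs) / len(subject_accs)


# (inferred correct answers, Stderr copied from the table)
rows = {
    "Art": (4, 0.0631),
    "Art Theory": (11, 0.0895),
    "Pharmacy": (12, 0.0910),
    "Mechanical Engineering": (13, 0.0920),
}

for name, (correct, reported) in rows.items():
    acc = correct / N_PER_SUBJECT
    est = sample_stderr(acc, N_PER_SUBJECT)
    print(f"{name}: acc={acc:.4f} stderr={est:.4f} reported={reported:.4f}")

# Art and Design group value from its four subject rows (Art, Art Theory,
# Design, Music), using the Value column as printed in the table.
art_and_design = group_mean([0.1333, 0.3667, 0.2667, 0.3000])
print(f"Art and Design: {art_and_design:.4f} (table reports 0.2667)")
```

If the assumption holds, each computed standard error should agree with the table to four decimal places, and the group value should match the unweighted mean of its subject rows.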