@joecummings
Created August 5, 2024 20:47
(eleuther) [[email protected] ~/projects/lm-evaluation-harness (multimodal-prototyping)]$ lm_eval --model hf-multimodal --tasks mmmu --batch_size 8 --model_args pretrained=facebook/chameleon-7b
2024-08-05:12:52:28,270 INFO [__main__.py:272] Verbosity set to INFO
2024-08-05:12:52:28,451 INFO [__init__.py:406] `group` and `group_alias` keys in tasks' configs will no longer be used in the next release of lm-eval. `tag` will be used to allow to call a collection of tasks just like `group`. `group` will be removed in order to not cause confusion with the new ConfigurableGroup which will be the offical way to create groups with addition of group-wide configuations.
2024-08-05:12:52:33,895 INFO [__main__.py:369] Selected Tasks: ['mmmu']
2024-08-05:12:52:33,897 INFO [evaluator.py:158] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-08-05:12:52:33,897 INFO [evaluator.py:195] Initializing hf-multimodal model, with arguments: {'pretrained': 'facebook/chameleon-7b'}
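The `--model_args` value is a comma-separated list of `key=value` pairs, and it ends up as the keyword dictionary logged on the line above. The following is a minimal sketch of that split, using a hypothetical helper `parse_model_args`. lm-eval has its own parser, which may also convert values such as booleans and numbers, so treat this as an illustration rather than the harness's code.

```python
def parse_model_args(arg_string: str) -> dict[str, str]:
    """Split a comma-separated 'key=value' string into a dict (hypothetical helper)."""
    args: dict[str, str] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        args[key.strip()] = value.strip()
    return args


print(parse_model_args("pretrained=facebook/chameleon-7b"))
# {'pretrained': 'facebook/chameleon-7b'}
```

Splitting each pair with `str.partition` on the first `=` keeps any further `=` characters inside the value intact.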
2024-08-05:12:52:35,449 INFO [huggingface.py:169] Using device 'cuda'
Some kwargs in processor config are unused and will not have any effect: image_seq_length, image_token.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.58s/it]
2024-08-05:12:55:12,740 INFO [evaluator.py:274] Setting fewshot random generator seed to 1234
[... same line logged 29 more times (30 in total, one per MMMU subtask), timestamps 12:55:12,740 to 12:55:12,742 ...]
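The seed message appears once per subtask, which suggests that each of the 30 MMMU subtasks gets its own few-shot generator seeded with 1234. The stdlib sketch below shows why per-task generators are useful: one task's draws do not depend on how many samples other tasks took first. The `ToyTask` class is illustrative only and is not the harness's implementation. Every subtask in this run is 0-shot (see the n-shot column), so the few-shot seed does not affect these particular results.

```python
import random


class ToyTask:
    """Illustrative stand-in for a task that samples few-shot examples."""

    def __init__(self, name: str, pool: list[int], seed: int = 1234) -> None:
        self.name = name
        self.pool = pool
        # Each task owns an independent generator with the same seed.
        self.rnd = random.Random(seed)

    def sample_fewshot(self, k: int) -> list[int]:
        return self.rnd.sample(self.pool, k)


pool = list(range(100))
art = ToyTask("mmmu_art", pool)
music = ToyTask("mmmu_music", pool)

# Drawing from one task leaves the other task's generator untouched.
art.sample_fewshot(5)
fresh = ToyTask("mmmu_music", pool)
print(music.sample_fewshot(3) == fresh.sample_fewshot(3))  # True
```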
2024-08-05:12:55:12,767 INFO [task.py:428] Building contexts for mmmu_art on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 239.07it/s]
2024-08-05:12:55:13,973 INFO [task.py:428] Building contexts for mmmu_art_theory on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 186.85it/s]
2024-08-05:12:55:15,417 INFO [task.py:428] Building contexts for mmmu_design on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 391.16it/s]
2024-08-05:12:55:16,129 INFO [task.py:428] Building contexts for mmmu_music on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 248.00it/s]
2024-08-05:12:55:16,809 INFO [task.py:428] Building contexts for mmmu_accounting on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 889.04it/s]
2024-08-05:12:55:16,970 INFO [task.py:428] Building contexts for mmmu_economics on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 541.28it/s]
2024-08-05:12:55:17,216 INFO [task.py:428] Building contexts for mmmu_finance on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1363.29it/s]
2024-08-05:12:55:17,340 INFO [task.py:428] Building contexts for mmmu_manage on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 553.36it/s]
2024-08-05:12:55:17,635 INFO [task.py:428] Building contexts for mmmu_marketing on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 322.31it/s]
2024-08-05:12:55:18,008 INFO [task.py:428] Building contexts for mmmu_basic_medical_science on rank 0...
0%| | 0/30 [00:00<?, ?it/s]
/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/PIL/Image.py:1056: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  warnings.warn(
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 843.76it/s]
2024-08-05:12:55:18,223 INFO [task.py:428] Building contexts for mmmu_clinical_medicine on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 595.44it/s]
2024-08-05:12:55:18,673 INFO [task.py:428] Building contexts for mmmu_diagnostics_and_laboratory_medicine on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 128.36it/s]
2024-08-05:12:55:20,432 INFO [task.py:428] Building contexts for mmmu_pharmacy on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 835.76it/s]
2024-08-05:12:55:20,612 INFO [task.py:428] Building contexts for mmmu_public_health on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 837.39it/s]
2024-08-05:12:55:20,793 INFO [task.py:428] Building contexts for mmmu_history on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 251.92it/s]
2024-08-05:12:55:21,459 INFO [task.py:428] Building contexts for mmmu_literature on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 595.60it/s]
2024-08-05:12:55:22,042 INFO [task.py:428] Building contexts for mmmu_sociology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 214.70it/s]
2024-08-05:12:55:23,078 INFO [task.py:428] Building contexts for mmmu_psychology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 255.05it/s]
2024-08-05:12:55:23,669 INFO [task.py:428] Building contexts for mmmu_biology on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 264.29it/s]
2024-08-05:12:55:24,337 INFO [task.py:428] Building contexts for mmmu_chemistry on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1447.38it/s]
2024-08-05:12:55:24,473 INFO [task.py:428] Building contexts for mmmu_geography on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 786.17it/s]
2024-08-05:12:55:24,781 INFO [task.py:428] Building contexts for mmmu_math on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 737.05it/s]
2024-08-05:12:55:24,969 INFO [task.py:428] Building contexts for mmmu_physics on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1449.61it/s]
2024-08-05:12:55:25,111 INFO [task.py:428] Building contexts for mmmu_agriculture on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 100.85it/s]
2024-08-05:12:55:29,312 INFO [task.py:428] Building contexts for mmmu_architecture_and_engineering on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1007.08it/s]
2024-08-05:12:55:29,423 INFO [task.py:428] Building contexts for mmmu_computer_science on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 417.74it/s]
2024-08-05:12:55:29,762 INFO [task.py:428] Building contexts for mmmu_electronics on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 1915.47it/s]
2024-08-05:12:55:29,815 INFO [task.py:428] Building contexts for mmmu_energy_and_power on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 799.13it/s]
2024-08-05:12:55:30,004 INFO [task.py:428] Building contexts for mmmu_materials on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 952.73it/s]
2024-08-05:12:55:30,208 INFO [task.py:428] Building contexts for mmmu_mechanical_engineering on rank 0...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 755.83it/s]
2024-08-05:12:55:30,387 INFO [evaluator.py:457] Running generate_until requests
Running generate_until requests with text+image input: 0%| | 0/900 [00:00<?, ?it/s]
/home/jrcummings/.conda/envs/eleuther/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
Running generate_until requests with text+image input: 100%|████████████████████████████████████████████████████████| 900/900 [50:52<00:00, 3.39s/it]
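The 900 generation requests come from 30 subtasks with 30 questions each; every "Building contexts" bar above shows 30/30. A quick arithmetic check of the reported wall-clock time from the per-request rate:

```python
n_subtasks = 30          # mmmu_art ... mmmu_mechanical_engineering in the log
docs_per_subtask = 30    # each "Building contexts" bar shows 30/30
seconds_per_item = 3.39  # rate reported by the progress bar (rounded)

n_requests = n_subtasks * docs_per_subtask
total_s = n_requests * seconds_per_item
minutes, seconds = divmod(round(total_s), 60)
print(n_requests, f"{minutes}:{seconds:02d}")  # 900 50:51
```

This lands within a second of the logged 50:52. The gap comes from the rate being rounded to two decimals, since 3052 s / 900 is about 3.391 s per request.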
2024-08-05:13:46:41,761 INFO [evaluation_tracker.py:240] Output path not provided, skipping saving results aggregated
hf-multimodal (pretrained=facebook/chameleon-7b), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
|                 Tasks                 |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------------------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|mmmu                                   |    N/A|none  |      |mmmu_acc|↑  |0.2489|±  |0.0144|
| - Art and Design                      |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0403|
|  - Art                                |Yaml   |none  |     0|mmmu_acc|↑  |0.1333|±  |0.0631|
|  - Art Theory                         |Yaml   |none  |     0|mmmu_acc|↑  |0.3667|±  |0.0895|
|  - Design                             |Yaml   |none  |     0|mmmu_acc|↑  |0.2667|±  |0.0821|
|  - Music                              |Yaml   |none  |     0|mmmu_acc|↑  |0.3000|±  |0.0851|
| - Business                            |    N/A|none  |      |mmmu_acc|↑  |0.1733|±  |0.0309|
|  - Accounting                         |Yaml   |none  |     0|mmmu_acc|↑  |0.1333|±  |0.0631|
|  - Economics                          |Yaml   |none  |     0|mmmu_acc|↑  |0.3000|±  |0.0851|
|  - Finance                            |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Manage                             |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Marketing                          |Yaml   |none  |     0|mmmu_acc|↑  |0.1000|±  |0.0557|
| - Health and Medicine                 |    N/A|none  |      |mmmu_acc|↑  |0.3333|±  |0.0385|
|  - Basic Medical Science              |Yaml   |none  |     0|mmmu_acc|↑  |0.3667|±  |0.0895|
|  - Clinical Medicine                  |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Diagnostics and Laboratory Medicine|Yaml   |none  |     0|mmmu_acc|↑  |0.3333|±  |0.0875|
|  - Pharmacy                           |Yaml   |none  |     0|mmmu_acc|↑  |0.4000|±  |0.0910|
|  - Public Health                      |Yaml   |none  |     0|mmmu_acc|↑  |0.4000|±  |0.0910|
| - Humanities and Social Science       |    N/A|none  |      |mmmu_acc|↑  |0.2083|±  |0.0375|
|  - History                            |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Literature                         |Yaml   |none  |     0|mmmu_acc|↑  |0.1333|±  |0.0631|
|  - Psychology                         |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Sociology                          |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
| - Science                             |    N/A|none  |      |mmmu_acc|↑  |0.2333|±  |0.0351|
|  - Biology                            |Yaml   |none  |     0|mmmu_acc|↑  |0.2000|±  |0.0743|
|  - Chemistry                          |Yaml   |none  |     0|mmmu_acc|↑  |0.2667|±  |0.0821|
|  - Geography                          |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Math                               |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Physics                            |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
| - Tech and Engineering                |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0304|
|  - Agriculture                        |Yaml   |none  |     0|mmmu_acc|↑  |0.1667|±  |0.0692|
|  - Architecture and Engineering       |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Computer Science                   |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Electronics                        |Yaml   |none  |     0|mmmu_acc|↑  |0.2000|±  |0.0743|
|  - Energy and Power                   |Yaml   |none  |     0|mmmu_acc|↑  |0.2333|±  |0.0785|
|  - Materials                          |Yaml   |none  |     0|mmmu_acc|↑  |0.3667|±  |0.0895|
|  - Mechanical Engineering             |Yaml   |none  |     0|mmmu_acc|↑  |0.4333|±  |0.0920|
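Each subtask has 30 questions, so every accuracy is a multiple of 1/30 (for example, 0.1333 = 4/30 and 0.3667 = 11/30). The Stderr column matches the standard error of the mean computed with the sample (n - 1) standard deviation of the 0/1 scores, which simplifies to sqrt(p(1 - p)/(n - 1)). The small check below reproduces a few rows of the table from their correct-answer counts.

```python
import math


def binary_mean_stderr(correct: int, n: int) -> float:
    """Standard error of a 0/1 mean using the sample (n - 1) variance."""
    p = correct / n
    sample_var = p * (1 - p) * n / (n - 1)  # unbiased variance of 0/1 scores
    return math.sqrt(sample_var / n)        # == sqrt(p * (1 - p) / (n - 1))


# (subtask, correct answers out of 30, stderr printed in the table)
rows = [
    ("Art", 4, 0.0631),
    ("Art Theory", 11, 0.0895),
    ("Marketing", 3, 0.0557),
    ("Mechanical Engineering", 13, 0.0920),
]
for name, correct, reported in rows:
    acc = correct / 30
    se = binary_mean_stderr(correct, 30)
    print(f"{name:<24} acc={acc:.4f} stderr={se:.4f} (table: {reported:.4f})")
```

With only 30 items per subtask, these standard errors fall between roughly 0.06 and 0.09. As a result, most per-subtask differences in the table are well within noise.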
|             Groups             |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------------------------------|-------|------|------|--------|---|-----:|---|-----:|
|mmmu                            |    N/A|none  |      |mmmu_acc|↑  |0.2489|±  |0.0144|
| - Art and Design               |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0403|
| - Business                     |    N/A|none  |      |mmmu_acc|↑  |0.1733|±  |0.0309|
| - Health and Medicine          |    N/A|none  |      |mmmu_acc|↑  |0.3333|±  |0.0385|
| - Humanities and Social Science|    N/A|none  |      |mmmu_acc|↑  |0.2083|±  |0.0375|
| - Science                      |    N/A|none  |      |mmmu_acc|↑  |0.2333|±  |0.0351|
| - Tech and Engineering         |    N/A|none  |      |mmmu_acc|↑  |0.2667|±  |0.0304|
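The group rows are consistent with two calculations. The first is a size-weighted mean of subtask accuracies; because every subtask has 30 items, this equals the plain mean. The second is a pooled-variance standard error, sqrt(sum((n_i - 1) * s_i^2) / (N - k) / N), where s_i^2 is each subtask's sample variance, N is the group size, and k is the number of subtasks. The sketch below reproduces the group values from per-subtask correct counts recovered from the table (accuracy × 30). Treat the pooling formula as an assumption that happens to match these numbers, not as a statement of the harness's internals.

```python
import math

N_PER_SUBTASK = 30

# Correct answers per subtask, recovered from the table as accuracy * 30.
GROUPS = {
    "Art and Design": [4, 11, 8, 9],
    "Business": [4, 9, 5, 5, 3],
    "Health and Medicine": [11, 5, 10, 12, 12],
    "Humanities and Social Science": [7, 4, 7, 7],
    "Science": [6, 8, 7, 7, 7],
    "Tech and Engineering": [5, 7, 7, 6, 7, 11, 13],
}


def sample_var(correct: int, n: int) -> float:
    """Unbiased (n - 1) variance of n binary scores with `correct` ones."""
    p = correct / n
    return p * (1 - p) * n / (n - 1)


def aggregate(counts: list[int], n: int = N_PER_SUBTASK) -> tuple[float, float]:
    """Size-weighted accuracy and pooled-variance stderr for one group."""
    total = n * len(counts)
    acc = sum(counts) / total
    pooled = sum((n - 1) * sample_var(c, n) for c in counts) / (total - len(counts))
    return acc, math.sqrt(pooled / total)


for name, counts in GROUPS.items():
    acc, se = aggregate(counts)
    print(f"{name:<30} acc={acc:.4f} stderr={se:.4f}")

all_counts = [c for counts in GROUPS.values() for c in counts]
print(f"overall acc={sum(all_counts) / (N_PER_SUBTASK * len(all_counts)):.4f}")
```

The overall 0.2489 is 224 correct answers out of 900.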