relyt0925 · July 7, 2024 02:48
diff --git a/mmlu.log b/mmlu.log
 INFO 2024-07-06 21:21:06,958 utils.py:145: _init_num_threads Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
 INFO 2024-07-06 21:21:06,958 utils.py:148: _init_num_threads Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
 INFO 2024-07-06 21:21:06,958 utils.py:161: _init_num_threads NumExpr defaulting to 16 threads.
 INFO 2024-07-06 21:21:07,119 config.py:58: <module> PyTorch version 2.3.1 available.
 INFO 2024-07-06 21:21:13,649 evaluator.py:152: simple_evaluate Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
 INFO 2024-07-06 21:21:13,650 evaluator.py:189: simple_evaluate Initializing hf model, with arguments: {'pretrained': 'models/tuned-0701-1954/samples_4992', 'dtype': 'bfloat16'}
 /usr/local/lib64/python3.11/site-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
 INFO 2024-07-06 21:21:13,666 huggingface.py:170: __init__ Using device 'cuda'
 You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
 Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_world_religions from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_virology from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_us_foreign_policy from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_sociology from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_security_studies from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_public_relations from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_professional_psychology from None to 2
 INFO 2024-07-06 21:22:39,798 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,798 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_professional_medicine from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_professional_law from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_professional_accounting from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_prehistory from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_philosophy from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_nutrition from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_moral_scenarios from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_moral_disputes from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_miscellaneous from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_medical_genetics from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_marketing from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_management from None to 2
 INFO 2024-07-06 21:22:39,799 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,799 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_machine_learning from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_logical_fallacies from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_jurisprudence from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_international_law from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_human_sexuality from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_human_aging from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_world_history from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_us_history from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_statistics from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,800 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_psychology from None to 2
 INFO 2024-07-06 21:22:39,800 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_physics from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_microeconomics from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_mathematics from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_macroeconomics from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_government_and_politics from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_geography from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_european_history from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_computer_science from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_chemistry from None to 2
 INFO 2024-07-06 21:22:39,801 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,801 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_high_school_biology from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_global_facts from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_formal_logic from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_elementary_mathematics from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_electrical_engineering from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_econometrics from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_conceptual_physics from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_computer_security from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_college_physics from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_college_medicine from None to 2
 INFO 2024-07-06 21:22:39,802 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,802 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_college_mathematics from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_college_computer_science from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_college_chemistry from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_college_biology from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_clinical_knowledge from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_business_ethics from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_astronomy from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_anatomy from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 WARNING 2024-07-06 21:22:39,803 evaluator.py:251: simple_evaluate Overwriting default num_fewshot of mmlu_abstract_algebra from None to 2
 INFO 2024-07-06 21:22:39,803 evaluator.py:261: simple_evaluate Setting fewshot random generator seed to 1234
 INFO 2024-07-06 21:22:39,808 task.py:411: build_all_requests Building contexts for mmlu_world_religions on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 171/171 [00:00<00:00, 268.69it/s]
 INFO 2024-07-06 21:22:40,453 task.py:411: build_all_requests Building contexts for mmlu_virology on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 271.37it/s]
 INFO 2024-07-06 21:22:41,074 task.py:411: build_all_requests Building contexts for mmlu_us_foreign_policy on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 270.86it/s]
 INFO 2024-07-06 21:22:41,449 task.py:411: build_all_requests Building contexts for mmlu_sociology on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 201/201 [00:00<00:00, 270.04it/s]
 INFO 2024-07-06 21:22:42,203 task.py:411: build_all_requests Building contexts for mmlu_security_studies on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 245/245 [00:00<00:00, 271.31it/s]
 INFO 2024-07-06 21:22:43,119 task.py:411: build_all_requests Building contexts for mmlu_public_relations on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 110/110 [00:00<00:00, 269.99it/s]
 INFO 2024-07-06 21:22:43,532 task.py:411: build_all_requests Building contexts for mmlu_professional_psychology on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 612/612 [00:02<00:00, 270.43it/s]
 INFO 2024-07-06 21:22:45,824 task.py:411: build_all_requests Building contexts for mmlu_professional_medicine on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 272/272 [00:01<00:00, 270.95it/s]
 INFO 2024-07-06 21:22:46,842 task.py:411: build_all_requests Building contexts for mmlu_professional_law on rank 0...
 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1534/1534 [00:05<00:00, 270.82it/s]
 INFO 2024-07-06 21:22:52,579 task.py:411: build_all_requests Building contexts for mmlu_professional_accounting on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 282/282 [00:01<00:00, 271.36it/s]
 INFO 2024-07-06 21:22:53,633 task.py:411: build_all_requests Building contexts for mmlu_prehistory on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 324/324 [00:01<00:00, 270.19it/s]
 INFO 2024-07-06 21:22:54,847 task.py:411: build_all_requests Building contexts for mmlu_philosophy on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 311/311 [00:01<00:00, 271.24it/s]
 INFO 2024-07-06 21:22:56,009 task.py:411: build_all_requests Building contexts for mmlu_nutrition on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 306/306 [00:01<00:00, 271.45it/s]
 INFO 2024-07-06 21:22:57,151 task.py:411: build_all_requests Building contexts for mmlu_moral_scenarios on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 895/895 [00:03<00:00, 270.63it/s]
 INFO 2024-07-06 21:23:00,499 task.py:411: build_all_requests Building contexts for mmlu_moral_disputes on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 346/346 [00:01<00:00, 271.14it/s]
 INFO 2024-07-06 21:23:01,792 task.py:411: build_all_requests Building contexts for mmlu_miscellaneous on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 783/783 [00:02<00:00, 272.23it/s]
 INFO 2024-07-06 21:23:04,705 task.py:411: build_all_requests Building contexts for mmlu_medical_genetics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 270.57it/s]
 INFO 2024-07-06 21:23:05,080 task.py:411: build_all_requests Building contexts for mmlu_marketing on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234/234 [00:00<00:00, 272.93it/s]
 INFO 2024-07-06 21:23:05,948 task.py:411: build_all_requests Building contexts for mmlu_management on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 103/103 [00:00<00:00, 272.42it/s]
 INFO 2024-07-06 21:23:06,332 task.py:411: build_all_requests Building contexts for mmlu_machine_learning on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<00:00, 272.75it/s]
 INFO 2024-07-06 21:23:06,749 task.py:411: build_all_requests Building contexts for mmlu_logical_fallacies on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 163/163 [00:00<00:00, 273.26it/s]
 INFO 2024-07-06 21:23:07,354 task.py:411: build_all_requests Building contexts for mmlu_jurisprudence on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108/108 [00:00<00:00, 273.50it/s]
 INFO 2024-07-06 21:23:07,754 task.py:411: build_all_requests Building contexts for mmlu_international_law on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 121/121 [00:00<00:00, 272.45it/s]
 INFO 2024-07-06 21:23:08,205 task.py:411: build_all_requests Building contexts for mmlu_human_sexuality on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 131/131 [00:00<00:00, 271.98it/s]
 INFO 2024-07-06 21:23:08,693 task.py:411: build_all_requests Building contexts for mmlu_human_aging on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 223/223 [00:00<00:00, 273.39it/s]
 INFO 2024-07-06 21:23:09,520 task.py:411: build_all_requests Building contexts for mmlu_high_school_world_history on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:00<00:00, 271.64it/s]
 INFO 2024-07-06 21:23:10,405 task.py:411: build_all_requests Building contexts for mmlu_high_school_us_history on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 204/204 [00:00<00:00, 271.81it/s]
 INFO 2024-07-06 21:23:11,166 task.py:411: build_all_requests Building contexts for mmlu_high_school_statistics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 216/216 [00:00<00:00, 273.51it/s]
 INFO 2024-07-06 21:23:11,967 task.py:411: build_all_requests Building contexts for mmlu_high_school_psychology on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 545/545 [00:02<00:00, 234.54it/s]
 INFO 2024-07-06 21:23:14,315 task.py:411: build_all_requests Building contexts for mmlu_high_school_physics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 151/151 [00:00<00:00, 272.39it/s]
 INFO 2024-07-06 21:23:14,878 task.py:411: build_all_requests Building contexts for mmlu_high_school_microeconomics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 238/238 [00:00<00:00, 271.86it/s]
 INFO 2024-07-06 21:23:15,765 task.py:411: build_all_requests Building contexts for mmlu_high_school_mathematics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 272.42it/s]
 INFO 2024-07-06 21:23:16,769 task.py:411: build_all_requests Building contexts for mmlu_high_school_macroeconomics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 390/390 [00:01<00:00, 272.05it/s]
 INFO 2024-07-06 21:23:18,221 task.py:411: build_all_requests Building contexts for mmlu_high_school_government_and_politics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 193/193 [00:00<00:00, 272.39it/s]
 INFO 2024-07-06 21:23:18,940 task.py:411: build_all_requests Building contexts for mmlu_high_school_geography on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 198/198 [00:00<00:00, 273.39it/s]
 INFO 2024-07-06 21:23:19,674 task.py:411: build_all_requests Building contexts for mmlu_high_school_european_history on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [00:00<00:00, 270.22it/s]
 INFO 2024-07-06 21:23:20,293 task.py:411: build_all_requests Building contexts for mmlu_high_school_computer_science on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 272.90it/s]
 INFO 2024-07-06 21:23:20,665 task.py:411: build_all_requests Building contexts for mmlu_high_school_chemistry on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 203/203 [00:00<00:00, 272.67it/s]
 INFO 2024-07-06 21:23:21,420 task.py:411: build_all_requests Building contexts for mmlu_high_school_biology on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 310/310 [00:01<00:00, 271.73it/s]
 INFO 2024-07-06 21:23:22,576 task.py:411: build_all_requests Building contexts for mmlu_global_facts on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 272.31it/s]
 INFO 2024-07-06 21:23:22,948 task.py:411: build_all_requests Building contexts for mmlu_formal_logic on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 126/126 [00:00<00:00, 272.29it/s]
 INFO 2024-07-06 21:23:23,417 task.py:411: build_all_requests Building contexts for mmlu_elementary_mathematics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 378/378 [00:01<00:00, 271.51it/s]
 INFO 2024-07-06 21:23:24,827 task.py:411: build_all_requests Building contexts for mmlu_electrical_engineering on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 145/145 [00:00<00:00, 273.38it/s]
 INFO 2024-07-06 21:23:25,366 task.py:411: build_all_requests Building contexts for mmlu_econometrics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 114/114 [00:00<00:00, 272.12it/s]
 INFO 2024-07-06 21:23:25,790 task.py:411: build_all_requests Building contexts for mmlu_conceptual_physics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 235/235 [00:00<00:00, 272.13it/s]
 INFO 2024-07-06 21:23:26,666 task.py:411: build_all_requests Building contexts for mmlu_computer_security on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 272.34it/s]
 INFO 2024-07-06 21:23:27,039 task.py:411: build_all_requests Building contexts for mmlu_college_physics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 102/102 [00:00<00:00, 273.16it/s]
 INFO 2024-07-06 21:23:27,417 task.py:411: build_all_requests Building contexts for mmlu_college_medicine on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 173/173 [00:00<00:00, 272.59it/s]
 INFO 2024-07-06 21:23:28,061 task.py:411: build_all_requests Building contexts for mmlu_college_mathematics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 271.40it/s]
 INFO 2024-07-06 21:23:28,435 task.py:411: build_all_requests Building contexts for mmlu_college_computer_science on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 272.53it/s]
 INFO 2024-07-06 21:23:28,807 task.py:411: build_all_requests Building contexts for mmlu_college_chemistry on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 271.35it/s]
 INFO 2024-07-06 21:23:29,181 task.py:411: build_all_requests Building contexts for mmlu_college_biology on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 144/144 [00:00<00:00, 272.95it/s]
 INFO 2024-07-06 21:23:29,716 task.py:411: build_all_requests Building contexts for mmlu_clinical_knowledge on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 265/265 [00:00<00:00, 271.16it/s]
 INFO 2024-07-06 21:23:30,706 task.py:411: build_all_requests Building contexts for mmlu_business_ethics on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 272.60it/s]
 INFO 2024-07-06 21:23:31,079 task.py:411: build_all_requests Building contexts for mmlu_astronomy on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 152/152 [00:00<00:00, 271.11it/s]
 INFO 2024-07-06 21:23:31,647 task.py:411: build_all_requests Building contexts for mmlu_anatomy on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135/135 [00:00<00:00, 272.49it/s]
 INFO 2024-07-06 21:23:32,150 task.py:411: build_all_requests Building contexts for mmlu_abstract_algebra on rank 0...
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 271.91it/s]
 INFO 2024-07-06 21:23:32,523 evaluator.py:438: evaluate Running loglikelihood requests
 Running loglikelihood requests:   0%|                                                                                                                                                                         | 0/56168 [00:00<?, ?it/s]
 We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
 Running loglikelihood requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 56168/56168 [07:19<00:00, 127.89it/s]
 WARNING 2024-07-06 21:32:29,573 huggingface.py:1315: get_model_sha Failed to get model SHA for models/tuned-0701-1954/samples_4992 at revision main. Error: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'models/tuned-0701-1954/samples_4992'. Use `repo_type` argument if needed.
 fatal: not a git repository (or any parent up to mount point /)
 Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
 # KNOWLEDGE EVALUATION REPORT

 ## MODEL
 models/tuned-0701-1954/samples_4992

 ### AVERAGE:
 0.48 (across 57)

 ### SCORES:
 mmlu_abstract_algebra - 0.33
 mmlu_anatomy - 0.47
 mmlu_astronomy - 0.48
 mmlu_business_ethics - 0.53
 mmlu_clinical_knowledge - 0.49
 mmlu_college_biology - 0.43
 mmlu_college_chemistry - 0.41
 mmlu_college_computer_science - 0.39
 mmlu_college_mathematics - 0.36
 mmlu_college_medicine - 0.52
 mmlu_college_physics - 0.22
 mmlu_computer_security - 0.57
 mmlu_conceptual_physics - 0.44
 mmlu_econometrics - 0.25
 mmlu_electrical_engineering - 0.5
 mmlu_elementary_mathematics - 0.26
 mmlu_formal_logic - 0.31
 mmlu_global_facts - 0.32
 mmlu_high_school_biology - 0.54
 mmlu_high_school_chemistry - 0.41
 mmlu_high_school_computer_science - 0.47
 mmlu_high_school_european_history - 0.61
 mmlu_high_school_geography - 0.58
 mmlu_high_school_government_and_politics - 0.64
 mmlu_high_school_macroeconomics - 0.48
 mmlu_high_school_mathematics - 0.26
 mmlu_high_school_microeconomics - 0.48
 mmlu_high_school_physics - 0.32
 mmlu_high_school_psychology - 0.67
 mmlu_high_school_statistics - 0.43
 mmlu_high_school_us_history - 0.6
 mmlu_high_school_world_history - 0.65
 mmlu_human_aging - 0.58
 mmlu_human_sexuality - 0.59
 mmlu_international_law - 0.58
 mmlu_jurisprudence - 0.5
 mmlu_logical_fallacies - 0.53
 mmlu_machine_learning - 0.25
 mmlu_management - 0.65
 mmlu_marketing - 0.71
 mmlu_medical_genetics - 0.49
 mmlu_miscellaneous - 0.67
 mmlu_moral_disputes - 0.5
 mmlu_moral_scenarios - 0.25
 mmlu_nutrition - 0.52
 mmlu_philosophy - 0.55
 mmlu_prehistory - 0.51
 mmlu_professional_accounting - 0.35
 mmlu_professional_law - 0.38
 mmlu_professional_medicine - 0.36
 mmlu_professional_psychology - 0.46
 mmlu_public_relations - 0.63
 mmlu_security_studies - 0.56
 mmlu_sociology - 0.68
 mmlu_us_foreign_policy - 0.72
 mmlu_virology - 0.42
 mmlu_world_religions - 0.59
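
The reported average can be sanity-checked against the per-task scores listed above. The following is a minimal sketch, not part of the evaluation tooling: `parse_scores` and `macro_average` are hypothetical helper names, and it assumes the average is an unweighted mean over tasks and that every score line has the form `mmlu_<task> - <score>`.

```python
import re

# Matches report lines such as " mmlu_anatomy - 0.47" (format assumed from the report above).
SCORE_LINE = re.compile(r"^\s*(mmlu_\w+)\s+-\s+([0-9]*\.?[0-9]+)\s*$")


def parse_scores(report_text):
    """Return {task_name: score} for every score line found in the text."""
    scores = {}
    for line in report_text.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


def macro_average(scores):
    """Unweighted mean over tasks (assumed to be how AVERAGE is computed)."""
    if not scores:
        raise ValueError("no score lines found")
    return sum(scores.values()) / len(scores)


# Three lines copied from the SCORES section, used as a small sample.
sample = """
 mmlu_abstract_algebra - 0.33
 mmlu_anatomy - 0.47
 mmlu_astronomy - 0.48
"""
scores = parse_scores(sample)
print(len(scores), round(macro_average(scores), 2))  # prints: 3 0.43
```

Applied to the full SCORES section, the 57 two-decimal scores sum to 27.45, which gives 27.45 / 57 ≈ 0.48 and agrees with the AVERAGE line. The report may average unrounded values internally, so small differences in later decimal places would not be surprising.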