Last active: August 11, 2024 14:08
ilab mt_bench eval log (ilab model evaluate --benchmark mt_bench --model /instructlab/models/tuned-0701-1954/samples_4992 --judge-model /instructlab/models/prometheus-eval/prometheus-8x7b-v2.0 --taxonomy-path /instructlab/taxonomy/ --output-dir /instructlab/mtbench)
INFO 2024-07-05 18:53:02,883 utils.py:145: _init_num_threads Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-05 18:53:02,883 utils.py:148: _init_num_threads Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-05 18:53:02,883 utils.py:161: _init_num_threads NumExpr defaulting to 16 threads.
INFO 2024-07-05 18:53:03,050 config.py:58: <module> PyTorch version 2.3.1 available.
Generating answers...
INFO 2024-07-05 18:53:12,971 vllm.py:148: run_vllm vLLM starting up on pid 212 at http://127.0.0.1:58173/v1
100%|██████████| 80/80 [01:02<00:00, 1.27it/s]
Evaluating answers...
INFO 2024-07-05 18:56:08,297 vllm.py:148: run_vllm vLLM starting up on pid 255 at http://127.0.0.1:48517/v1
100%|██████████| 160/160 [01:11<00:00, 2.23it/s]
# SKILL EVALUATION REPORT
## MODEL
/instructlab/models/tuned-0701-1954/samples_4992
### AVERAGE:
7.92 (across 85)
### TURN ONE:
8.39
### TURN TWO:
0.4
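The NumExpr notes at the top of the log can be addressed by setting the environment variable the log itself names before launching the evaluation; a minimal sketch (the value 64 matches the maximum NumExpr reports, and whether it measurably speeds up this eval is untested here):

```shell
# NumExpr reads NUMEXPR_MAX_THREADS at import time, so export it before
# running `ilab model evaluate`; 64 is the cap NumExpr reported for this host.
export NUMEXPR_MAX_THREADS=64
echo "$NUMEXPR_MAX_THREADS"
```

Without this, NumExpr falls back to its safe default of 16 threads, as the log shows.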