Last active
August 11, 2024 14:09
mt_bench_branch.log (ilab model evaluate --benchmark mt_bench_branch --model /instructlab/models/tuned-0701-1954/samples_4992 --judge-model /instructlab/models/prometheus-eval/prometheus-8x7b-v2.0 --taxonomy-path /instructlab/taxonomy --output-dir /instructlab/mtbench --base-model /instructlab/models/ibm/granite-7b-base --branch main --base-bran…
INFO 2024-07-05 19:34:55,630 utils.py:145: _init_num_threads Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-05 19:34:55,630 utils.py:148: _init_num_threads Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-05 19:34:55,630 utils.py:161: _init_num_threads NumExpr defaulting to 16 threads.
INFO 2024-07-05 19:34:55,789 config.py:58: <module> PyTorch version 2.3.1 available.
Generating questions and reference answers from qna files for branch main...
INFO 2024-07-05 19:35:02,464 vllm.py:148: run_vllm vLLM starting up on pid 212 at http://127.0.0.1:48895/v1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 169.53it/s]
generated 416 questions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [01:26<00:00, 4.83it/s]
Generating questions and reference answers from qna files for branch main...
INFO 2024-07-05 19:38:12,969 vllm.py:148: run_vllm vLLM starting up on pid 258 at http://127.0.0.1:47763/v1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 170.16it/s]
generated 416 questions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [01:26<00:00, 4.82it/s]
INFO 2024-07-05 19:42:41,432 vllm.py:148: run_vllm vLLM starting up on pid 301 at http://127.0.0.1:47965/v1
Evaluating answers for branch main...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 832/832 [02:57<00:00, 4.69it/s]
Evaluating answers for branch main...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 832/832 [02:42<00:00, 5.12it/s]
# SKILL EVALUATION REPORT
## BASE MODEL
/instructlab/models/ibm/granite-7b-base
## MODEL
/instructlab/models/tuned-0701-1954/samples_4992
### IMPROVEMENTS:
1. compositional_skills/extraction/abstractive/abstract/qna.yaml (+4.5)
2. compositional_skills/extraction/inference/qualitative/sentiment/qna.yaml (+3.0)
3. compositional_skills/extraction/information/named_entities/dates_and_events/qna.yaml (+2.0)
4. compositional_skills/extraction/annual_report/csv/qna.yaml (+1.17)
5. compositional_skills/extraction/information/named_entities/places/qna.yaml (+1.0)
6. compositional_skills/extraction/receipt/plain_text/qna.yaml (+0.83)
7. compositional_skills/extraction/fda_filing/markdown/qna.yaml (+0.83)
8. foundational_skills/reasoning/theory_of_mind/qna.yaml (+0.53)
9. compositional_skills/extraction/services_agreement/bullet_points/qna.yaml (+0.5)
10. compositional_skills/writing/freeform/technical/user_manual/qna.yaml (+0.5)
11. compositional_skills/extraction/abstractive/outline/qna.yaml (+0.5)
12. compositional_skills/writing/grounded/editing/spelling/qna.yaml (+0.5)
13. compositional_skills/extraction/commercial_lease_agreement/bullet_points/qna.yaml (+0.5)
14. compositional_skills/extraction/annual_report/reasoning/qna.yaml (+0.5)
15. compositional_skills/writing/freeform/brainstorming/refute_claim/qna.yaml (+0.5)
16. compositional_skills/extraction/invoice/bullet_points/qna.yaml (+0.5)
17. compositional_skills/extraction/abstractive/main_takeaway/qna.yaml (+0.5)
18. compositional_skills/extraction/services_agreement/plain_text/qna.yaml (+0.42)
19. foundational_skills/reasoning/logical_reasoning/causal/qna.yaml (+0.33)
20. foundational_skills/reasoning/linguistics_reasoning/object_identification/qna.yaml (+0.33)
21. compositional_skills/extraction/annual_report/plain_text/qna.yaml (+0.33)
22. compositional_skills/linguistics/summarization/list_of_sentences/qna.yaml (+0.33)
23. compositional_skills/extraction/email/reasoning/qna.yaml (+0.33)
24. compositional_skills/extraction/fda_filing/reasoning/qna.yaml (+0.33)
25. compositional_skills/extraction/technical_paper/equations/csv/qna.yaml (+0.25)
26. compositional_skills/linguistics/bullet_lists/qna.yaml (+0.2)
27. foundational_skills/reasoning/mathematical_reasoning/qna.yaml (+0.17)
28. compositional_skills/extraction/technical_paper/equations/plain_text/qna.yaml (+0.17)
29. compositional_skills/STEM/math/mensurational/qna.yaml (+0.17)
30. compositional_skills/STEM/math/arithmetic_reasoning/qna.yaml (+0.17)
31. compositional_skills/extraction/annual_report/bullet_points/qna.yaml (+0.17)
32. compositional_skills/linguistics/rhyming_words/qna.yaml (+0.17)
33. compositional_skills/linguistics/organize_lists/qna.yaml (+0.17)
34. compositional_skills/roleplay/explain_like_you_are/non_fictional/popular_personalities/qna.yaml (+0.06)
### REGRESSIONS:
1. compositional_skills/writing/grounded/summarization/wiki_insights/five_point/qna.yaml (-2.5)
2. compositional_skills/extraction/email/markdown/qna.yaml (-1.33)
3. compositional_skills/extraction/annual_report/markdown/qna.yaml (-1.0)
4. compositional_skills/STEM/science/geography/qna.yaml (-1.0)
5. compositional_skills/extraction/invoice/reasoning/qna.yaml (-0.83)
6. compositional_skills/extraction/commercial_lease_agreement/csv/qna.yaml (-0.83)
7. compositional_skills/extraction/invoice/csv/qna.yaml (-0.8)
8. compositional_skills/general/tables/editing/combining_altering/qna.yaml (-0.67)
9. compositional_skills/extraction/fda_filing/plain_text/qna.yaml (-0.67)
10. compositional_skills/extraction/receipt/bullet_points/qna.yaml (-0.6)
11. compositional_skills/extraction/commercial_lease_agreement/reasoning/qna.yaml (-0.5)
12. compositional_skills/STEM/math/time_series/qna.yaml (-0.37)
13. compositional_skills/general/tables/empty/qna.yaml (-0.33)
14. compositional_skills/extraction/technical_paper/tables/plain_text/qna.yaml (-0.33)
15. compositional_skills/extraction/technical_paper/abstract/reasoning/qna.yaml (-0.33)
16. compositional_skills/writing/freeform/jokes/puns/general/qna.yaml (-0.31)
17. compositional_skills/extraction/commercial_lease_agreement/plain_text/qna.yaml (-0.27)
18. foundational_skills/reasoning/logical_reasoning/general/qna.yaml (-0.23)
19. knowledge/textbook/history/ibm_history/qna.yaml (-0.17)
20. compositional_skills/extraction/receipt/reasoning/qna.yaml (-0.17)
21. compositional_skills/extraction/technical_paper/equations/reasoning/qna.yaml (-0.17)
22. compositional_skills/STEM/math/area/qna.yaml (-0.17)
23. compositional_skills/writing/freeform/grammar/basic_grammer_tests/qna.yaml (-0.17)
24. compositional_skills/linguistics/summarization/ignore_pii/qna.yaml (-0.17)
25. compositional_skills/extraction/technical_paper/abstract/markdown/qna.yaml (-0.17)
26. compositional_skills/extraction/technical_paper/abstract/plain_text/qna.yaml (-0.17)
27. compositional_skills/extraction/fda_filing/bullet_points/qna.yaml (-0.17)
28. compositional_skills/STEM/math/reasoning/qna.yaml (-0.17)
29. compositional_skills/roleplay/explain_like_you_are/abstract/qna.yaml (-0.1)
30. compositional_skills/STEM/math/distance_conversion/qna.yaml (-0.08)
31. compositional_skills/extraction/technical_paper/equations/markdown/qna.yaml (-0.08)
### NO CHANGE:
1. compositional_skills/extraction/invoice/plain_text/qna.yaml
2. compositional_skills/extraction/receipt/csv/qna.yaml
3. compositional_skills/extraction/technical_paper/tables/reasoning/qna.yaml
4. compositional_skills/roleplay/explain_like_you_are/non_fictional/historical_figures/qna.yaml
5. compositional_skills/linguistics/classification/agent_classification/qna.yaml
6. compositional_skills/writing/freeform/poetry/ballad/qna.yaml
7. compositional_skills/writing/freeform/emoji/qna.yaml
8. compositional_skills/linguistics/reversing_string/qna.yaml
9. compositional_skills/writing/freeform/technical/proposal/qna.yaml
10. compositional_skills/writing/freeform/poetry/ode/qna.yaml
11. compositional_skills/extraction/technical_paper/equations/bullet_points/qna.yaml
12. compositional_skills/linguistics/complete_common_expressions/qna.yaml
13. compositional_skills/extraction/abstractive/title/qna.yaml
14. compositional_skills/linguistics/word_gen/qna.yaml
15. compositional_skills/roleplay/explain_like_i_am/primary_schooler/qna.yaml
16. compositional_skills/STEM/math/pattern_recognition/qna.yaml
17. compositional_skills/STEM/science/units_conversion/temperature_conversion/qna.yaml
18. compositional_skills/extraction/technical_paper/abstract/csv/qna.yaml
19. compositional_skills/writing/freeform/debate/qna.yaml
20. compositional_skills/writing/freeform/riddles/qna.yaml
21. compositional_skills/linguistics/jumbled_sentences/qna.yaml
22. compositional_skills/STEM/math/arithmetic_w_grammar/qna.yaml
23. compositional_skills/writing/grounded/summarization/wiki_insights/high_level_outline/qna.yaml
24. compositional_skills/writing/grounded/meeting_insights/action_items/qna.yaml
25. compositional_skills/roleplay/explain_like_you_are/fictional/movies/qna.yaml
26. compositional_skills/extraction/technical_paper/abstract/bullet_points/qna.yaml
27. foundational_skills/reasoning/temporal_reasoning/qna.yaml
28. knowledge/technical_manual/ibm_redbooks/qna.yaml
29. compositional_skills/extraction/email/bullet_points/qna.yaml
30. compositional_skills/STEM/science/units_conversion/distance_conversion/qna.yaml
31. compositional_skills/extraction/services_agreement/reasoning/qna.yaml
32. compositional_skills/linguistics/pattern_recognition/qna.yaml
33. compositional_skills/writing/freeform/poetry/freeverse/qna.yaml
34. compositional_skills/general/tables/editing/add_remove/qna.yaml
35. compositional_skills/extraction/inference/quantitative/asciidoc/tables/qna.yaml
36. compositional_skills/writing/grounded/meeting_insights/corporate_email/qna.yaml
37. compositional_skills/extraction/fda_filing/csv/qna.yaml
38. foundational_skills/reasoning/unconventional_reasoning/lower_score_wins/qna.yaml
39. compositional_skills/writing/freeform/poetry/narrative_poetry/qna.yaml
40. foundational_skills/reasoning/common_sense_reasoning/qna.yaml
41. compositional_skills/extraction/email/plain_text/qna.yaml
42. compositional_skills/writing/grounded/summarization/wiki_insights/concise/qna.yaml
43. compositional_skills/writing/freeform/social_media/linkedin/qna.yaml
44. compositional_skills/writing/freeform/poetry/epic_poetry/qna.yaml
45. compositional_skills/extraction/commercial_lease_agreement/markdown/qna.yaml
46. compositional_skills/roleplay/explain_like_i_am/graduate/qna.yaml
47. compositional_skills/roleplay/explain_like_you_are/fictional/video_games/qna.yaml
48. compositional_skills/roleplay/explain_like_you_are/fictional/tv_shows/qna.yaml
49. compositional_skills/writing/freeform/technical/product_description/qna.yaml
50. foundational_skills/reasoning/linguistics_reasoning/logical_sequence_of_words/qna.yaml
51. compositional_skills/extraction/technical_paper/tables/bullet_points/qna.yaml
52. compositional_skills/writing/freeform/technical/report/qna.yaml
53. compositional_skills/writing/freeform/legal/agreement/qna.yaml
54. compositional_skills/writing/freeform/social_media/instagram/qna.yaml
55. foundational_skills/reasoning/logical_reasoning/tabular/qna.yaml
56. compositional_skills/extraction/invoice/markdown/qna.yaml
57. foundational_skills/reasoning/linguistics_reasoning/odd_one_out/qna.yaml
58. compositional_skills/extraction/receipt/markdown/qna.yaml
59. compositional_skills/extraction/technical_paper/tables/csv/qna.yaml
60. compositional_skills/writing/grounded/meeting_insights/executive_summaries/qna.yaml
61. compositional_skills/writing/freeform/social_media/twitter/qna.yaml
62. compositional_skills/writing/freeform/poetry/haiku/qna.yaml
63. compositional_skills/writing/freeform/poetry/sonnet/qna.yaml
64. compositional_skills/writing/freeform/prose/articles/qna.yaml
65. compositional_skills/writing/grounded/summarization/wiki_insights/detailed/qna.yaml
66. compositional_skills/writing/freeform/prose/screenplay/qna.yaml
67. compositional_skills/extraction/abstractive/key_points/qna.yaml
68. compositional_skills/writing/freeform/brainstorming/idea_generation/qna.yaml
69. compositional_skills/writing/grounded/summarization/wiki_insights/one_line/qna.yaml
70. compositional_skills/extraction/information/named_entities/person_names/qna.yaml
71. compositional_skills/writing/freeform/poetry/limerick/qna.yaml
72. compositional_skills/writing/grounded/meeting_insights/minutes_of_meeting/qna.yaml
73. compositional_skills/writing/grounded/editing/grammar/qna.yaml
74. compositional_skills/writing/freeform/brainstorming/support_claim/qna.yaml
75. compositional_skills/writing/grounded/editing/punctuation/qna.yaml
76. compositional_skills/extraction/inference/quantitative/table_analaysis/qna.yaml
77. compositional_skills/writing/freeform/technical/guide/qna.yaml
78. compositional_skills/writing/freeform/legal/contracts/qna.yaml
79. compositional_skills/writing/freeform/prose/stories/qna.yaml
80. compositional_skills/writing/freeform/technical/specification/qna.yaml
81. compositional_skills/writing/freeform/prose/emails/formal/qna.yaml
82. compositional_skills/writing/freeform/social_media/facebook/qna.yaml
83. compositional_skills/writing/freeform/prose/emails/informal/qna.yaml
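
A report in this shape can be tallied mechanically. The following is a minimal sketch, not part of `ilab` itself: the names `parse_report`, `ENTRY`, `SECTION`, and `SAMPLE` are hypothetical, and the line format is assumed from the report text in this log only (a `### NAME:` header followed by numbered `path (+delta)` entries, with no delta on NO CHANGE entries).

```python
import re
from collections import defaultdict

# Assumed entry format, taken from the report text only:
# "<n>. <path> (<signed delta>)", where the delta is absent for NO CHANGE entries.
ENTRY = re.compile(
    r"^\s*\d+\.\s+(?P<path>\S+)(?:\s+\((?P<delta>[+-]\d+(?:\.\d+)?)\))?\s*$"
)
# Assumed section header format: "### IMPROVEMENTS:", "### NO CHANGE:", ...
SECTION = re.compile(r"^###\s+(?P<name>[A-Z ]+):\s*$")


def parse_report(text):
    """Group (path, delta) pairs by report section; a missing delta becomes 0.0."""
    sections = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.rstrip(" |")  # tolerate trailing "| |" residue from page scrapes
        header = SECTION.match(line)
        if header:
            current = header.group("name")
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            delta = float(entry.group("delta")) if entry.group("delta") else 0.0
            sections[current].append((entry.group("path"), delta))
    return dict(sections)


# A few lines copied from the report above, used as a smoke test.
SAMPLE = """\
### IMPROVEMENTS:
1. compositional_skills/extraction/abstractive/abstract/qna.yaml (+4.5)
### REGRESSIONS:
1. compositional_skills/writing/grounded/summarization/wiki_insights/five_point/qna.yaml (-2.5)
### NO CHANGE:
1. compositional_skills/extraction/invoice/plain_text/qna.yaml
"""

if __name__ == "__main__":
    for name, entries in parse_report(SAMPLE).items():
        print(name, len(entries), sum(delta for _, delta in entries))
```

Feeding the full log text to `parse_report` would give per-section counts and summed deltas, which is a quick way to compare two evaluation runs; lines that match neither pattern (progress bars, `INFO` messages) are simply skipped.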