Gist relyt0925/4fd6480a9b6f828f1c381f6dfecf67d3 (last active August 11, 2024 14:09)
mt_bench_branch.log (ilab model evaluate --benchmark mt_bench_branch --model /instructlab/models/tuned-0701-1954/samples_4992 --judge-model /instructlab/models/prometheus-eval/prometheus-8x7b-v2.0 --taxonomy-path /instructlab/taxonomy --output-dir /instructlab/mtbench --base-model /instructlab/models/ibm/granite-7b-base --branch main --base-bran…
INFO 2024-07-05 19:34:55,630 utils.py:145: _init_num_threads Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-05 19:34:55,630 utils.py:148: _init_num_threads Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-05 19:34:55,630 utils.py:161: _init_num_threads NumExpr defaulting to 16 threads.
INFO 2024-07-05 19:34:55,789 config.py:58: <module> PyTorch version 2.3.1 available.
Generating questions and reference answers from qna files for branch main...
INFO 2024-07-05 19:35:02,464 vllm.py:148: run_vllm vLLM starting up on pid 212 at http://127.0.0.1:48895/v1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 169.53it/s]
generated 416 questions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [01:26<00:00, 4.83it/s]
Generating questions and reference answers from qna files for branch main...
INFO 2024-07-05 19:38:12,969 vllm.py:148: run_vllm vLLM starting up on pid 258 at http://127.0.0.1:47763/v1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 170.16it/s]
generated 416 questions
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 416/416 [01:26<00:00, 4.82it/s]
INFO 2024-07-05 19:42:41,432 vllm.py:148: run_vllm vLLM starting up on pid 301 at http://127.0.0.1:47965/v1
Evaluating answers for branch main...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 832/832 [02:57<00:00, 4.69it/s]
Evaluating answers for branch main...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 832/832 [02:42<00:00, 5.12it/s]
# SKILL EVALUATION REPORT
## BASE MODEL
/instructlab/models/ibm/granite-7b-base
## MODEL
/instructlab/models/tuned-0701-1954/samples_4992
### IMPROVEMENTS:
1. compositional_skills/extraction/abstractive/abstract/qna.yaml (+4.5)
2. compositional_skills/extraction/inference/qualitative/sentiment/qna.yaml (+3.0)
3. compositional_skills/extraction/information/named_entities/dates_and_events/qna.yaml (+2.0)
4. compositional_skills/extraction/annual_report/csv/qna.yaml (+1.17)
5. compositional_skills/extraction/information/named_entities/places/qna.yaml (+1.0)
6. compositional_skills/extraction/receipt/plain_text/qna.yaml (+0.83)
7. compositional_skills/extraction/fda_filing/markdown/qna.yaml (+0.83)
8. foundational_skills/reasoning/theory_of_mind/qna.yaml (+0.53)
9. compositional_skills/extraction/services_agreement/bullet_points/qna.yaml (+0.5)
10. compositional_skills/writing/freeform/technical/user_manual/qna.yaml (+0.5)
11. compositional_skills/extraction/abstractive/outline/qna.yaml (+0.5)
12. compositional_skills/writing/grounded/editing/spelling/qna.yaml (+0.5)
13. compositional_skills/extraction/commercial_lease_agreement/bullet_points/qna.yaml (+0.5)
14. compositional_skills/extraction/annual_report/reasoning/qna.yaml (+0.5)
15. compositional_skills/writing/freeform/brainstorming/refute_claim/qna.yaml (+0.5)
16. compositional_skills/extraction/invoice/bullet_points/qna.yaml (+0.5)
17. compositional_skills/extraction/abstractive/main_takeaway/qna.yaml (+0.5)
18. compositional_skills/extraction/services_agreement/plain_text/qna.yaml (+0.42)
19. foundational_skills/reasoning/logical_reasoning/causal/qna.yaml (+0.33)
20. foundational_skills/reasoning/linguistics_reasoning/object_identification/qna.yaml (+0.33)
21. compositional_skills/extraction/annual_report/plain_text/qna.yaml (+0.33)
22. compositional_skills/linguistics/summarization/list_of_sentences/qna.yaml (+0.33)
23. compositional_skills/extraction/email/reasoning/qna.yaml (+0.33)
24. compositional_skills/extraction/fda_filing/reasoning/qna.yaml (+0.33)
25. compositional_skills/extraction/technical_paper/equations/csv/qna.yaml (+0.25)
26. compositional_skills/linguistics/bullet_lists/qna.yaml (+0.2)
27. foundational_skills/reasoning/mathematical_reasoning/qna.yaml (+0.17)
28. compositional_skills/extraction/technical_paper/equations/plain_text/qna.yaml (+0.17)
29. compositional_skills/STEM/math/mensurational/qna.yaml (+0.17)
30. compositional_skills/STEM/math/arithmetic_reasoning/qna.yaml (+0.17)
31. compositional_skills/extraction/annual_report/bullet_points/qna.yaml (+0.17)
32. compositional_skills/linguistics/rhyming_words/qna.yaml (+0.17)
33. compositional_skills/linguistics/organize_lists/qna.yaml (+0.17)
34. compositional_skills/roleplay/explain_like_you_are/non_fictional/popular_personalities/qna.yaml (+0.06)
### REGRESSIONS:
1. compositional_skills/writing/grounded/summarization/wiki_insights/five_point/qna.yaml (-2.5)
2. compositional_skills/extraction/email/markdown/qna.yaml (-1.33)
3. compositional_skills/extraction/annual_report/markdown/qna.yaml (-1.0)
4. compositional_skills/STEM/science/geography/qna.yaml (-1.0)
5. compositional_skills/extraction/invoice/reasoning/qna.yaml (-0.83)
6. compositional_skills/extraction/commercial_lease_agreement/csv/qna.yaml (-0.83)
7. compositional_skills/extraction/invoice/csv/qna.yaml (-0.8)
8. compositional_skills/general/tables/editing/combining_altering/qna.yaml (-0.67)
9. compositional_skills/extraction/fda_filing/plain_text/qna.yaml (-0.67)
10. compositional_skills/extraction/receipt/bullet_points/qna.yaml (-0.6)
11. compositional_skills/extraction/commercial_lease_agreement/reasoning/qna.yaml (-0.5)
12. compositional_skills/STEM/math/time_series/qna.yaml (-0.37)
13. compositional_skills/general/tables/empty/qna.yaml (-0.33)
14. compositional_skills/extraction/technical_paper/tables/plain_text/qna.yaml (-0.33)
15. compositional_skills/extraction/technical_paper/abstract/reasoning/qna.yaml (-0.33)
16. compositional_skills/writing/freeform/jokes/puns/general/qna.yaml (-0.31)
17. compositional_skills/extraction/commercial_lease_agreement/plain_text/qna.yaml (-0.27)
18. foundational_skills/reasoning/logical_reasoning/general/qna.yaml (-0.23)
19. knowledge/textbook/history/ibm_history/qna.yaml (-0.17)
20. compositional_skills/extraction/receipt/reasoning/qna.yaml (-0.17)
21. compositional_skills/extraction/technical_paper/equations/reasoning/qna.yaml (-0.17)
22. compositional_skills/STEM/math/area/qna.yaml (-0.17)
23. compositional_skills/writing/freeform/grammar/basic_grammer_tests/qna.yaml (-0.17)
24. compositional_skills/linguistics/summarization/ignore_pii/qna.yaml (-0.17)
25. compositional_skills/extraction/technical_paper/abstract/markdown/qna.yaml (-0.17)
26. compositional_skills/extraction/technical_paper/abstract/plain_text/qna.yaml (-0.17)
27. compositional_skills/extraction/fda_filing/bullet_points/qna.yaml (-0.17)
28. compositional_skills/STEM/math/reasoning/qna.yaml (-0.17)
29. compositional_skills/roleplay/explain_like_you_are/abstract/qna.yaml (-0.1)
30. compositional_skills/STEM/math/distance_conversion/qna.yaml (-0.08)
31. compositional_skills/extraction/technical_paper/equations/markdown/qna.yaml (-0.08)
### NO CHANGE:
1. compositional_skills/extraction/invoice/plain_text/qna.yaml
2. compositional_skills/extraction/receipt/csv/qna.yaml
3. compositional_skills/extraction/technical_paper/tables/reasoning/qna.yaml
4. compositional_skills/roleplay/explain_like_you_are/non_fictional/historical_figures/qna.yaml
5. compositional_skills/linguistics/classification/agent_classification/qna.yaml
6. compositional_skills/writing/freeform/poetry/ballad/qna.yaml
7. compositional_skills/writing/freeform/emoji/qna.yaml
8. compositional_skills/linguistics/reversing_string/qna.yaml
9. compositional_skills/writing/freeform/technical/proposal/qna.yaml
10. compositional_skills/writing/freeform/poetry/ode/qna.yaml
11. compositional_skills/extraction/technical_paper/equations/bullet_points/qna.yaml
12. compositional_skills/linguistics/complete_common_expressions/qna.yaml
13. compositional_skills/extraction/abstractive/title/qna.yaml
14. compositional_skills/linguistics/word_gen/qna.yaml
15. compositional_skills/roleplay/explain_like_i_am/primary_schooler/qna.yaml
16. compositional_skills/STEM/math/pattern_recognition/qna.yaml
17. compositional_skills/STEM/science/units_conversion/temperature_conversion/qna.yaml
18. compositional_skills/extraction/technical_paper/abstract/csv/qna.yaml
19. compositional_skills/writing/freeform/debate/qna.yaml
20. compositional_skills/writing/freeform/riddles/qna.yaml
21. compositional_skills/linguistics/jumbled_sentences/qna.yaml
22. compositional_skills/STEM/math/arithmetic_w_grammar/qna.yaml
23. compositional_skills/writing/grounded/summarization/wiki_insights/high_level_outline/qna.yaml
24. compositional_skills/writing/grounded/meeting_insights/action_items/qna.yaml
25. compositional_skills/roleplay/explain_like_you_are/fictional/movies/qna.yaml
26. compositional_skills/extraction/technical_paper/abstract/bullet_points/qna.yaml
27. foundational_skills/reasoning/temporal_reasoning/qna.yaml
28. knowledge/technical_manual/ibm_redbooks/qna.yaml
29. compositional_skills/extraction/email/bullet_points/qna.yaml
30. compositional_skills/STEM/science/units_conversion/distance_conversion/qna.yaml
31. compositional_skills/extraction/services_agreement/reasoning/qna.yaml
32. compositional_skills/linguistics/pattern_recognition/qna.yaml
33. compositional_skills/writing/freeform/poetry/freeverse/qna.yaml
34. compositional_skills/general/tables/editing/add_remove/qna.yaml
35. compositional_skills/extraction/inference/quantitative/asciidoc/tables/qna.yaml
36. compositional_skills/writing/grounded/meeting_insights/corporate_email/qna.yaml
37. compositional_skills/extraction/fda_filing/csv/qna.yaml
38. foundational_skills/reasoning/unconventional_reasoning/lower_score_wins/qna.yaml
39. compositional_skills/writing/freeform/poetry/narrative_poetry/qna.yaml
40. foundational_skills/reasoning/common_sense_reasoning/qna.yaml
41. compositional_skills/extraction/email/plain_text/qna.yaml
42. compositional_skills/writing/grounded/summarization/wiki_insights/concise/qna.yaml
43. compositional_skills/writing/freeform/social_media/linkedin/qna.yaml
44. compositional_skills/writing/freeform/poetry/epic_poetry/qna.yaml
45. compositional_skills/extraction/commercial_lease_agreement/markdown/qna.yaml
46. compositional_skills/roleplay/explain_like_i_am/graduate/qna.yaml
47. compositional_skills/roleplay/explain_like_you_are/fictional/video_games/qna.yaml
48. compositional_skills/roleplay/explain_like_you_are/fictional/tv_shows/qna.yaml
49. compositional_skills/writing/freeform/technical/product_description/qna.yaml
50. foundational_skills/reasoning/linguistics_reasoning/logical_sequence_of_words/qna.yaml
51. compositional_skills/extraction/technical_paper/tables/bullet_points/qna.yaml
52. compositional_skills/writing/freeform/technical/report/qna.yaml
53. compositional_skills/writing/freeform/legal/agreement/qna.yaml
54. compositional_skills/writing/freeform/social_media/instagram/qna.yaml
55. foundational_skills/reasoning/logical_reasoning/tabular/qna.yaml
56. compositional_skills/extraction/invoice/markdown/qna.yaml
57. foundational_skills/reasoning/linguistics_reasoning/odd_one_out/qna.yaml
58. compositional_skills/extraction/receipt/markdown/qna.yaml
59. compositional_skills/extraction/technical_paper/tables/csv/qna.yaml
60. compositional_skills/writing/grounded/meeting_insights/executive_summaries/qna.yaml
61. compositional_skills/writing/freeform/social_media/twitter/qna.yaml
62. compositional_skills/writing/freeform/poetry/haiku/qna.yaml
63. compositional_skills/writing/freeform/poetry/sonnet/qna.yaml
64. compositional_skills/writing/freeform/prose/articles/qna.yaml
65. compositional_skills/writing/grounded/summarization/wiki_insights/detailed/qna.yaml
66. compositional_skills/writing/freeform/prose/screenplay/qna.yaml
67. compositional_skills/extraction/abstractive/key_points/qna.yaml
68. compositional_skills/writing/freeform/brainstorming/idea_generation/qna.yaml
69. compositional_skills/writing/grounded/summarization/wiki_insights/one_line/qna.yaml
70. compositional_skills/extraction/information/named_entities/person_names/qna.yaml
71. compositional_skills/writing/freeform/poetry/limerick/qna.yaml
72. compositional_skills/writing/grounded/meeting_insights/minutes_of_meeting/qna.yaml
73. compositional_skills/writing/grounded/editing/grammar/qna.yaml
74. compositional_skills/writing/freeform/brainstorming/support_claim/qna.yaml
75. compositional_skills/writing/grounded/editing/punctuation/qna.yaml
76. compositional_skills/extraction/inference/quantitative/table_analaysis/qna.yaml
77. compositional_skills/writing/freeform/technical/guide/qna.yaml
78. compositional_skills/writing/freeform/legal/contracts/qna.yaml
79. compositional_skills/writing/freeform/prose/stories/qna.yaml
80. compositional_skills/writing/freeform/technical/specification/qna.yaml
81. compositional_skills/writing/freeform/prose/emails/formal/qna.yaml
82. compositional_skills/writing/freeform/social_media/facebook/qna.yaml
83. compositional_skills/writing/freeform/prose/emails/informal/qna.yaml