Tyler Lisowski relyt0925

[root@tyler-rhel-newimage instructlab]# /root/ilab model train --data-path /var/instructlabbigdisk/instructlab/generateddata/messages_Mixtral-8x7B-Instruct-v0_2024-07-27T04_27_23.jsonl --model-path /var/instructlabbigdisk/instructlab/models/ibm-granite/granite-7b-base/ --ckpt-output-dir /var/instructlabbigdisk/instructlab/knowledgecheckpoints/ --device cuda --gpus 8 --max-batch-len 1 --effective-batch-size 8 --save-samples 46
[2024-07-27 04:38:32,852] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 b
[root@tyler-rhel-newimage instructlab]# /root/ilab model evaluate --benchmark mt_bench_branch --model /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1056/ --judge-model /var/instructlabbigdisk/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --base-model /var/instructlabbigdisk/instructlab/models/ibm-granite/granite-7b-base/ --output-dir /var/instructlabbigdisk/instructlab/evaltracker/skillscheckpoints/samples_1056/ --gpus 8 --backend vllm --enable-serving-output --taxonomy-path /var/instructlabbigdisk/instructlab/.local/share/instructlab/taxonomy/ --base-branch HEAD --branch HEAD
INFO 2024-07-27 16:30:41,640 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 16:30:41,641 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 16:30:41,641 numexpr.utils:161: NumExpr defaulting to 16 threads.
Generating
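The NumExpr notices above recur in every `ilab` invocation in these logs. They are harmless, but the log itself names the knob that silences them; a minimal sketch (the value 64 matches the cap NumExpr reports for this 80-core box, and the variable must be set before numexpr is first imported):

```python
import os

# Must be set before anything (pandas, datasets, numexpr itself) imports
# numexpr, or the setting has no effect on its thread pool.
os.environ.setdefault("NUMEXPR_MAX_THREADS", "64")
```

Exporting `NUMEXPR_MAX_THREADS=64` in the shell before running `ilab` has the same effect without touching any code.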
[root@tyler-rhel-newimage instructlab]# /root/ilab model evaluate --benchmark mt_bench --model /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1056/ --judge-model /var/instructlabbigdisk/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --base-model /var/instructlabbigdisk/instructlab/models/ibm-granite/granite-7b-base/ --output-dir /var/instructlabbigdisk/instructlab/evaltracker/skillscheckpoints/samples_1056/ --gpus 8 --backend vllm --enable-serving-output
INFO 2024-07-27 18:36:02,004 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 18:36:02,004 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 18:36:02,005 numexpr.utils:161: NumExpr defaulting to 16 threads.
Generating answers...
WARNING 2024-07-27 18:36:02,158 instructlab.model.evaluate:288: Based on your hardware configuration, when using vL
[root@tyler-rhel-newimage instructlab]# /root/ilab model evaluate --benchmark mmlu --model /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ --gpus 8
INFO 2024-07-27 19:39:50,893 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 19:39:50,893 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 19:39:50,893 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-07-27 19:39:51,260 datasets:58: PyTorch version 2.3.1 available.
INFO 2024-07-27 19:39:58,693 lm-eval:152: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
INFO 2024-07-27 19:39:58,693 lm-eval:189: Initializing hf model, with arguments: {'pretrained': '/var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/', 'dtype': 'bfloat16'}
INFO 2024-07-27 19:39:58,802 lm-eval:170:
[root@tyler-rhel-newimage instructlab]# /root/ilab model train --data-path /var/instructlabbigdisk/instructlab/generateddata/messages_combined.jsonl --model-path /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ --device cuda --max-batch-len 2 --effective-batch-size 16 --save-samples 185 --num-epochs 10 --ckpt-output-dir /var/instructlabbigdisk/instructlab/skillscheckpoints/ --gpus 8
[2024-07-27 20:03:08,445] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected
[root@tyler-rhel-newimage root]# /root/ilab model serve --model-family mixtral --model-path /var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1/ --backend vllm -- --tensor-parallel-size 8 --host 127.0.0.1 --port 8084
INFO 2024-07-28 16:53:08,009 instructlab.model.serve:136: Using model '/var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1' with -1 gpu-layers and 4096 max context size.
INFO 2024-07-28 16:53:08,009 instructlab.model.serve:140: Serving model '/var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1' with vllm
INFO 2024-07-28 16:53:08,010 instructlab.model.backends.vllm:196: vLLM starting up on pid 64 at http://127.0.0.1:8000/v1
INFO 07-28 16:53:13 api_server.py:219] vLLM API server version 0.5.3.post1
INFO 07-28 16:53:13 api_server.py:220] args: Namespace(host='127.0.0.1', port=8084, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lor
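vLLM serves an OpenAI-compatible HTTP API, so the server started above can be exercised with nothing but the standard library. A minimal sketch of building a `/chat/completions` request against the `--port 8084` from the command above (the prompt and `max_tokens` are illustrative, and the request is only constructed here, since sending it requires the server to be up):

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://127.0.0.1:8084/v1",
    "/var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1",
    "What counts as personally identifiable information?",
)
```

`urllib.request.urlopen(req)` would then send it and return a JSON body whose `choices` list carries the completion.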
[root@dev-rhel-ai-training-client-11 ~]# cat /var/mnt/inststg1/instructlab/job/checkpoints/skills/full_logs_global0.log
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757]
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757] *****************************************
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757] *****************************************
[2024-08-14 01:44:22,434] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-14 01:44:22,675] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-14 01:44:22,722] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerato
(log truncated)
[root@tyler-a100 instructlab]# /root/bin/ilab-sdg.sh data generate --pipeline /var/mnt/inststg1/instructlab/sdg-config/pipelines/agentic/ --taxonomy-path /var/mnt/inststg1/instructlab/taxonomy/ --taxonomy-base empty --endpoint-url http://127.0.0.1:8080/v1 --model-family mixtral --sdg-scale-factor 30 --model /var/mnt/inststg1/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1 --output-dir /var/mnt/inststg1/instructlab/generated/ --tls-insecure
INFO 2024-08-17 15:41:56,393 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-08-17 15:41:56,393 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-08-17 15:41:56,393 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-08-17 15:41:57,112 datasets:58: PyTorch version 2.3.1 available.
Generating synthetic data using '/var/mnt/inststg1/instructlab/sdg-config/pipelines/agentic/' pipelin
relyt0925 / gist:f5fb8df46c3b91203bcc0064fa594c8c (created August 18, 2024 01:22)
node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_task.yaml
[root@tyler-a100 generated]# cat node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_task.yaml
dataset_kwargs:
  data_files:
    test: /var/mnt/inststg1/instructlab/generated//node_datasets_2024-08-17T15_42_00/mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
dataset_name: null
dataset_path: json
doc_to_choice: '{{[choices[0], choices[1], choices[2], choices[3]]}}'
doc_to_target: '{{answer}}'
doc_to_text: '{{question.strip()}}
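The Jinja templates in this task file map each JSONL record onto an MMLU-style multiple-choice item. A rough Python sketch of what they resolve to for one record (the record itself is made up; the field names come from the templates above):

```python
# Illustrative record carrying the fields the task templates reference.
doc = {
    "question": "  Which of the following is personally identifiable information?  ",
    "choices": ["A favorite color", "A passport number", "A weather report", "A public holiday"],
    "answer": "B",
}

# doc_to_text: '{{question.strip()}} ...' -- the stripped question is the prompt stem.
prompt = doc["question"].strip()

# doc_to_choice: '{{[choices[0], choices[1], choices[2], choices[3]]}}' -- the four options.
options = [doc["choices"][i] for i in range(4)]

# doc_to_target: '{{answer}}' -- the gold label the model's pick is scored against.
target = doc["answer"]
```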
relyt0925 / gist:1fd2ca8c1c9fc21c2108129fb6048d82 (created August 18, 2024 01:24)
mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
[root@tyler-a100 generated]# cat /var/mnt/inststg1/instructlab/generated//node_datasets_2024-08-17T15_42_00/mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
{"icl_document":"hii","document":"# Personal Data\n\n## Overview\n\nPersonal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person.\n\nThe abbreviation PII is widely accepted in the United States, but the phrase it abbreviates has four common variants based on personal or personally, and identifiable or identifying. Not all are equivalent, and for legal purposes the effective definitions vary depending on the jurisdiction and the purposes for which the term is being used. Under European Union and United Kingdom data protection regimes, which centre primarily on the General Data Protection Regulation (GDPR), the term \"personal data\" is significantly broader, and determines the scope of the regulatory regime.\n\nNational Institute of Standards an
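The task file's `data_files.test` entry points at this JSONL (the single record shown above is cut off mid-document). A stdlib-only sketch for sanity-checking such a file before evaluation: every line must parse as JSON and carry the fields the task templates reference (the sample line below is illustrative, not copied from the real file):

```python
import json

# Fields referenced by doc_to_text, doc_to_choice, and doc_to_target.
REQUIRED_FIELDS = {"question", "choices", "answer"}


def check_jsonl(lines):
    """Return indices of lines that fail to parse or lack required fields."""
    bad = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines rather than flagging them
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(i)
            continue
        if not REQUIRED_FIELDS <= record.keys():
            bad.append(i)
    return bad


sample = [
    '{"icl_document": "hii", "document": "# Personal Data ...", '
    '"question": "What does PII stand for?", '
    '"choices": ["a", "b", "c", "d"], "answer": "a"}',
]
problems = check_jsonl(sample)
```

An empty `problems` list means every line is usable; indices in it point at the offending lines.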