Tyler Lisowski relyt0925

[root@tyler-rhel-newimage instructlab]# /root/ilab model train --data-path /var/instructlabbigdisk/instructlab/generateddata/messages_Mixtral-8x7B-Instruct-v0_2024-07-27T04_27_23.jsonl --model-path /var/instructlabbigdisk/instructlab/models/ibm-granite/granite-7b-base/ --ckpt-output-dir /var/instructlabbigdisk/instructlab/knowledgecheckpoints/ --device cuda --gpus 8 --max-batch-len 1 --effective-batch-size 8 --save-samples 46
[2024-07-27 04:38:32,852] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 b
[root@tyler-rhel-newimage instructlab]# /root/ilab model evaluate --benchmark mt_bench_branch --model /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1056/ --judge-model /var/instructlabbigdisk/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --base-model /var/instructlabbigdisk/instructlab/models/ibm-granite/granite-7b-base/ --output-dir /var/instructlabbigdisk/instructlab/evaltracker/skillscheckpoints/samples_1056/ --gpus 8 --backend vllm --enable-serving-output --taxonomy-path /var/instructlabbigdisk/instructlab/.local/share/instructlab/taxonomy/ --base-branch HEAD --branch HEAD
INFO 2024-07-27 16:30:41,640 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 16:30:41,641 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 16:30:41,641 numexpr.utils:161: NumExpr defaulting to 16 threads.
Generating
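The NumExpr notices above recur in every `ilab` invocation in these logs. They are harmless, but the log itself names the knob that silences them; a minimal sketch (the value 64 matches the cap NumExpr reports for this 80-core box, and the variable must be set before numexpr is first imported):

```python
import os

# Must be set before anything (pandas, datasets, numexpr itself) imports
# numexpr, or the setting has no effect on its thread pool.
os.environ.setdefault("NUMEXPR_MAX_THREADS", "64")
```

Exporting `NUMEXPR_MAX_THREADS=64` in the shell before running `ilab` has the same effect without touching any code.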
[root@tyler-rhel-newimage instructlab]# /root/ilab model evaluate --benchmark mt_bench --model /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1056/ --judge-model /var/instructlabbigdisk/instructlab/models/prometheus-eval/prometheus-8x7b-v2.0/ --base-model /var/instructlabbigdisk/instructlab/models/ibm-granite/granite-7b-base/ --output-dir /var/instructlabbigdisk/instructlab/evaltracker/skillscheckpoints/samples_1056/ --gpus 8 --backend vllm --enable-serving-output
INFO 2024-07-27 18:36:02,004 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 18:36:02,004 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 18:36:02,005 numexpr.utils:161: NumExpr defaulting to 16 threads.
Generating answers...
WARNING 2024-07-27 18:36:02,158 instructlab.model.evaluate:288: Based on your hardware configuration, when using vL
[root@tyler-rhel-newimage instructlab]# /root/ilab model evaluate --benchmark mmlu --model /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ --gpus 8
INFO 2024-07-27 19:39:50,893 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 19:39:50,893 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 19:39:50,893 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-07-27 19:39:51,260 datasets:58: PyTorch version 2.3.1 available.
INFO 2024-07-27 19:39:58,693 lm-eval:152: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
INFO 2024-07-27 19:39:58,693 lm-eval:189: Initializing hf model, with arguments: {'pretrained': '/var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/', 'dtype': 'bfloat16'}
INFO 2024-07-27 19:39:58,802 lm-eval:170:
[root@tyler-rhel-newimage instructlab]# /root/ilab model train --data-path /var/instructlabbigdisk/instructlab/generateddata/messages_combined.jsonl --model-path /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ --device cuda --max-batch-len 2 --effective-batch-size 16 --save-samples 185 --num-epochs 10 --ckpt-output-dir /var/instructlabbigdisk/instructlab/skillscheckpoints/ --gpus 8
[2024-07-27 20:03:08,445] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected
[root@tyler-rhel-newimage root]# /root/ilab model serve --model-family mixtral --model-path /var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1/ --backend vllm -- --tensor-parallel-size 8 --host 127.0.0.1 --port 8084
INFO 2024-07-28 16:53:08,009 instructlab.model.serve:136: Using model '/var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1' with -1 gpu-layers and 4096 max context size.
INFO 2024-07-28 16:53:08,009 instructlab.model.serve:140: Serving model '/var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1' with vllm
INFO 2024-07-28 16:53:08,010 instructlab.model.backends.vllm:196: vLLM starting up on pid 64 at http://127.0.0.1:8000/v1
INFO 07-28 16:53:13 api_server.py:219] vLLM API server version 0.5.3.post1
INFO 07-28 16:53:13 api_server.py:220] args: Namespace(host='127.0.0.1', port=8084, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lor
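vLLM serves an OpenAI-compatible HTTP API, so the server started above can be exercised with nothing but the standard library. A minimal sketch of building a `/chat/completions` request against the `--port 8084` from the command above (the prompt and `max_tokens` are illustrative, and the request is only constructed here, since sending it requires the server to be up):

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://127.0.0.1:8084/v1",
    "/var/instructlabbigdisk/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1",
    "What counts as personally identifiable information?",
)
```

`urllib.request.urlopen(req)` would then send it and return a JSON body whose `choices` list carries the completion.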
[root@dev-rhel-ai-training-client-11 ~]# cat /var/mnt/inststg1/instructlab/job/checkpoints/skills/full_logs_global0.log
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757]
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757] *****************************************
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0814 01:44:19.387000 139736190685632 torch/distributed/run.py:757] *****************************************
[2024-08-14 01:44:22,434] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-14 01:44:22,675] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-14 01:44:22,722] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerato
(log truncated)
[root@tyler-a100 instructlab]# /root/bin/ilab-sdg.sh data generate --pipeline /var/mnt/inststg1/instructlab/sdg-config/pipelines/agentic/ --taxonomy-path /var/mnt/inststg1/instructlab/taxonomy/ --taxonomy-base empty --endpoint-url http://127.0.0.1:8080/v1 --model-family mixtral --sdg-scale-factor 30 --model /var/mnt/inststg1/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1 --output-dir /var/mnt/inststg1/instructlab/generated/ --tls-insecure
INFO 2024-08-17 15:41:56,393 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-08-17 15:41:56,393 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-08-17 15:41:56,393 numexpr.utils:161: NumExpr defaulting to 16 threads.
INFO 2024-08-17 15:41:57,112 datasets:58: PyTorch version 2.3.1 available.
Generating synthetic data using '/var/mnt/inststg1/instructlab/sdg-config/pipelines/agentic/' pipelin
relyt0925 / gist:f5fb8df46c3b91203bcc0064fa594c8c (created August 18, 2024 01:22)
node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_task.yaml
[root@tyler-a100 generated]# cat node_datasets_2024-08-17T15_42_00/knowledge_compliance_personally-identifiable-information_task.yaml
dataset_kwargs:
  data_files:
    test: /var/mnt/inststg1/instructlab/generated//node_datasets_2024-08-17T15_42_00/mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
dataset_name: null
dataset_path: json
doc_to_choice: '{{[choices[0], choices[1], choices[2], choices[3]]}}'
doc_to_target: '{{answer}}'
doc_to_text: '{{question.strip()}}
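The Jinja templates in this task file map each JSONL record onto an MMLU-style multiple-choice item. A rough Python sketch of what they resolve to for one record (the record itself is made up; the field names come from the templates above):

```python
# Illustrative record carrying the fields the task templates reference.
doc = {
    "question": "  Which of the following is personally identifiable information?  ",
    "choices": ["A favorite color", "A passport number", "A weather report", "A public holiday"],
    "answer": "B",
}

# doc_to_text: '{{question.strip()}} ...' -- the stripped question is the prompt stem.
prompt = doc["question"].strip()

# doc_to_choice: '{{[choices[0], choices[1], choices[2], choices[3]]}}' -- the four options.
options = [doc["choices"][i] for i in range(4)]

# doc_to_target: '{{answer}}' -- the gold label the model's pick is scored against.
target = doc["answer"]
```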
relyt0925 / gist:1fd2ca8c1c9fc21c2108129fb6048d82 (created August 18, 2024 01:24)
mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
[root@tyler-a100 generated]# cat /var/mnt/inststg1/instructlab/generated//node_datasets_2024-08-17T15_42_00/mmlubench_knowledge_compliance_personally-identifiable-information.jsonl
{"icl_document":"hii","document":"# Personal Data\n\n## Overview\n\nPersonal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person.\n\nThe abbreviation PII is widely accepted in the United States, but the phrase it abbreviates has four common variants based on personal or personally, and identifiable or identifying. Not all are equivalent, and for legal purposes the effective definitions vary depending on the jurisdiction and the purposes for which the term is being used. Under European Union and United Kingdom data protection regimes, which centre primarily on the General Data Protection Regulation (GDPR), the term \"personal data\" is significantly broader, and determines the scope of the regulatory regime.\n\nNational Institute of Standards an
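The task file's `data_files.test` entry points at this JSONL (the single record shown above is cut off mid-document). A stdlib-only sketch for sanity-checking such a file before evaluation: every line must parse as JSON and carry the fields the task templates reference (the sample line below is illustrative, not copied from the real file):

```python
import json

# Fields referenced by doc_to_text, doc_to_choice, and doc_to_target.
REQUIRED_FIELDS = {"question", "choices", "answer"}


def check_jsonl(lines):
    """Return indices of lines that fail to parse or lack required fields."""
    bad = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines rather than flagging them
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(i)
            continue
        if not REQUIRED_FIELDS <= record.keys():
            bad.append(i)
    return bad


sample = [
    '{"icl_document": "hii", "document": "# Personal Data ...", '
    '"question": "What does PII stand for?", '
    '"choices": ["a", "b", "c", "d"], "answer": "a"}',
]
problems = check_jsonl(sample)
```

An empty `problems` list means every line is usable; indices in it point at the offending lines.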