new skills training log
relyt0925/5c6c09acf77c53a563e3663bd2e24fbb, created July 27, 2024 20:09
[root@tyler-rhel-newimage instructlab]# /root/ilab model train --data-path /var/instructlabbigdisk/instructlab/generateddata/messages_combined.jsonl --model-path /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ --device cuda --max-batch-len 2 --effective-batch-size 16 --save-samples 185 --num-epochs 10 --ckpt-output-dir /var/instructlabbigdisk/instructlab/skillscheckpoints/ --gpus 8
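The command sets a global batch size with `--effective-batch-size 16` and spreads training over 8 data-parallel GPUs. Below is a minimal sketch of the usual arithmetic. It assumes a fixed per-GPU micro-batch and a sampler that keeps the last partial batch. The InstructLab trainer packs samples by token budget (`--max-batch-len`), so its real split may differ. Both helper names are hypothetical.

```python
import math

def grad_accum_steps(effective_batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Accumulation steps so that micro_batch_size * world_size * steps == effective_batch_size."""
    per_pass = micro_batch_size * world_size  # samples consumed by one forward/backward across all GPUs
    if effective_batch_size % per_pass:
        raise ValueError("effective batch size must be a multiple of micro_batch_size * world_size")
    return effective_batch_size // per_pass

def optimizer_steps(num_samples: int, effective_batch_size: int, num_epochs: int) -> int:
    """Total optimizer steps when each epoch keeps its final partial batch (an assumption)."""
    return math.ceil(num_samples / effective_batch_size) * num_epochs

# 185 samples, 8 GPUs, effective batch 16 and 10 epochs come from this run;
# micro_batch_size=1 is an illustrative assumption.
print(grad_accum_steps(16, micro_batch_size=1, world_size=8))  # 2
print(optimizer_steps(185, 16, 10))                            # 120
```

The tokenized dataset has 185 samples and `--save-samples 185` matches that size, so under these assumptions a checkpoint should land roughly once per epoch.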
[2024-07-27 20:03:08,445] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
INFO 2024-07-27 20:03:11,898 numexpr.utils:145: Note: detected 80 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2024-07-27 20:03:11,898 numexpr.utils:148: Note: NumExpr detected 80 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2024-07-27 20:03:11,898 numexpr.utils:161: NumExpr defaulting to 16 threads.
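NumExpr consults `NUMEXPR_MAX_THREADS` when it is first imported. Here the variable was unset, so it fell back to 16 threads on an 80-core host. The sketch below prepares an environment with the variable set before launch. Clamping to 64 mirrors the ceiling the log reports, and the helper name is hypothetical.

```python
import os

def numexpr_env(detected_cores: int, cap: int = 64) -> dict:
    """Copy of the current environment with NUMEXPR_MAX_THREADS set.

    The value is clamped to `cap`, the maximum NumExpr reports in this log.
    It only takes effect if set before numexpr is imported, i.e. before the
    training CLI starts.
    """
    env = dict(os.environ)
    env["NUMEXPR_MAX_THREADS"] = str(min(detected_cores, cap))
    return env

print(numexpr_env(80)["NUMEXPR_MAX_THREADS"])  # 64
```

To use it, pass the returned dict as `env=` to `subprocess.run` when launching `/root/ilab model train`. You can also export the same variable in the shell before running the command.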
INFO 2024-07-27 20:03:12,324 datasets:58: PyTorch version 2.3.1 available.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 2024-07-27 20:03:12,773 root:611: eos: 32001, pad: 32002, system: 32003, user: 32004, assistant: 32005
tokenizing the dataset with /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ tokenizer...
ten largest length percentiles:
quantile 90th: 90.0
quantile 91th: 92.44
quantile 92th: 93.0
quantile 93th: 94.0
quantile 94th: 96.87999999999994
quantile 95th: 99.59999999999997
quantile 96th: 102.91999999999996
quantile 97th: 107.0
quantile 98th: 109.59999999999997
quantile 99th: 115.27999999999997
quantile 100th: 141.0
at 4096 max sequence length, the number of samples to be dropped is 0
(0.00% of total)
quantile 0th: 43.0
quantile 1th: 44.0
quantile 2th: 44.68
quantile 3th: 45.0
quantile 4th: 45.36
quantile 5th: 46.0
quantile 6th: 48.0
quantile 7th: 48.0
quantile 8th: 49.0
quantile 9th: 49.56
quantile 10th: 50.0
at 20 min sequence length, the number of samples to be dropped is 0
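The fractional percentiles are consistent with linear interpolation between neighbouring ranks of integer token lengths. With 185 samples, the 91st percentile falls at rank 0.91 × 184 = 167.44. The logged 92.44 is therefore presumably 92 plus 0.44 of a one-token step to 93, and the 2nd percentile (44.68 at rank 3.68) fits the same pattern. Below is a pure-Python sketch of that rule, which is NumPy's default `method="linear"`. That the tool uses exactly this method is an inference, and the sample lengths are made up.

```python
import math

def percentile_linear(values, q):
    """q-th percentile (0-100) by linear interpolation between the two
    nearest ranks, the same rule as NumPy's default method="linear"."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0  # fractional rank into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

# Hypothetical token lengths, not the real dataset.
lengths = [43, 44, 44, 45, 46, 50, 90, 93, 107, 141]
for q in (0, 50, 90, 100):
    print(q, percentile_linear(lengths, q))
```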
checking the validity of the samples...
INFO 2024-07-27 20:03:13,126 root:611: number of dropped samples: 0 -- out of 185
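Both length checks drop nothing. According to the percentiles above, every sample is between 43 and 141 tokens, well inside the 20 to 4096 window. Below is a sketch of that filter. Whether the trainer treats the bounds as inclusive is an assumption, and the function name is hypothetical.

```python
def split_by_length(lengths, min_len=20, max_len=4096):
    """Keep samples whose token count lies in [min_len, max_len] (inclusive
    bounds assumed) and report how many fall outside the window."""
    kept = [n for n in lengths if min_len <= n <= max_len]
    return kept, len(lengths) - len(kept)

kept, dropped = split_by_length([12, 43, 141, 5000])
print(kept, dropped)  # [43, 141] 2
```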
Categorizing training data type...
Data type sorting: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 185/185 [00:00<00:00, 648242.47it/s]
unmasking the appropriate message content...
The following are some examples of the processed data, with masked tokens (not to be learned) represented with <mask>. The unmasked tokens are the ones the model will learn to predict. Please review these samples to ensure the model is learning to predict expected tokens.
Instruction ex sample 17: <mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask>
Answer: Based on the provided text, there are 8 villages named Qarah Tappeh in different districts and provinces of Iran according to the 2006 census.<|endoftext|>
Original Input: <|user|>
Question: How many villages named Qarah Tappeh were there in different districts and provinces of Iran according to the 2006 census?
<|assistant|>
Answer: Based on the provided text, there are 8 villages named Qarah Tappeh in different districts and provinces of Iran according to the 2006 census.<|endoftext|>
Instruction ex sample 99: <mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask><mask>
Answer: There are three items in this list that are types of fruits: "apple," "banana," and "orange."<|endoftext|>
Original Input: <|user|>
Question: How many items in this list are types of fruits and what are they?
<|assistant|>
Answer: There are three items in this list that are types of fruits: "apple," "banana," and "orange."<|endoftext|>
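In these samples the prompt, including the `<|assistant|>` marker, is replaced by `<mask>`. The answer and `<|endoftext|>` stay visible as training targets. Below is a single-turn sketch of building loss labels that way. It uses the special-token ids from the tokenizer line earlier in the log (eos 32001, user 32004, assistant 32005) and PyTorch's default `ignore_index` of -100. The real trainer also handles multi-turn and system messages, which this simplification skips. The function name is hypothetical.

```python
IGNORE_INDEX = -100  # torch.nn.CrossEntropyLoss skips this label by default
EOS_ID, USER_ID, ASSISTANT_ID = 32001, 32004, 32005  # ids printed in this log

def mask_prompt(input_ids, assistant_id=ASSISTANT_ID, ignore_index=IGNORE_INDEX):
    """Mask every token up to and including the last assistant marker so
    only the reply (and its EOS) contributes to the loss."""
    if assistant_id not in input_ids:
        return [ignore_index] * len(input_ids)  # no reply, nothing to learn
    cut = len(input_ids) - 1 - input_ids[::-1].index(assistant_id)
    return [ignore_index] * (cut + 1) + list(input_ids[cut + 1:])

# Hypothetical ids: 11, 12 stand for question tokens; 21, 22 for answer tokens.
sample = [USER_ID, 11, 12, ASSISTANT_ID, 21, 22, EOS_ID]
print(mask_prompt(sample))  # [-100, -100, -100, -100, 21, 22, 32001]
```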
Creating json from Arrow format: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 59.89ba/s]
Running command: torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 --rdzv_id=123 --rdzv_endpoint=127.0.0.1:12222 /opt/python3.11/venv/lib64/python3.11/site-packages/instructlab/training/main_ds.py --model_name_or_path=/var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/ --data_path=/var/instructlabbigdisk/instructlab/.local/share/instructlab/internal/data.jsonl --output_dir=/var/instructlabbigdisk/instructlab/skillscheckpoints/ --num_epochs=10 --effective_batch_size=16 --learning_rate=2e-05 --num_warmup_steps=25 --save_samples=185 --log_level=INFO --max_batch_len=2 --seed=42 --chat-tmpl-path=/opt/python3.11/venv/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py
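`torchrun --nnodes=1 --nproc_per_node=8` starts eight copies of `main_ds.py`. It gives each copy its identity through environment variables such as `RANK`, `LOCAL_RANK` and `WORLD_SIZE`. Below is a sketch of what a worker reads. On a single node `RANK` equals `LOCAL_RANK`. The helper is hypothetical and not taken from `main_ds.py`.

```python
import os

def dist_identity(env=None):
    """Return (rank, local_rank, world_size) as exported by torchrun."""
    env = os.environ if env is None else env
    return int(env["RANK"]), int(env["LOCAL_RANK"]), int(env["WORLD_SIZE"])

# Simulated environment for the worker driving GPU 3 in this run.
print(dist_identity({"RANK": "3", "LOCAL_RANK": "3", "WORLD_SIZE": "8"}))  # (3, 3, 8)
```

A real worker would typically pass `local_rank` to `torch.cuda.set_device` before creating the NCCL process group, which appears further down as the `Initializing TorchBackend in DeepSpeed with backend nccl` line.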
W0727 20:03:14.580000 140296119280064 torch/distributed/run.py:757]
W0727 20:03:14.580000 140296119280064 torch/distributed/run.py:757] *****************************************
W0727 20:03:14.580000 140296119280064 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0727 20:03:14.580000 140296119280064 torch/distributed/run.py:757] *****************************************
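torchrun applies this `OMP_NUM_THREADS=1` default only when the variable is not already set, so you can size it before launch. The sketch below splits the 80 virtual cores reported earlier evenly across the 8 workers. An even split is a starting point to tune, not a value the log recommends, and the helper name is hypothetical.

```python
import os

def omp_threads_per_worker(total_cores: int, nproc_per_node: int) -> int:
    """Give each torchrun worker an equal share of the host's cores (at least 1)."""
    return max(1, total_cores // nproc_per_node)

# Environment to hand to the launcher, e.g. subprocess.run([...], env=launch_env).
launch_env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads_per_worker(80, 8)))
print(launch_env["OMP_NUM_THREADS"])  # 10
```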
[2024-07-27 20:03:17,567] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,805] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,843] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,879] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,908] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,949] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,978] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-27 20:03:17,981] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-07-27 20:03:21,555] [INFO] [comm.py:637:init_distributed] cdb=None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
model_name_or_path: /var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/
data_path: /var/instructlabbigdisk/instructlab/.local/share/instructlab/internal/data.jsonl
output_dir: /var/instructlabbigdisk/instructlab/skillscheckpoints/
num_epochs: 10
last_step: 0
effective_batch_size: 16
learning_rate: 2.0e-05
lr_scheduler: cosine
num_warmup_steps: 25
save_samples: 185
save_samples_ds: null
save_last: false
log_level: INFO
seed: 42
mock_data: false
mock_len: 2600
sharding_strategy: FULL_SHARD
is_granite: false
lora_r: 0
lora_alpha: 32
lora_dropout: 0.1
lora_quant_bits: null
lora_target_modules: null
max_batch_len: 2
cpu_offload_optimizer: false
cpu_offload_optimizer_pin_memory: false
cpu_offload_optimizer_ratio: 1.0
NEFTune_alpha: null
chat_tmpl_path: /opt/python3.11/venv/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py
disable_flash_attn: false
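The parameter dump lists `lr_scheduler: cosine`, `learning_rate: 2.0e-05` and `num_warmup_steps: 25`. Below is a sketch of the common shape for that combination: linear warmup followed by a half-cosine decay, as in Hugging Face's `get_cosine_schedule_with_warmup`. It is an assumption that the trainer follows exactly this curve. `total_steps=120` is also hypothetical: it assumes no packing, so 185 samples in batches of 16 give about 12 steps per epoch over 10 epochs.

```python
import math

def lr_at(step, base_lr=2e-5, warmup=25, total_steps=120):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

for s in (0, 10, 25, 60, 120):
    print(s, f"{lr_at(s):.2e}")
```

Under that step estimate, the 25 warmup steps would cover roughly the first two of the ten epochs.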
{
    "script_params": {
        "model_name_or_path": "/var/instructlabbigdisk/instructlab/knowledgecheckpoints/hf_format/samples_1024/",
        "data_path": "/var/instructlabbigdisk/instructlab/.local/share/instructlab/internal/data.jsonl",
        "output_dir": "/var/instructlabbigdisk/instructlab/skillscheckpoints/",
        "num_epochs": 10,
        "last_step": 0,
        "effective_batch_size": 16,
        "learning_rate": 2e-05,
        "lr_scheduler": "cosine",
        "num_warmup_steps": 25,
        "save_samples": 185,
        "save_samples_ds": null,
        "save_last": false,
        "log_level": "INFO",
        "seed": 42,
        "mock_data": false,
        "mock_len": 2600,
        "sharding_strategy": "FULL_SHARD",
        "is_granite": false,
        "lora_r": 0,
        "lora_alpha": 32,
        "lora_dropout": 0.1,
        "lora_quant_bits": null,
        "lora_target_modules": null,
        "max_batch_len": 2,
        "cpu_offload_optimizer": false,
        "cpu_offload_optimizer_pin_memory": false,
        "cpu_offload_optimizer_ratio": 1.0,
        "NEFTune_alpha": null,
        "chat_tmpl_path": "/opt/python3.11/venv/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py",
        "disable_flash_attn": false
    },
    "timestamp": "2024-07-27T20:03:21.897629"
}
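The same parameters are echoed as a JSON record with a `timestamp`, which is the easier form to pull out of a saved log. Below is a sketch that does this with only the standard library's `json.JSONDecoder.raw_decode`. The function name is hypothetical. Reading `lora_r: 0` as "full fine-tuning, no LoRA adapter" is an interpretation of the flag; the log does not state it.

```python
import json

def extract_script_params(log_text: str) -> dict:
    """Return the "script_params" object from the first matching JSON record in a log."""
    decoder = json.JSONDecoder()
    pos = log_text.find("{")
    while pos != -1:
        try:
            obj, _end = decoder.raw_decode(log_text, pos)
        except json.JSONDecodeError:
            obj = None  # a stray brace in ordinary log text; keep scanning
        if isinstance(obj, dict) and "script_params" in obj:
            return obj["script_params"]
        pos = log_text.find("{", pos + 1)
    raise ValueError("no script_params record found")

snippet = 'INFO ...\n{\n  "script_params": {"num_epochs": 10, "lora_r": 0},\n  "timestamp": "2024-07-27T20:03:21.897629"\n}\n'
params = extract_script_params(snippet)
print(params["num_epochs"], params["lora_r"] == 0)  # 10 True
```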
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-07-27 20:03:21,973] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-27 20:03:21,973] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-07-27 20:03:22,374] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-27 20:03:22,515] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-27 20:03:22,529] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-27 20:03:22,538] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-27 20:03:22,664] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-27 20:03:22,682] [INFO] [comm.py:637:init_distributed] cdb=None
tyler-rhel-newimage:260:260 [0] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:260:260 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:260:260 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
tyler-rhel-newimage:265:265 [5] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:265:265 [5] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:265:265 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:267:267 [7] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:267:267 [7] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:267:267 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:262:262 [2] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:262:262 [2] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:263:263 [3] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:262:262 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:263:263 [3] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:263:263 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:266:266 [6] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:266:266 [6] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:266:266 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:261:261 [1] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:261:261 [1] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:261:261 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:264:264 [4] NCCL INFO cudaDriverVersion 12040
tyler-rhel-newimage:264:264 [4] NCCL INFO Bootstrap : Using enp8s0:192.168.48.11<0>
tyler-rhel-newimage:264:264 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
tyler-rhel-newimage:260:1022 [0] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:260:1022 [0] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:260:1022 [0] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:260:1022 [0] NCCL INFO Using network Socket
tyler-rhel-newimage:265:1026 [5] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:262:1023 [2] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:266:1024 [6] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:265:1026 [5] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:262:1023 [2] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:266:1024 [6] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:265:1026 [5] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:265:1026 [5] NCCL INFO Using network Socket
tyler-rhel-newimage:266:1024 [6] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:266:1024 [6] NCCL INFO Using network Socket
tyler-rhel-newimage:262:1023 [2] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:262:1023 [2] NCCL INFO Using network Socket
tyler-rhel-newimage:263:1025 [3] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:263:1025 [3] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:263:1025 [3] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:263:1025 [3] NCCL INFO Using network Socket
tyler-rhel-newimage:264:1028 [4] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:264:1028 [4] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:264:1028 [4] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:264:1028 [4] NCCL INFO Using network Socket
tyler-rhel-newimage:261:1027 [1] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:261:1027 [1] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:261:1027 [1] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:261:1027 [1] NCCL INFO Using network Socket
tyler-rhel-newimage:267:1029 [7] NCCL INFO NET/IB : No device found.
tyler-rhel-newimage:267:1029 [7] NCCL INFO NET/Socket : Using [0]enp8s0:192.168.48.11<0>
tyler-rhel-newimage:267:1029 [7] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:267:1029 [7] NCCL INFO Using network Socket
tyler-rhel-newimage:266:1024 [6] NCCL INFO comm 0x55f359e7d980 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:263:1025 [3] NCCL INFO comm 0x55fffff3ce80 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:261:1027 [1] NCCL INFO comm 0x55fca60525d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:262:1023 [2] NCCL INFO comm 0x55f25f665d50 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:267:1029 [7] NCCL INFO comm 0x564fb40d9fa0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:264:1028 [4] NCCL INFO comm 0x55b22a5ae220 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:260:1022 [0] NCCL INFO comm 0x558210938950 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:265:1026 [5] NCCL INFO comm 0x56464a4e7a70 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0xbba7bcd413cc6af1 - Init START
tyler-rhel-newimage:266:1024 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffff00,00000000
tyler-rhel-newimage:266:1024 [6] NCCL INFO NVLS multicast support is not available on dev 6
tyler-rhel-newimage:265:1026 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffff00,00000000
tyler-rhel-newimage:265:1026 [5] NCCL INFO NVLS multicast support is not available on dev 5
tyler-rhel-newimage:260:1022 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
tyler-rhel-newimage:267:1029 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffff00,00000000
tyler-rhel-newimage:267:1029 [7] NCCL INFO NVLS multicast support is not available on dev 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO NVLS multicast support is not available on dev 0
tyler-rhel-newimage:263:1025 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffffffff
tyler-rhel-newimage:263:1025 [3] NCCL INFO NVLS multicast support is not available on dev 3
tyler-rhel-newimage:262:1023 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffffffff
tyler-rhel-newimage:262:1023 [2] NCCL INFO NVLS multicast support is not available on dev 2
tyler-rhel-newimage:261:1027 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff
tyler-rhel-newimage:261:1027 [1] NCCL INFO NVLS multicast support is not available on dev 1
tyler-rhel-newimage:264:1028 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffff00,00000000
tyler-rhel-newimage:264:1028 [4] NCCL INFO NVLS multicast support is not available on dev 4
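NCCL sets each rank's CPU affinity to a mask printed as comma-separated 32-bit hex groups. Below is a sketch that expands such a mask into CPU indices. Decoding the two masks in this log gives CPUs 0-39 for GPUs 0-3 and CPUs 40-79 for GPUs 4-7. That matches the 80 virtual cores reported earlier split into two halves, presumably two sockets or NUMA nodes; the topology itself is an inference. The function name is hypothetical.

```python
def cpus_from_mask(mask: str) -> list:
    """Expand a Linux-style cpumask such as "ff,ffffffff" into CPU indices."""
    value = int(mask.replace(",", ""), 16)  # groups are big-endian 32-bit chunks
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

for gpu, mask in ((0, "ff,ffffffff"), (4, "ffff,ffffff00,00000000")):
    cpus = cpus_from_mask(mask)
    print(f"GPU {gpu}: CPUs {cpus[0]}-{cpus[-1]} ({len(cpus)} CPUs)")
```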
tyler-rhel-newimage:267:1029 [7] NCCL INFO comm 0x564fb40d9fa0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
tyler-rhel-newimage:260:1022 [0] NCCL INFO comm 0x558210938950 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
tyler-rhel-newimage:266:1024 [6] NCCL INFO comm 0x55f359e7d980 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
tyler-rhel-newimage:264:1028 [4] NCCL INFO comm 0x55b22a5ae220 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
tyler-rhel-newimage:265:1026 [5] NCCL INFO comm 0x56464a4e7a70 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
tyler-rhel-newimage:266:1024 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
tyler-rhel-newimage:267:1029 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
tyler-rhel-newimage:266:1024 [6] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:267:1029 [7] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:265:1026 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
tyler-rhel-newimage:264:1028 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
tyler-rhel-newimage:265:1026 [5] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:264:1028 [4] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1022 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
tyler-rhel-newimage:260:1022 [0] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:263:1025 [3] NCCL INFO comm 0x55fffff3ce80 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
tyler-rhel-newimage:262:1023 [2] NCCL INFO comm 0x55f25f665d50 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
tyler-rhel-newimage:263:1025 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
tyler-rhel-newimage:261:1027 [1] NCCL INFO comm 0x55fca60525d0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
tyler-rhel-newimage:262:1023 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
tyler-rhel-newimage:262:1023 [2] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:261:1027 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
tyler-rhel-newimage:261:1027 [1] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:263:1025 [3] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:260:1022 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Connected all rings
tyler-rhel-newimage:262:1023 [2] NCCL INFO Connected all rings
tyler-rhel-newimage:264:1028 [4] NCCL INFO Connected all rings
tyler-rhel-newimage:265:1026 [5] NCCL INFO Connected all rings
tyler-rhel-newimage:266:1024 [6] NCCL INFO Connected all rings
tyler-rhel-newimage:267:1029 [7] NCCL INFO Connected all rings
tyler-rhel-newimage:260:1022 [0] NCCL INFO Connected all rings
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Connected all rings
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1029 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1025 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1023 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1028 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1024 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1026 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1022 [0] NCCL INFO Connected all trees | |
tyler-rhel-newimage:260:1022 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:260:1022 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:260:1022 [0] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO Connected all trees | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:262:1023 [2] NCCL INFO Connected all trees | |
tyler-rhel-newimage:262:1023 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:262:1023 [2] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:262:1023 [2] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:263:1025 [3] NCCL INFO Connected all trees | |
tyler-rhel-newimage:263:1025 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:263:1025 [3] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:267:1029 [7] NCCL INFO Connected all trees | |
tyler-rhel-newimage:267:1029 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:267:1029 [7] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:263:1025 [3] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:267:1029 [7] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:264:1028 [4] NCCL INFO Connected all trees | |
tyler-rhel-newimage:264:1028 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:264:1028 [4] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:266:1024 [6] NCCL INFO Connected all trees | |
tyler-rhel-newimage:266:1024 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:266:1024 [6] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:265:1026 [5] NCCL INFO Connected all trees | |
tyler-rhel-newimage:265:1026 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 | |
tyler-rhel-newimage:265:1026 [5] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer | |
tyler-rhel-newimage:265:1026 [5] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:264:1028 [4] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:266:1024 [6] NCCL INFO NCCL_WORK_FIFO_DEPTH set by environment to 4194304. | |
tyler-rhel-newimage:265:1026 [5] NCCL INFO comm 0x56464a4e7a70 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:266:1024 [6] NCCL INFO comm 0x55f359e7d980 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:267:1029 [7] NCCL INFO comm 0x564fb40d9fa0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:260:1022 [0] NCCL INFO comm 0x558210938950 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:261:1027 [1] NCCL INFO comm 0x55fca60525d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:264:1028 [4] NCCL INFO comm 0x55b22a5ae220 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:262:1023 [2] NCCL INFO comm 0x55f25f665d50 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
tyler-rhel-newimage:263:1025 [3] NCCL INFO comm 0x55fffff3ce80 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0xbba7bcd413cc6af1 - Init COMPLETE | |
Generating train split: 185 examples [00:00, 25776.38 examples/s]
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 12894.40it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 11066.61it/s]
Effective batch size is too low for multipack sampling, max sample length=141 and min packing length=135. Switching to naive distributed sampling.
{
    "num_gpus": 8,
    "avg_sample_len": 67.78918918918919,
    "effective_batch_size": 16,
    "max_batch_len_per_gpu": 2,
    "packing_max_batch_len": null,
    "grad_accum": 1,
    "num_batches": 12,
    "avg_samples_per_batch": 15.416666666666666,
    "samples_per_gpu": 2,
    "timestamp": "2024-07-27T20:03:33.017444"
}
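The JSON summary above can be cross-checked with plain arithmetic. The sketch below is a hypothetical reconstruction, not the trainer's source: it assumes each optimizer step consumes `effective_batch_size` samples split evenly across the GPUs and that a final partial batch is kept. Under those assumptions it reproduces the logged `samples_per_gpu`, `grad_accum`, `num_batches`, and `avg_samples_per_batch`, and implies that the twelfth batch holds only 9 of the 185 samples.

```python
import math

# Inputs taken from this log: 185 training examples, 8 GPUs, and
# --effective-batch-size 16 on the command line.
num_samples = 185
num_gpus = 8
effective_batch_size = 16

# Assumed derivation (hypothetical, not taken from the trainer's code):
# every step uses effective_batch_size samples spread evenly over the GPUs,
# and the last, partial batch is kept rather than dropped.
samples_per_gpu = effective_batch_size // num_gpus                 # 16 // 8
grad_accum = effective_batch_size // (num_gpus * samples_per_gpu)  # 16 // 16
num_batches = math.ceil(num_samples / effective_batch_size)        # ceil(11.5625)
avg_samples_per_batch = num_samples / num_batches                  # 185 / 12
last_batch = num_samples - (num_batches - 1) * effective_batch_size

print(samples_per_gpu, grad_accum, num_batches, round(avg_samples_per_batch, 4), last_batch)
```

Running it prints `2 1 12 15.4167 9`, matching the JSON fields above.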
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 11659.95it/s]
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 12065.72it/s]
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 11400.91it/s]
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 11540.81it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Data length calculation: 0%| | 0/185 [00:00<?, ?it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 12968.10it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Data length calculation: 100%|██████████| 185/185 [00:00<00:00, 11126.75it/s]
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Detected CUDA files, patching ldflags
Emitting ninja build file /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121/fused_adam/build.ninja...
/opt/python3.11/venv/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.15251493453979492 seconds
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121/fused_adam/build.ninja...
/opt/python3.11/venv/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.1219935417175293 seconds
[2024-07-27 20:03:39,014] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4+d254d75, git-hash=d254d75, git-branch=HEAD
[2024-07-27 20:03:39,014] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
Loading extension module fused_adam...
Time to load fused_adam op: 0.20261573791503906 seconds
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121/fused_adam/build.ninja...
/opt/python3.11/venv/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.12093877792358398 seconds
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121/fused_adam/build.ninja...
/opt/python3.11/venv/lib64/python3.11/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /var/instructlabbigdisk/instructlab/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.12085723876953125 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10260534286499023 seconds
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 0.2024221420288086 seconds
Time to load fused_adam op: 0.20228958129882812 seconds
tyler-rhel-newimage:261:1135 [1] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:261:1135 [1] NCCL INFO Using network Socket
tyler-rhel-newimage:267:1125 [7] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:267:1125 [7] NCCL INFO Using network Socket
tyler-rhel-newimage:264:1129 [4] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:264:1129 [4] NCCL INFO Using network Socket
tyler-rhel-newimage:263:1138 [3] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:263:1138 [3] NCCL INFO Using network Socket
tyler-rhel-newimage:265:1132 [5] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:265:1132 [5] NCCL INFO Using network Socket
tyler-rhel-newimage:260:1124 [0] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:260:1124 [0] NCCL INFO Using network Socket
tyler-rhel-newimage:262:1141 [2] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:262:1141 [2] NCCL INFO Using network Socket
tyler-rhel-newimage:266:1126 [6] NCCL INFO Using non-device net plugin version 0
tyler-rhel-newimage:266:1126 [6] NCCL INFO Using network Socket
tyler-rhel-newimage:265:1132 [5] NCCL INFO bootstrapSplit: comm 0x56464ab274c0 parent 0x56464a4e7a70 rank 5 nranks 8 color -934961569 key 5 prev 4 next 6 - DONE
tyler-rhel-newimage:265:1132 [5] NCCL INFO comm 0x56464ab274c0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:264:1129 [4] NCCL INFO bootstrapSplit: comm 0x55b22abe29d0 parent 0x55b22a5ae220 rank 4 nranks 8 color -934961569 key 4 prev 3 next 5 - DONE
tyler-rhel-newimage:264:1129 [4] NCCL INFO comm 0x55b22abe29d0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:263:1138 [3] NCCL INFO bootstrapSplit: comm 0x560000574c60 parent 0x55fffff3ce80 rank 3 nranks 8 color -934961569 key 3 prev 2 next 4 - DONE
tyler-rhel-newimage:263:1138 [3] NCCL INFO comm 0x560000574c60 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:262:1141 [2] NCCL INFO bootstrapSplit: comm 0x55f25fc9eb30 parent 0x55f25f665d50 rank 2 nranks 8 color -934961569 key 2 prev 1 next 3 - DONE
tyler-rhel-newimage:262:1141 [2] NCCL INFO comm 0x55f25fc9eb30 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:261:1135 [1] NCCL INFO bootstrapSplit: comm 0x55fca66893d0 parent 0x55fca60525d0 rank 1 nranks 8 color -934961569 key 1 prev 0 next 2 - DONE
tyler-rhel-newimage:261:1135 [1] NCCL INFO comm 0x55fca66893d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:260:1124 [0] NCCL INFO bootstrapSplit: comm 0x558210f77210 parent 0x558210938950 rank 0 nranks 8 color -934961569 key 0 prev 7 next 1 - DONE
tyler-rhel-newimage:260:1124 [0] NCCL INFO comm 0x558210f77210 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:267:1125 [7] NCCL INFO bootstrapSplit: comm 0x564fb4716ac0 parent 0x564fb40d9fa0 rank 7 nranks 8 color -934961569 key 7 prev 6 next 0 - DONE
tyler-rhel-newimage:267:1125 [7] NCCL INFO comm 0x564fb4716ac0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:266:1126 [6] NCCL INFO bootstrapSplit: comm 0x55f35a3e2520 parent 0x55f359e7d980 rank 6 nranks 8 color -934961569 key 6 prev 5 next 7 - DONE
tyler-rhel-newimage:266:1126 [6] NCCL INFO comm 0x55f35a3e2520 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0x358cd0e27660cbba - Init START
tyler-rhel-newimage:264:1129 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffff00,00000000
tyler-rhel-newimage:264:1129 [4] NCCL INFO NVLS multicast support is not available on dev 4
tyler-rhel-newimage:267:1125 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffff00,00000000
tyler-rhel-newimage:267:1125 [7] NCCL INFO NVLS multicast support is not available on dev 7
tyler-rhel-newimage:261:1135 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff
tyler-rhel-newimage:261:1135 [1] NCCL INFO NVLS multicast support is not available on dev 1
tyler-rhel-newimage:260:1124 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
tyler-rhel-newimage:260:1124 [0] NCCL INFO NVLS multicast support is not available on dev 0
tyler-rhel-newimage:266:1126 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffff00,00000000
tyler-rhel-newimage:266:1126 [6] NCCL INFO NVLS multicast support is not available on dev 6
tyler-rhel-newimage:263:1138 [3] NCCL INFO Setting affinity for GPU 3 to ff,ffffffff
tyler-rhel-newimage:263:1138 [3] NCCL INFO NVLS multicast support is not available on dev 3
tyler-rhel-newimage:265:1132 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffff00,00000000
tyler-rhel-newimage:265:1132 [5] NCCL INFO NVLS multicast support is not available on dev 5
tyler-rhel-newimage:262:1141 [2] NCCL INFO Setting affinity for GPU 2 to ff,ffffffff
tyler-rhel-newimage:262:1141 [2] NCCL INFO NVLS multicast support is not available on dev 2
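The "Setting affinity" lines give each GPU's CPU binding as a hex mask. A minimal decoder is sketched below, assuming the masks follow the Linux cpumask layout (comma-separated 32-bit hex words, most significant word first). Under that assumption GPUs 0-3 are pinned to CPUs 0-39 and GPUs 4-7 to CPUs 40-79, which together cover the 80 virtual cores NumExpr reported at startup; that the split corresponds to two CPU sockets or NUMA nodes is an inference the log does not state.

```python
def mask_to_cpus(mask: str) -> list[int]:
    """Decode an affinity string such as "ff,ffffffff" into CPU indices.

    Assumes the Linux cpumask layout: comma-separated 32-bit hex words,
    most significant word first.
    """
    value = 0
    for word in mask.split(","):
        value = (value << 32) | int(word, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Affinity strings copied from the "Setting affinity for GPU N" lines above.
affinity = {
    0: "ff,ffffffff", 1: "ff,ffffffff", 2: "ff,ffffffff", 3: "ff,ffffffff",
    4: "ffff,ffffff00,00000000", 5: "ffff,ffffff00,00000000",
    6: "ffff,ffffff00,00000000", 7: "ffff,ffffff00,00000000",
}

for gpu, mask in sorted(affinity.items()):
    cpus = mask_to_cpus(mask)
    print(f"GPU {gpu}: CPUs {cpus[0]}-{cpus[-1]} ({len(cpus)} CPUs)")
```

This prints `GPU 0: CPUs 0-39 (40 CPUs)` for GPUs 0-3 and `GPU 4: CPUs 40-79 (40 CPUs)` (with the matching GPU number) for GPUs 4-7.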
tyler-rhel-newimage:262:1141 [2] NCCL INFO comm 0x55f25fc9eb30 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
tyler-rhel-newimage:262:1141 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
tyler-rhel-newimage:260:1124 [0] NCCL INFO comm 0x558210f77210 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
tyler-rhel-newimage:262:1141 [2] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:267:1125 [7] NCCL INFO comm 0x564fb4716ac0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:266:1126 [6] NCCL INFO comm 0x55f35a3e2520 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
tyler-rhel-newimage:263:1138 [3] NCCL INFO comm 0x560000574c60 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
tyler-rhel-newimage:265:1132 [5] NCCL INFO comm 0x56464ab274c0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
tyler-rhel-newimage:261:1135 [1] NCCL INFO comm 0x55fca66893d0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:264:1129 [4] NCCL INFO comm 0x55b22abe29d0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:267:1125 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:266:1126 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
tyler-rhel-newimage:267:1125 [7] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:265:1132 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
tyler-rhel-newimage:266:1126 [6] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:263:1138 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
tyler-rhel-newimage:265:1132 [5] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:261:1135 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:264:1129 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:264:1129 [4] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
tyler-rhel-newimage:260:1124 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
tyler-rhel-newimage:260:1124 [0] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:263:1138 [3] NCCL INFO P2P Chunksize set to 524288
tyler-rhel-newimage:261:1135 [1] NCCL INFO P2P Chunksize set to 524288
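Each rank's "Trees" line lists, per channel, its tree neighbours. The parser below is a sketch that assumes each entry reads `[channel] down0/down1/down2->rank->up`, with `-1` meaning "no peer" (so `up == -1` marks the root); that field order is an assumption about NCCL's log format, not something this log states. Applied to channel 0 of the eight lines above, it recovers a simple chain rooted at rank 0 (0 → 1 → … → 7), the same order as the `Channel NN/24 : 0 1 2 3 4 5 6 7` ring lines.

```python
import re

# Assumed entry layout: "[channel] down0/down1/down2->rank->up"; -1 = no peer.
TREE_ENTRY = re.compile(r"\[(\d+)\] (-?\d+)/(-?\d+)/(-?\d+)->(-?\d+)->(-?\d+)")


def parse_tree_line(line: str) -> dict[int, tuple[list[int], int, int]]:
    """Map channel -> (children, rank, parent) for one 'NCCL INFO Trees' line."""
    result = {}
    for match in TREE_ENTRY.finditer(line):
        channel, d0, d1, d2, rank, up = map(int, match.groups())
        children = [c for c in (d0, d1, d2) if c != -1]
        result[channel] = (children, rank, up)
    return result


def chain_from_parents(parents: dict[int, int]) -> list[int]:
    """Walk a chain-shaped tree from its root (parent == -1) to the leaf."""
    child_of = {parent: rank for rank, parent in parents.items() if parent != -1}
    node = next(rank for rank, parent in parents.items() if parent == -1)
    order = [node]
    while node in child_of:
        node = child_of[node]
        order.append(node)
    return order


# Channel-0 entries copied from the Trees lines above (one per rank).
tree_lines = [
    "Trees [0] 1/-1/-1->0->-1", "Trees [0] 2/-1/-1->1->0",
    "Trees [0] 3/-1/-1->2->1", "Trees [0] 4/-1/-1->3->2",
    "Trees [0] 5/-1/-1->4->3", "Trees [0] 6/-1/-1->5->4",
    "Trees [0] 7/-1/-1->6->5", "Trees [0] -1/-1/-1->7->6",
]
parents = {}
for line in tree_lines:
    _children, rank, up = parse_tree_line(line)[0]
    parents[rank] = up

print(chain_from_parents(parents))
```

With these inputs it prints `[0, 1, 2, 3, 4, 5, 6, 7]`.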
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read | |
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read | |
tyler-rhel-newimage:260:1124 [0] NCCL INFO Connected all rings
tyler-rhel-newimage:264:1129 [4] NCCL INFO Connected all rings
tyler-rhel-newimage:263:1138 [3] NCCL INFO Connected all rings
tyler-rhel-newimage:261:1135 [1] NCCL INFO Connected all rings
tyler-rhel-newimage:262:1141 [2] NCCL INFO Connected all rings
tyler-rhel-newimage:265:1132 [5] NCCL INFO Connected all rings
tyler-rhel-newimage:266:1126 [6] NCCL INFO Connected all rings
tyler-rhel-newimage:267:1125 [7] NCCL INFO Connected all rings
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:267:1125 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:262:1141 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:264:1129 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:261:1135 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:263:1138 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:266:1126 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:265:1132 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read
tyler-rhel-newimage:260:1124 [0] NCCL INFO Connected all trees
tyler-rhel-newimage:260:1124 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:260:1124 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:261:1135 [1] NCCL INFO Connected all trees
tyler-rhel-newimage:261:1135 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:261:1135 [1] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:262:1141 [2] NCCL INFO Connected all trees
tyler-rhel-newimage:262:1141 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:262:1141 [2] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:263:1138 [3] NCCL INFO Connected all trees
tyler-rhel-newimage:263:1138 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:263:1138 [3] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:267:1125 [7] NCCL INFO Connected all trees
tyler-rhel-newimage:267:1125 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:267:1125 [7] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:264:1129 [4] NCCL INFO Connected all trees
tyler-rhel-newimage:264:1129 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:264:1129 [4] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:266:1126 [6] NCCL INFO Connected all trees
tyler-rhel-newimage:265:1132 [5] NCCL INFO Connected all trees
tyler-rhel-newimage:266:1126 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:265:1132 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
tyler-rhel-newimage:266:1126 [6] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:265:1132 [5] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
tyler-rhel-newimage:265:1132 [5] NCCL INFO comm 0x56464ab274c0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId c060 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:266:1126 [6] NCCL INFO comm 0x55f35a3e2520 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId e070 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:267:1125 [7] NCCL INFO comm 0x564fb4716ac0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId e080 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:260:1124 [0] NCCL INFO comm 0x558210f77210 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 8010 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:261:1135 [1] NCCL INFO comm 0x55fca66893d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 8020 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:262:1141 [2] NCCL INFO comm 0x55f25fc9eb30 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a030 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:264:1129 [4] NCCL INFO comm 0x55b22abe29d0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId c050 commId 0x358cd0e27660cbba - Init COMPLETE
tyler-rhel-newimage:263:1138 [3] NCCL INFO comm 0x560000574c60 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId a040 commId 0x358cd0e27660cbba - Init COMPLETE
[2024-07-27 20:03:47,872] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-07-27 20:03:47,874] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-07-27 20:03:47,874] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-07-27 20:03:47,886] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-07-27 20:03:47,887] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2024-07-27 20:03:47,887] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-07-27 20:03:47,887] [INFO] [stage_1_and_2.py:148:__init__] Reduce bucket size 500,000,000
[2024-07-27 20:03:47,887] [INFO] [stage_1_and_2.py:149:__init__] Allgather bucket size 500,000,000
[2024-07-27 20:03:47,887] [INFO] [stage_1_and_2.py:150:__init__] CPU Offload: False
[2024-07-27 20:03:47,887] [INFO] [stage_1_and_2.py:151:__init__] Round robin gradient partitioning: False
[2024-07-27 20:04:00,524] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-07-27 20:04:01,385] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-07-27 20:04:01,386] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 17.26 GB CA 17.26 GB Max_CA 17 GB
[2024-07-27 20:04:01,386] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 138.49 GB, percent = 11.0%
[2024-07-27 20:04:01,578] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-07-27 20:04:01,579] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 18.83 GB CA 20.4 GB Max_CA 20 GB
[2024-07-27 20:04:01,579] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 138.49 GB, percent = 11.0%
[2024-07-27 20:04:01,579] [INFO] [stage_1_and_2.py:543:__init__] optimizer state initialized
[2024-07-27 20:04:01,777] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-07-27 20:04:01,778] [INFO] [utils.py:782:see_memory_usage] MA 15.69 GB Max_MA 15.69 GB CA 20.4 GB Max_CA 20 GB
[2024-07-27 20:04:01,778] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 138.49 GB, percent = 11.0%
[2024-07-27 20:04:01,780] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-07-27 20:04:01,780] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-07-27 20:04:01,780] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7fbf02112310>
[2024-07-27 20:04:01,780] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.95)]
[2024-07-27 20:04:01,781] [INFO] [config.py:997:print] DeepSpeedEngine configuration:
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] amp_enabled .................. False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] amp_params ................... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] bfloat16_enabled ............. True
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] bfloat16_immediate_grad_update False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] checkpoint_parallel_write_pipeline False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] checkpoint_tag_validation_enabled True
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] checkpoint_tag_validation_fail False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fbef4750bd0>
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] communication_data_type ...... None
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] curriculum_enabled_legacy .... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] curriculum_params_legacy ..... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] data_efficiency_enabled ...... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] dataloader_drop_last ......... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] disable_allgather ............ False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] dump_state ................... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] dynamic_loss_scale_args ...... None
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_enabled ........... False
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_gas_boundary_resolution 1
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_layer_num ......... 0
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_max_iter .......... 100
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_stability ......... 1e-06
[2024-07-27 20:04:01,782] [INFO] [config.py:1001:print] eigenvalue_tol ............... 0.01
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] eigenvalue_verbose ........... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] elasticity_enabled ........... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] fp16_auto_cast ............... None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] fp16_enabled ................. False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] fp16_master_weights_and_gradients False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] global_rank .................. 0
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] grad_accum_dtype ............. None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] gradient_accumulation_steps .. 1
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] gradient_clipping ............ 1.0
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] gradient_predivide_factor .... 1.0
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] graph_harvesting ............. False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] initial_dynamic_scale ........ 1
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] load_universal_checkpoint .... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] loss_scale ................... 1.0
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] memory_breakdown ............. False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] mics_hierarchial_params_gather False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] mics_shard_size .............. -1
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] optimizer_legacy_fusion ...... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] optimizer_name ............... None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] optimizer_params ............. None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] pld_enabled .................. False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] pld_params ................... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] prescale_gradients ........... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] scheduler_name ............... None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] scheduler_params ............. None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] seq_parallel_communication_data_type torch.float32
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] sparse_attention ............. None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] sparse_gradients_enabled ..... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] steps_per_print .............. 1
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] timers_config ................ enabled=True synchronized=True
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] train_batch_size ............. 16
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] train_micro_batch_size_per_gpu 2
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] use_data_before_expert_parallel_ False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] use_node_local_storage ....... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] wall_clock_breakdown ......... False
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] weight_quantization_config ... None
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] world_size ................... 8
[2024-07-27 20:04:01,783] [INFO] [config.py:1001:print] zero_allow_untested_optimizer False
[2024-07-27 20:04:01,784] [INFO] [config.py:1001:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-07-27 20:04:01,784] [INFO] [config.py:1001:print] zero_enabled ................. True
[2024-07-27 20:04:01,784] [INFO] [config.py:1001:print] zero_force_ds_cpu_optimizer .. True
[2024-07-27 20:04:01,784] [INFO] [config.py:1001:print] zero_optimization_stage ...... 2
[2024-07-27 20:04:01,784] [INFO] [config.py:987:print_user_config] json = {
"train_batch_size": 16,
"gradient_accumulation_steps": 1,
"train_micro_batch_size_per_gpu": 2,
"steps_per_print": 1,
"zero_optimization": {
"stage": 2,
"offload_param": {
"device": "none"
},
"offload_optimizer": {
"device": "none"
}
},
"bf16": {
"enabled": true
},
"gradient_clipping": 1.0,
"prescale_gradients": false,
"wall_clock_breakdown": false
}
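The three batch-size fields in the user config above are tied together: DeepSpeed checks that `train_batch_size` equals `train_micro_batch_size_per_gpu` * `gradient_accumulation_steps` * `world_size`. A minimal Python sketch of that arithmetic using the values printed in this run (the helper name `global_batch_size` is illustrative, not part of DeepSpeed):

```python
# Values copied from the DeepSpeed config printed in this log.
train_micro_batch_size_per_gpu = 2
gradient_accumulation_steps = 1
world_size = 8


def global_batch_size(micro: int, accum: int, ranks: int) -> int:
    """Samples consumed per optimizer step across all ranks."""
    return micro * accum * ranks


train_batch_size = global_batch_size(
    train_micro_batch_size_per_gpu, gradient_accumulation_steps, world_size
)
print(train_batch_size)  # 16, matching "train_batch_size": 16 above
```

With `gradient_accumulation_steps` at 1, every optimizer step here is a single micro-batch of 2 samples on each of the 8 GPUs.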
[2024-07-27 20:04:01,784] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
Number of samples per save: 176
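The command at the top of this log passed `--save-samples 185`, yet the trainer reports 176 samples per save. One reading consistent with the numbers is that the interval is rounded down to a whole multiple of the effective batch size of 16 (11 optimizer steps). The log does not say how 176 is computed, so treat this as an assumption; `round_down_to_batch` is a hypothetical helper:

```python
def round_down_to_batch(requested_samples: int, effective_batch_size: int) -> int:
    """Largest multiple of effective_batch_size that is <= requested_samples.

    Hypothetical reconstruction; not taken from the trainer's source.
    """
    return (requested_samples // effective_batch_size) * effective_batch_size


samples_per_save = round_down_to_batch(185, 16)
steps_between_saves = samples_per_save // 16
print(samples_per_save, steps_between_saves)  # 176 11
```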
[2024-07-27 20:04:01,865] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-07-27 20:04:01,875] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-07-27 20:04:01,984] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-07-27 20:04:02,237] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-07-27 20:04:02,285] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2024-07-27 20:04:02,433] [WARNING] [engine.py:2749:load_checkpoint] Unable to find latest file at /var/instructlabbigdisk/instructlab/skillscheckpoints/ds_native/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
Epoch 0: 0%| | 0/12 [00:00<?, ?it/s]
total tokens: 118 num samples: 2 num padding tokens: 13 - rank: 7 max len: 59 min len: 46 avg len: 52.5 num_loss_counted_tokens: 52
total tokens: 138 num samples: 2 num padding tokens: 10 - rank: 7 max len: 69 min len: 59 avg len: 64.0 num_loss_counted_tokens: 68
total tokens: 282 num samples: 2 num padding tokens: 83 - rank: 7 max len: 141 min len: 58 avg len: 99.5 num_loss_counted_tokens: 150
total tokens: 136 num samples: 2 num padding tokens: 5 - rank: 7 max len: 68 min len: 63 avg len: 65.5 num_loss_counted_tokens: 56
total tokens: 116 num samples: 2 num padding tokens: 9 - rank: 7 max len: 58 min len: 49 avg len: 53.5 num_loss_counted_tokens: 53
total tokens: 186 num samples: 2 num padding tokens: 5 - rank: 7 max len: 93 min len: 88 avg len: 90.5 num_loss_counted_tokens: 121
total tokens: 156 num samples: 2 num padding tokens: 21 - rank: 6 max len: 78 min len: 57 avg len: 67.5 num_loss_counted_tokens: 70
total tokens: 110 num samples: 2 num padding tokens: 3 - rank: 3 max len: 55 min len: 52 avg len: 53.5 num_loss_counted_tokens: 61
total tokens: 184 num samples: 2 num padding tokens: 27 - rank: 7 max len: 92 min len: 65 avg len: 78.5 num_loss_counted_tokens: 90
total tokens: 128 num samples: 2 num padding tokens: 6 - rank: 7 max len: 64 min len: 58 avg len: 61.0 num_loss_counted_tokens: 66
total tokens: 142 num samples: 2 num padding tokens: 14 - rank: 7 max len: 71 min len: 57 avg len: 64.0 num_loss_counted_tokens: 58
total tokens: 114 num samples: 2 num padding tokens: 13 - rank: 7 max len: 57 min len: 44 avg len: 50.5 num_loss_counted_tokens: 55
total tokens: 140 num samples: 2 num padding tokens: 3 - rank: 7 max len: 70 min len: 67 avg len: 68.5 num_loss_counted_tokens: 70
total tokens: 142 num samples: 2 num padding tokens: 8 - rank: 1 max len: 71 min len: 63 avg len: 67.0 num_loss_counted_tokens: 74
total tokens: 128 num samples: 2 num padding tokens: 5 - rank: 1 max len: 64 min len: 59 avg len: 61.5 num_loss_counted_tokens: 71
total tokens: 158 num samples: 2 num padding tokens: 36 - rank: 3 max len: 79 min len: 43 avg len: 61.0 num_loss_counted_tokens: 56
total tokens: 202 num samples: 2 num padding tokens: 51 - rank: 4 max len: 101 min len: 50 avg len: 75.5 num_loss_counted_tokens: 106
total tokens: 106 num samples: 2 num padding tokens: 7 - rank: 0 max len: 53 min len: 46 avg len: 49.5 num_loss_counted_tokens: 56
total tokens: 136 num samples: 2 num padding tokens: 7 - rank: 7 max len: 68 min len: 61 avg len: 64.5 num_loss_counted_tokens: 59
total tokens: 126 num samples: 2 num padding tokens: 19 - rank: 0 max len: 63 min len: 44 avg len: 53.5 num_loss_counted_tokens: 53
total tokens: 134 num samples: 2 num padding tokens: 0 - rank: 4 max len: 67 min len: 67 avg len: 67.0 num_loss_counted_tokens: 52
total tokens: 166 num samples: 2 num padding tokens: 21 - rank: 3 max len: 83 min len: 62 avg len: 72.5 num_loss_counted_tokens: 88
total tokens: 188 num samples: 2 num padding tokens: 4 - rank: 6 max len: 94 min len: 90 avg len: 92.0 num_loss_counted_tokens: 121
total tokens: 168 num samples: 2 num padding tokens: 20 - rank: 3 max len: 84 min len: 64 avg len: 74.0 num_loss_counted_tokens: 95
total tokens: 106 num samples: 2 num padding tokens: 2 - rank: 6 max len: 53 min len: 51 avg len: 52.0 num_loss_counted_tokens: 51
total tokens: 110 num samples: 2 num padding tokens: 0 - rank: 6 max len: 55 min len: 55 avg len: 55.0 num_loss_counted_tokens: 64
total tokens: 142 num samples: 2 num padding tokens: 11 - rank: 4 max len: 71 min len: 60 avg len: 65.5 num_loss_counted_tokens: 58
total tokens: 244 num samples: 2 num padding tokens: 63 - rank: 4 max len: 122 min len: 59 avg len: 90.5 num_loss_counted_tokens: 120
total tokens: 128 num samples: 2 num padding tokens: 2 - rank: 0 max len: 64 min len: 62 avg len: 63.0 num_loss_counted_tokens: 65
total tokens: 120 num samples: 2 num padding tokens: 0 - rank: 4 max len: 60 min len: 60 avg len: 60.0 num_loss_counted_tokens: 69
total tokens: 140 num samples: 2 num padding tokens: 12 - rank: 6 max len: 70 min len: 58 avg len: 64.0 num_loss_counted_tokens: 77
total tokens: 180 num samples: 2 num padding tokens: 20 - rank: 4 max len: 90 min len: 70 avg len: 80.0 num_loss_counted_tokens: 111
total tokens: 124 num samples: 2 num padding tokens: 10 - rank: 4 max len: 62 min len: 52 avg len: 57.0 num_loss_counted_tokens: 59
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 4 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 57
total tokens: 144 num samples: 2 num padding tokens: 12 - rank: 4 max len: 72 min len: 60 avg len: 66.0 num_loss_counted_tokens: 68
total tokens: 200 num samples: 2 num padding tokens: 50 - rank: 6 max len: 100 min len: 50 avg len: 75.0 num_loss_counted_tokens: 91
total tokens: 152 num samples: 2 num padding tokens: 15 - rank: 3 max len: 76 min len: 61 avg len: 68.5 num_loss_counted_tokens: 86
total tokens: 154 num samples: 2 num padding tokens: 33 - rank: 4 max len: 77 min len: 44 avg len: 60.5 num_loss_counted_tokens: 77
total tokens: 128 num samples: 2 num padding tokens: 1 - rank: 6 max len: 64 min len: 63 avg len: 63.5 num_loss_counted_tokens: 58
total tokens: 158 num samples: 2 num padding tokens: 5 - rank: 6 max len: 79 min len: 74 avg len: 76.5 num_loss_counted_tokens: 82
total tokens: 152 num samples: 2 num padding tokens: 8 - rank: 6 max len: 76 min len: 68 avg len: 72.0 num_loss_counted_tokens: 72
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 0 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 54
total tokens: 120 num samples: 2 num padding tokens: 5 - rank: 4 max len: 60 min len: 55 avg len: 57.5 num_loss_counted_tokens: 66
total tokens: 148 num samples: 2 num padding tokens: 16 - rank: 0 max len: 74 min len: 58 avg len: 66.0 num_loss_counted_tokens: 73
total tokens: 134 num samples: 2 num padding tokens: 13 - rank: 6 max len: 67 min len: 54 avg len: 60.5 num_loss_counted_tokens: 67
total tokens: 164 num samples: 2 num padding tokens: 21 - rank: 0 max len: 82 min len: 61 avg len: 71.5 num_loss_counted_tokens: 92
total tokens: 140 num samples: 2 num padding tokens: 13 - rank: 3 max len: 70 min len: 57 avg len: 63.5 num_loss_counted_tokens: 67
total tokens: 162 num samples: 2 num padding tokens: 27 - rank: 3 max len: 81 min len: 54 avg len: 67.5 num_loss_counted_tokens: 86
total tokens: 102 num samples: 2 num padding tokens: 5 - rank: 0 max len: 51 min len: 46 avg len: 48.5 num_loss_counted_tokens: 49
total tokens: 160 num samples: 2 num padding tokens: 31 - rank: 0 max len: 80 min len: 49 avg len: 64.5 num_loss_counted_tokens: 70
total tokens: 146 num samples: 2 num padding tokens: 3 - rank: 0 max len: 73 min len: 70 avg len: 71.5 num_loss_counted_tokens: 87
total tokens: 130 num samples: 2 num padding tokens: 5 - rank: 0 max len: 65 min len: 60 avg len: 62.5 num_loss_counted_tokens: 57
total tokens: 152 num samples: 2 num padding tokens: 13 - rank: 0 max len: 76 min len: 63 avg len: 69.5 num_loss_counted_tokens: 71
total tokens: 214 num samples: 2 num padding tokens: 26 - rank: 3 max len: 107 min len: 81 avg len: 94.0 num_loss_counted_tokens: 116
total tokens: 196 num samples: 2 num padding tokens: 32 - rank: 3 max len: 98 min len: 66 avg len: 82.0 num_loss_counted_tokens: 101
total tokens: 120 num samples: 2 num padding tokens: 8 - rank: 1 max len: 60 min len: 52 avg len: 56.0 num_loss_counted_tokens: 64
total tokens: 166 num samples: 2 num padding tokens: 8 - rank: 6 max len: 83 min len: 75 avg len: 79.0 num_loss_counted_tokens: 77
total tokens: 228 num samples: 2 num padding tokens: 54 - rank: 1 max len: 114 min len: 60 avg len: 87.0 num_loss_counted_tokens: 125
total tokens: 152 num samples: 2 num padding tokens: 10 - rank: 2 max len: 76 min len: 66 avg len: 71.0 num_loss_counted_tokens: 77
total tokens: 126 num samples: 2 num padding tokens: 12 - rank: 2 max len: 63 min len: 51 avg len: 57.0 num_loss_counted_tokens: 56
total tokens: 154 num samples: 2 num padding tokens: 24 - rank: 0 max len: 77 min len: 53 avg len: 65.0 num_loss_counted_tokens: 66
total tokens: 146 num samples: 2 num padding tokens: 19 - rank: 3 max len: 73 min len: 54 avg len: 63.5 num_loss_counted_tokens: 71
total tokens: 116 num samples: 2 num padding tokens: 10 - rank: 3 max len: 58 min len: 48 avg len: 53.0 num_loss_counted_tokens: 52
total tokens: 138 num samples: 2 num padding tokens: 24 - rank: 4 max len: 69 min len: 45 avg len: 57.0 num_loss_counted_tokens: 68
total tokens: 110 num samples: 2 num padding tokens: 3 - rank: 1 max len: 55 min len: 52 avg len: 53.5 num_loss_counted_tokens: 42
total tokens: 132 num samples: 2 num padding tokens: 11 - rank: 2 max len: 66 min len: 55 avg len: 60.5 num_loss_counted_tokens: 51
total tokens: 132 num samples: 2 num padding tokens: 5 - rank: 1 max len: 66 min len: 61 avg len: 63.5 num_loss_counted_tokens: 65
total tokens: 216 num samples: 2 num padding tokens: 49 - rank: 1 max len: 108 min len: 59 avg len: 83.5 num_loss_counted_tokens: 103
total tokens: 214 num samples: 2 num padding tokens: 21 - rank: 2 max len: 107 min len: 86 avg len: 96.5 num_loss_counted_tokens: 106
total tokens: 174 num samples: 2 num padding tokens: 24 - rank: 2 max len: 87 min len: 63 avg len: 75.0 num_loss_counted_tokens: 83
total tokens: 124 num samples: 2 num padding tokens: 7 - rank: 1 max len: 62 min len: 55 avg len: 58.5 num_loss_counted_tokens: 63
total tokens: 164 num samples: 2 num padding tokens: 29 - rank: 1 max len: 82 min len: 53 avg len: 67.5 num_loss_counted_tokens: 83
total tokens: 172 num samples: 2 num padding tokens: 16 - rank: 1 max len: 86 min len: 70 avg len: 78.0 num_loss_counted_tokens: 86
total tokens: 168 num samples: 2 num padding tokens: 18 - rank: 2 max len: 84 min len: 66 avg len: 75.0 num_loss_counted_tokens: 81
total tokens: 226 num samples: 2 num padding tokens: 44 - rank: 2 max len: 113 min len: 69 avg len: 91.0 num_loss_counted_tokens: 97
total tokens: 180 num samples: 2 num padding tokens: 24 - rank: 2 max len: 90 min len: 66 avg len: 78.0 num_loss_counted_tokens: 103
total tokens: 186 num samples: 2 num padding tokens: 48 - rank: 1 max len: 93 min len: 45 avg len: 69.0 num_loss_counted_tokens: 110
total tokens: 208 num samples: 2 num padding tokens: 33 - rank: 2 max len: 104 min len: 71 avg len: 87.5 num_loss_counted_tokens: 113
total tokens: 188 num samples: 2 num padding tokens: 32 - rank: 3 max len: 94 min len: 62 avg len: 78.0 num_loss_counted_tokens: 89
total tokens: 116 num samples: 2 num padding tokens: 10 - rank: 2 max len: 58 min len: 48 avg len: 53.0 num_loss_counted_tokens: 62
total tokens: 194 num samples: 2 num padding tokens: 48 - rank: 1 max len: 97 min len: 49 avg len: 73.0 num_loss_counted_tokens: 95
total tokens: 120 num samples: 2 num padding tokens: 9 - rank: 2 max len: 60 min len: 51 avg len: 55.5 num_loss_counted_tokens: 56
total tokens: 128 num samples: 2 num padding tokens: 9 - rank: 6 max len: 64 min len: 55 avg len: 59.5 num_loss_counted_tokens: 69
total tokens: 162 num samples: 2 num padding tokens: 17 - rank: 2 max len: 81 min len: 64 avg len: 72.5 num_loss_counted_tokens: 91
total tokens: 132 num samples: 2 num padding tokens: 2 - rank: 5 max len: 66 min len: 64 avg len: 65.0 num_loss_counted_tokens: 70
total tokens: 120 num samples: 2 num padding tokens: 6 - rank: 5 max len: 60 min len: 54 avg len: 57.0 num_loss_counted_tokens: 87
total tokens: 142 num samples: 2 num padding tokens: 5 - rank: 5 max len: 71 min len: 66 avg len: 68.5 num_loss_counted_tokens: 72
total tokens: 172 num samples: 2 num padding tokens: 36 - rank: 5 max len: 86 min len: 50 avg len: 68.0 num_loss_counted_tokens: 70
total tokens: 104 num samples: 2 num padding tokens: 4 - rank: 5 max len: 52 min len: 48 avg len: 50.0 num_loss_counted_tokens: 54
total tokens: 186 num samples: 2 num padding tokens: 41 - rank: 5 max len: 93 min len: 52 avg len: 72.5 num_loss_counted_tokens: 81
total tokens: 174 num samples: 2 num padding tokens: 19 - rank: 5 max len: 87 min len: 68 avg len: 77.5 num_loss_counted_tokens: 80
total tokens: 202 num samples: 2 num padding tokens: 18 - rank: 5 max len: 101 min len: 83 avg len: 92.0 num_loss_counted_tokens: 124
total tokens: 122 num samples: 2 num padding tokens: 11 - rank: 5 max len: 61 min len: 50 avg len: 55.5 num_loss_counted_tokens: 65
total tokens: 174 num samples: 2 num padding tokens: 15 - rank: 5 max len: 87 min len: 72 avg len: 79.5 num_loss_counted_tokens: 98
total tokens: 160 num samples: 2 num padding tokens: 7 - rank: 5 max len: 80 min len: 73 avg len: 76.5 num_loss_counted_tokens: 104
total tokens: 124 num samples: 2 num padding tokens: 1 - rank: 5 max len: 62 min len: 61 avg len: 61.5 num_loss_counted_tokens: 59
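Each `total tokens` line above describes one micro-batch of 2 samples on one rank. The fields are consistent with every sample being padded to the longest one in its micro-batch: total tokens = num samples * max len, and padding = total tokens minus the sum of the true lengths (for two samples, simply max len minus min len). The sketch below reproduces the rank 7 line with max len 141 and min len 58. `padded_batch_stats` is a hypothetical helper, the padding scheme is inferred from the numbers, and `num_loss_counted_tokens` cannot be derived from these fields, so it is left out:

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    total_tokens: int
    num_padding_tokens: int
    avg_len: float


def padded_batch_stats(lengths: list[int]) -> BatchStats:
    """Stats for a micro-batch where every sample is padded to the longest one."""
    max_len = max(lengths)
    total = max_len * len(lengths)  # padded size of the whole micro-batch
    padding = total - sum(lengths)  # tokens that carry no real data
    return BatchStats(total, padding, sum(lengths) / len(lengths))


stats = padded_batch_stats([141, 58])
print(stats)  # BatchStats(total_tokens=282, num_padding_tokens=83, avg_len=99.5)
```

The same arithmetic shows why pairing samples of similar length matters: the rank 7 micro-batch with lengths 141 and 58 spends 83 of its 282 tokens on padding.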
Per-token loss scaled by world size: 0.0017695350106805563
Per-token loss scaled by world size: 0.0008982627186924219
Epoch: 0, Step: 1, Rank: 0, loss = 0.12254030257463455
Epoch: 0, Step: 1, Rank: 6, loss = 0.062204692512750626
Per-token loss scaled by world size: 0.0019191226456314325
Epoch: 0, Step: 1, Rank: 7, loss = 0.1328992396593094
Per-token loss scaled by world size: 0.002525273710489273
Epoch: 0, Step: 1, Rank: 3, loss = 0.1748751997947693
Per-token loss scaled by world size: 0.002455754904076457
Epoch: 0, Step: 1, Rank: 4, loss = 0.17006102204322815
Per-token loss scaled by world size: 0.0006225108518265188
Epoch: 0, Step: 1, Rank: 2, loss = 0.043108876794576645
Per-token loss scaled by world size: 0.004392423201352358
Epoch: 0, Step: 1, Rank: 5, loss = 0.30417531728744507
Per-token loss scaled by world size: 0.0016390715027227998
Epoch: 0, Step: 1, Rank: 1, loss = 0.11350569874048233
[2024-07-27 20:04:03,637] [INFO] [logging.py:96:log_dist] [Rank 0] step=1, skipped=0, lr=[8.000000000000001e-07], mom=[(0.9, 0.95)]
Epoch 0:   8%|▊         | 1/12 [00:01<00:13,  1.27s/it]
{
    "epoch": 0,
    "step": 1,
    "rank": 0,
    "loss": 0.12254030257463455,
    "overall_throughput": 19.157316171863247,
    "lr": 8.000000000000001e-07,
    "cuda_mem_allocated": 21.99594497680664,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 554,
    "batch_size": 16,
    "total_loss": 0.1404212862253189,
    "gradnorm": 2.950925588607788,
    "weight_norm": 393.455078125,
    "timestamp": "2024-07-27T20:04:03.739792"
}
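In step 1, every rank's `loss` equals its `Per-token loss scaled by world size` value multiplied by `num_loss_counted_tokens` / world size (554 / 8 = 69.25), and `total_loss` in the JSON record equals the mean of the eight per-rank `loss` values. Both relations are read off the numbers in this log, not taken from the training code. The sketch below checks them; the per-token values are paired with ranks by adjacency in the output, which is unambiguous for step 1 but not for later, more heavily interleaved steps.

```python
import math

# Step 1 values copied from the log: rank -> (per-token scaled value, logged loss)
STEP1 = {
    0: (0.0017695350106805563, 0.12254030257463455),
    1: (0.0016390715027227998, 0.11350569874048233),
    2: (0.0006225108518265188, 0.043108876794576645),
    3: (0.002525273710489273, 0.1748751997947693),
    4: (0.002455754904076457, 0.17006102204322815),
    5: (0.004392423201352358, 0.30417531728744507),
    6: (0.0008982627186924219, 0.062204692512750626),
    7: (0.0019191226456314325, 0.1328992396593094),
}
NUM_LOSS_COUNTED_TOKENS = 554          # "num_loss_counted_tokens" in the step-1 JSON
WORLD_SIZE = 8                         # ranks 0-7 each report a loss
LOGGED_TOTAL_LOSS = 0.1404212862253189  # "total_loss" in the step-1 JSON


def loss_from_per_token(per_token: float, n_tokens: int, world_size: int) -> float:
    """Observed relation in this log: loss = per_token_scaled * n_tokens / world_size."""
    return per_token * n_tokens / world_size


for rank, (per_token, logged) in sorted(STEP1.items()):
    predicted = loss_from_per_token(per_token, NUM_LOSS_COUNTED_TOKENS, WORLD_SIZE)
    # The tolerance allows for float32 rounding inside the training process.
    assert math.isclose(predicted, logged, rel_tol=1e-5), rank

mean_loss = sum(logged for _, logged in STEP1.values()) / WORLD_SIZE
print(f"{mean_loss:.6f} vs logged total_loss {LOGGED_TOTAL_LOSS:.6f}")
# 0.140421 vs logged total_loss 0.140421
```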
Per-token loss scaled by world size: 0.001750853261910379
Per-token loss scaled by world size: 0.0013746594777330756
Per-token loss scaled by world size: 0.006612943951040506
Per-token loss scaled by world size: 0.00497298501431942
Per-token loss scaled by world size: 0.0029370656702667475
Per-token loss scaled by world size: 0.00024299396318383515
Epoch: 0, Step: 2, Rank: 2, loss = 0.09450783580541611
Epoch: 0, Step: 2, Rank: 1, loss = 0.12037116289138794
Epoch: 0, Step: 2, Rank: 6, loss = 0.45463991165161133
Epoch: 0, Step: 2, Rank: 5, loss = 0.34189271926879883
Epoch: 0, Step: 2, Rank: 3, loss = 0.20192326605319977
Epoch: 0, Step: 2, Rank: 4, loss = 0.016705835238099098
Per-token loss scaled by world size: 0.0014473804039880633
Per-token loss scaled by world size: 0.0009000621503219008
Epoch: 0, Step: 2, Rank: 0, loss = 0.0995073989033699
Epoch: 0, Step: 2, Rank: 7, loss = 0.061879273504018784
[2024-07-27 20:04:04,156] [INFO] [logging.py:96:log_dist] [Rank 0] step=2, skipped=0, lr=[1.6000000000000001e-06], mom=[(0.9, 0.95)]
Epoch 0:  17%|█▋        | 2/12 [00:01<00:08,  1.20it/s]
{
    "epoch": 0,
    "step": 2,
    "rank": 0,
    "loss": 0.0995073989033699,
    "overall_throughput": 38.74044042813212,
    "lr": 1.6000000000000001e-06,
    "cuda_mem_allocated": 21.998329639434814,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 550,
    "batch_size": 16,
    "total_loss": 0.1739284247159958,
    "gradnorm": 4.509922981262207,
    "weight_norm": 393.4551086425781,
    "timestamp": "2024-07-27T20:04:04.235419"
}
Per-token loss scaled by world size: 0.0008535730303265154
Per-token loss scaled by world size: 0.0017103212885558605
Per-token loss scaled by world size: 0.003972221631556749
Per-token loss scaled by world size: 0.0014537357492372394
Per-token loss scaled by world size: 0.0024689952842891216
Per-token loss scaled by world size: 0.0007754238904453814
Epoch: 0, Step: 3, Rank: 1, loss = 0.3440936803817749
Epoch: 0, Step: 3, Rank: 6, loss = 0.14815658330917358
Epoch: 0, Step: 3, Rank: 2, loss = 0.07394076138734818
Epoch: 0, Step: 3, Rank: 5, loss = 0.12592986226081848
Epoch: 0, Step: 3, Rank: 7, loss = 0.21387672424316406
Per-token loss scaled by world size: 0.0015765568241477013
Epoch: 0, Step: 3, Rank: 4, loss = 0.06717109680175781
Per-token loss scaled by world size: 0.0009588321554474533
Epoch: 0, Step: 3, Rank: 0, loss = 0.08305883407592773
Epoch: 0, Step: 3, Rank: 3, loss = 0.13656923174858093
[2024-07-27 20:04:04,706] [INFO] [logging.py:96:log_dist] [Rank 0] step=3, skipped=0, lr=[2.4000000000000003e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:04,785] [INFO] [timer.py:258:stop] epoch=0/micro_step=3/global_step=3, RunningAvgSamplesPerSec=31.96463872821357, CurrSamplesPerSec=31.96463872821357, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  25%|██▌       | 3/12 [00:02<00:06,  1.42it/s]
{
    "epoch": 0,
    "step": 3,
    "rank": 0,
    "loss": 0.08305883407592773,
    "overall_throughput": 31.90019931426007,
    "lr": 2.4000000000000003e-06,
    "cuda_mem_allocated": 21.998568058013916,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 693,
    "batch_size": 16,
    "total_loss": 0.14909958839416504,
    "gradnorm": 4.072885990142822,
    "weight_norm": 393.4551696777344,
    "timestamp": "2024-07-27T20:04:04.830984"
}
Per-token loss scaled by world size: 0.0011740017216652632
Per-token loss scaled by world size: 0.001550567802041769
Per-token loss scaled by world size: 0.002323366003111005
Per-token loss scaled by world size: 0.0024958737194538116
Per-token loss scaled by world size: 0.0009869500063359737
Per-token loss scaled by world size: 0.0016128732822835445
Per-token loss scaled by world size: 0.0012467901688069105
Epoch: 0, Step: 4, Rank: 6, loss = 0.1122223436832428
Epoch: 0, Step: 4, Rank: 5, loss = 0.16815361380577087
Epoch: 0, Step: 4, Rank: 1, loss = 0.08496837317943573
Epoch: 0, Step: 4, Rank: 3, loss = 0.1806388646364212
Epoch: 0, Step: 4, Rank: 2, loss = 0.071430504322052
Epoch: 0, Step: 4, Rank: 4, loss = 0.11673170328140259
Epoch: 0, Step: 4, Rank: 7, loss = 0.09023644030094147
Per-token loss scaled by world size: 0.0015552268596366048
Epoch: 0, Step: 4, Rank: 0, loss = 0.11255954205989838
[2024-07-27 20:04:05,257] [INFO] [logging.py:96:log_dist] [Rank 0] step=4, skipped=0, lr=[3.2000000000000003e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:05,334] [INFO] [timer.py:258:stop] epoch=0/micro_step=4/global_step=4, RunningAvgSamplesPerSec=32.182993941908016, CurrSamplesPerSec=32.404352908739476, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  33%|███▎      | 4/12 [00:02<00:05,  1.55it/s]
{
    "epoch": 0,
    "step": 4,
    "rank": 0,
    "loss": 0.11255954205989838,
    "overall_throughput": 32.34236936565279,
    "lr": 3.2000000000000003e-06,
    "cuda_mem_allocated": 21.996421813964844,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 579,
    "batch_size": 16,
    "total_loss": 0.11711767315864563,
    "gradnorm": 2.631800889968872,
    "weight_norm": 393.4552001953125,
    "timestamp": "2024-07-27T20:04:05.382886"
}
Per-token loss scaled by world size: 0.002091767033562064
Per-token loss scaled by world size: 0.00198046350851655
Per-token loss scaled by world size: 0.0025131492875516415
Per-token loss scaled by world size: 0.0008698466117493808
Per-token loss scaled by world size: 0.001278402516618371
Per-token loss scaled by world size: 0.0014365998795256019
Per-token loss scaled by world size: 0.0012164696818217635
Epoch: 0, Step: 5, Rank: 5, loss = 0.1651211529970169
Epoch: 0, Step: 5, Rank: 4, loss = 0.20953382551670074
Epoch: 0, Step: 5, Rank: 2, loss = 0.07252345979213715
Epoch: 0, Step: 5, Rank: 6, loss = 0.17440107464790344
Epoch: 0, Step: 5, Rank: 0, loss = 0.10658681392669678
Epoch: 0, Step: 5, Rank: 3, loss = 0.11977651715278625
Epoch: 0, Step: 5, Rank: 1, loss = 0.10142315924167633
Per-token loss scaled by world size: 0.000835251237731427
Epoch: 0, Step: 5, Rank: 7, loss = 0.06963907182216644
[2024-07-27 20:04:05,832] [INFO] [logging.py:96:log_dist] [Rank 0] step=5, skipped=0, lr=[4.000000000000001e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:05,909] [INFO] [timer.py:258:stop] epoch=0/micro_step=5/global_step=5, RunningAvgSamplesPerSec=31.583034761692534, CurrSamplesPerSec=30.44781135920859, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  42%|████▏     | 5/12 [00:03<00:04,  1.62it/s]
{
    "epoch": 0,
    "step": 5,
    "rank": 0,
    "loss": 0.10658681392669678,
    "overall_throughput": 30.372502350050485,
    "lr": 4.000000000000001e-06,
    "cuda_mem_allocated": 22.000954627990723,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 667,
    "batch_size": 16,
    "total_loss": 0.12737563252449036,
    "gradnorm": 2.452970027923584,
    "weight_norm": 393.45526123046875,
    "timestamp": "2024-07-27T20:04:05.943274"
}
Per-token loss scaled by world size: 0.002868585754185915
Per-token loss scaled by world size: 0.00250813621096313
Per-token loss scaled by world size: 0.0015482519520446658
Per-token loss scaled by world size: 0.0010162107646465302
Per-token loss scaled by world size: 0.0008416934870183468
Per-token loss scaled by world size: 0.002133122645318508
Per-token loss scaled by world size: 0.001864621532149613
Epoch: 0, Step: 6, Rank: 6, loss = 0.05828727409243584
Epoch: 0, Step: 6, Rank: 4, loss = 0.1986495554447174
Epoch: 0, Step: 6, Rank: 2, loss = 0.10721644759178162
Epoch: 0, Step: 6, Rank: 5, loss = 0.07037259638309479
Epoch: 0, Step: 6, Rank: 3, loss = 0.17368842661380768
Epoch: 0, Step: 6, Rank: 0, loss = 0.14771874248981476
Epoch: 0, Step: 6, Rank: 7, loss = 0.12912504374980927
Per-token loss scaled by world size: 0.0008939910912886262
Epoch: 0, Step: 6, Rank: 1, loss = 0.06190888211131096
[2024-07-27 20:04:06,381] [INFO] [logging.py:96:log_dist] [Rank 0] step=6, skipped=0, lr=[4.800000000000001e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:06,460] [INFO] [timer.py:258:stop] epoch=0/micro_step=6/global_step=6, RunningAvgSamplesPerSec=31.51022949423371, CurrSamplesPerSec=31.293813829665694, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  50%|█████     | 6/12 [00:04<00:03,  1.68it/s]
{
    "epoch": 0,
    "step": 6,
    "rank": 0,
    "loss": 0.14771874248981476,
    "overall_throughput": 31.243491527428024,
    "lr": 4.800000000000001e-06,
    "cuda_mem_allocated": 21.996244430541992,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 554,
    "batch_size": 16,
    "total_loss": 0.11837086826562881,
    "gradnorm": 2.045849323272705,
    "weight_norm": 393.4552917480469,
    "timestamp": "2024-07-27T20:04:06.510648"
}
Per-token loss scaled by world size: 0.001680628745816648
Per-token loss scaled by world size: 0.0018293416360393167
Per-token loss scaled by world size: 0.0013819513842463493
Per-token loss scaled by world size: 0.0005512057687155902
Per-token loss scaled by world size: 0.0015243319794535637
Per-token loss scaled by world size: 0.0011020175879821181
Per-token loss scaled by world size: 0.001964986091479659
Epoch: 0, Step: 7, Rank: 5, loss = 0.11867507547140121
Epoch: 0, Step: 7, Rank: 7, loss = 0.15709471702575684
Epoch: 0, Step: 7, Rank: 6, loss = 0.09463576227426529
Epoch: 0, Step: 7, Rank: 4, loss = 0.1309020072221756
Epoch: 0, Step: 7, Rank: 2, loss = 0.16874317824840546
Epoch: 0, Step: 7, Rank: 1, loss = 0.14432398974895477
Epoch: 0, Step: 7, Rank: 0, loss = 0.04733479768037796
Per-token loss scaled by world size: 0.0013999826041981578
Epoch: 0, Step: 7, Rank: 3, loss = 0.1202235072851181
[2024-07-27 20:04:06,935] [INFO] [logging.py:96:log_dist] [Rank 0] step=7, skipped=0, lr=[5.600000000000001e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:07,012] [INFO] [timer.py:258:stop] epoch=0/micro_step=7/global_step=7, RunningAvgSamplesPerSec=31.621939391152157, CurrSamplesPerSec=32.07681358232997, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  58%|█████▊    | 7/12 [00:04<00:02,  1.72it/s]
{
    "epoch": 0,
    "step": 7,
    "rank": 0,
    "loss": 0.04733479768037796,
    "overall_throughput": 32.023072654142574,
    "lr": 5.600000000000001e-06,
    "cuda_mem_allocated": 21.99880838394165,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 687,
    "batch_size": 16,
    "total_loss": 0.12274163216352463,
    "gradnorm": 2.7485461235046387,
    "weight_norm": 393.4553527832031,
    "timestamp": "2024-07-27T20:04:07.057260"
}
Per-token loss scaled by world size: 0.0043866513296961784
Per-token loss scaled by world size: 0.0009822045685723424
Per-token loss scaled by world size: 0.003587431972846389
Per-token loss scaled by world size: 0.002129745902493596
Per-token loss scaled by world size: 0.0025288627948611975
Per-token loss scaled by world size: 0.0017483173869550228
Per-token loss scaled by world size: 0.0015334823401644826
Epoch: 0, Step: 8, Rank: 6, loss = 0.2569498121738434
Epoch: 0, Step: 8, Rank: 3, loss = 0.1811297982931137
Epoch: 0, Step: 8, Rank: 0, loss = 0.07035040110349655
Epoch: 0, Step: 8, Rank: 4, loss = 0.12522323429584503
Epoch: 0, Step: 8, Rank: 5, loss = 0.3141939043998718
Epoch: 0, Step: 8, Rank: 1, loss = 0.1525430530309677
Epoch: 0, Step: 8, Rank: 7, loss = 0.1098356693983078
Per-token loss scaled by world size: 0.004045259207487106
Epoch: 0, Step: 8, Rank: 2, loss = 0.2897416949272156
[2024-07-27 20:04:07,479] [INFO] [logging.py:96:log_dist] [Rank 0] step=8, skipped=0, lr=[6.4000000000000006e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:07,557] [INFO] [timer.py:258:stop] epoch=0/micro_step=8/global_step=8, RunningAvgSamplesPerSec=31.719761022460865, CurrSamplesPerSec=32.21809006047175, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  67%|██████▋   | 8/12 [00:05<00:02,  1.76it/s]
{
    "epoch": 0,
    "step": 8,
    "rank": 0,
    "loss": 0.07035040110349655,
    "overall_throughput": 32.162302804268634,
    "lr": 6.4000000000000006e-06,
    "cuda_mem_allocated": 22.001669883728027,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 573,
    "batch_size": 16,
    "total_loss": 0.18749594688415527,
    "gradnorm": 3.855632781982422,
    "weight_norm": 393.45538330078125,
    "timestamp": "2024-07-27T20:04:07.599450"
}
Per-token loss scaled by world size: 0.0026937518268823624
Per-token loss scaled by world size: 0.0037981150671839714
Per-token loss scaled by world size: 0.0015854539815336466
Per-token loss scaled by world size: 0.002551022917032242
Per-token loss scaled by world size: 0.0024539916776120663
Per-token loss scaled by world size: 0.002055267570540309
Epoch: 0, Step: 9, Rank: 0, loss = 0.2181939035654068
Epoch: 0, Step: 9, Rank: 2, loss = 0.1987733244895935
Epoch: 0, Step: 9, Rank: 3, loss = 0.30764731764793396
Epoch: 0, Step: 9, Rank: 4, loss = 0.16647666692733765
Epoch: 0, Step: 9, Rank: 7, loss = 0.2066328525543213
Epoch: 0, Step: 9, Rank: 1, loss = 0.12842176854610443
Per-token loss scaled by world size: 0.0031997160986065865
Per-token loss scaled by world size: 0.002269922522827983
Epoch: 0, Step: 9, Rank: 5, loss = 0.25917699933052063
Epoch: 0, Step: 9, Rank: 6, loss = 0.18386372923851013
[2024-07-27 20:04:08,019] [INFO] [logging.py:96:log_dist] [Rank 0] step=9, skipped=0, lr=[7.2000000000000005e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:08,097] [INFO] [timer.py:258:stop] epoch=0/micro_step=9/global_step=9, RunningAvgSamplesPerSec=31.789200407919928, CurrSamplesPerSec=32.21230625969002, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  75%|███████▌  | 9/12 [00:05<00:01,  1.78it/s]
{
    "epoch": 0,
    "step": 9,
    "rank": 0,
    "loss": 0.2181939035654068,
    "overall_throughput": 32.12612073517451,
    "lr": 7.2000000000000005e-06,
    "cuda_mem_allocated": 22.002385139465332,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 648,
    "batch_size": 16,
    "total_loss": 0.20864830911159515,
    "gradnorm": 35.085845947265625,
    "weight_norm": 393.4554138183594,
    "timestamp": "2024-07-27T20:04:08.140249"
}
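Step 9 logs a `gradnorm` of about 35.1, roughly an order of magnitude above the 2.0 to 4.5 range of steps 1-8, while its `total_loss` (0.209) and its `log_dist` line (`skipped=0`) look unremarkable. When scanning longer logs, one simple way to flag such steps is to compare each `gradnorm` against the median; the sketch below uses a 5× threshold, an arbitrary value chosen for illustration rather than anything the trainer itself uses.

```python
import statistics


def gradnorm_spikes(records: list, factor: float = 5.0) -> list:
    """Return the steps whose gradnorm exceeds `factor` times the median gradnorm."""
    median = statistics.median(r["gradnorm"] for r in records)
    return [r["step"] for r in records if r["gradnorm"] > factor * median]


# gradnorm values copied from the step 1-9 JSON records in this log
GRADNORMS = [2.950925588607788, 4.509922981262207, 4.072885990142822,
             2.631800889968872, 2.452970027923584, 2.045849323272705,
             2.7485461235046387, 3.855632781982422, 35.085845947265625]
records = [{"step": i, "gradnorm": g} for i, g in enumerate(GRADNORMS, start=1)]

print(gradnorm_spikes(records))  # [9]
```

The median is used instead of the mean so that the spike itself does not inflate the baseline it is compared against.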
Per-token loss scaled by world size: 0.0028963200747966766
Per-token loss scaled by world size: 0.0014150061178952456
Per-token loss scaled by world size: 0.004510107450187206
Per-token loss scaled by world size: 0.0027439305558800697
Per-token loss scaled by world size: 0.003027191385626793
Per-token loss scaled by world size: 0.002273061079904437
Per-token loss scaled by world size: 0.0028788307681679726
Epoch: 0, Step: 10, Rank: 5, loss = 0.2164275199174881
Epoch: 0, Step: 10, Rank: 2, loss = 0.3557347357273102
Epoch: 0, Step: 10, Rank: 6, loss = 0.1116086095571518
Epoch: 0, Step: 10, Rank: 0, loss = 0.23876972496509552
Epoch: 0, Step: 10, Rank: 1, loss = 0.22844724357128143
Epoch: 0, Step: 10, Rank: 3, loss = 0.22706778347492218
Epoch: 0, Step: 10, Rank: 4, loss = 0.17928770184516907
Per-token loss scaled by world size: 0.004227162804454565
Epoch: 0, Step: 10, Rank: 7, loss = 0.33341747522354126
[2024-07-27 20:04:08,566] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=0, lr=[8.000000000000001e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:08,643] [INFO] [timer.py:258:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=31.799625325743182, CurrSamplesPerSec=31.872791640267828, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0:  83%|████████▎ | 10/12 [00:06<00:01,  1.80it/s]
{
    "epoch": 0,
    "step": 10,
    "rank": 0,
    "loss": 0.23876972496509552,
    "overall_throughput": 31.789585477820573,
    "lr": 8.000000000000001e-06,
    "cuda_mem_allocated": 22.002862453460693,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 631,
    "batch_size": 16,
    "total_loss": 0.23634512722492218,
    "gradnorm": 5.703427791595459,
    "weight_norm": 393.4554748535156,
    "timestamp": "2024-07-27T20:04:08.686452"
}
Per-token loss scaled by world size: 0.0033281107898801565
Per-token loss scaled by world size: 0.0010645152069628239
Per-token loss scaled by world size: 0.004243766888976097
Per-token loss scaled by world size: 0.003650533501058817
Per-token loss scaled by world size: 0.0036266690585762262
Per-token loss scaled by world size: 0.0018828274914994836
Per-token loss scaled by world size: 0.0036798259243369102
Epoch: 0, Step: 11, Rank: 4, loss = 0.07890719175338745
Epoch: 0, Step: 11, Rank: 6, loss = 0.2705957889556885
Epoch: 0, Step: 11, Rank: 2, loss = 0.3145692050457001
Epoch: 0, Step: 11, Rank: 1, loss = 0.26882684230804443
Epoch: 0, Step: 11, Rank: 5, loss = 0.2727670967578888
Epoch: 0, Step: 11, Rank: 0, loss = 0.24669620394706726
Epoch: 0, Step: 11, Rank: 3, loss = 0.13956458866596222
Per-token loss scaled by world size: 0.002425282960757613
Epoch: 0, Step: 11, Rank: 7, loss = 0.17977410554885864
[2024-07-27 20:04:09,124] [INFO] [logging.py:96:log_dist] [Rank 0] step=11, skipped=0, lr=[8.8e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:09,202] [INFO] [timer.py:258:stop] epoch=0/micro_step=11/global_step=11, RunningAvgSamplesPerSec=31.7205989700859, CurrSamplesPerSec=31.10225264577545, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Saving model in huggingface format at samples_seen: 176
{
    "epoch": 0,
    "step": 11,
    "rank": 0,
    "loss": 0.24669620394706726,
    "overall_throughput": 31.02962868774954,
    "lr": 8.8e-06,
    "cuda_mem_allocated": 22.00071620941162,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 593,
    "batch_size": 16,
    "total_loss": 0.22146263718605042,
    "gradnorm": 4.970978736877441,
    "weight_norm": 393.4555358886719,
    "timestamp": "2024-07-27T20:04:09.205377"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_176
[20:04:26] INFO     saving took 17.77474355697632 seconds     utils.py:611
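The checkpoint above is labeled `samples_176` even though the command at the top of this log passes `--save-samples 185`. The numbers are consistent with the requested interval being rounded down to a whole number of optimizer steps (185 // 16 × 16 = 176, and 11 steps × 16 samples = 176). That rounding rule is inferred from this log, not taken from the trainer's source, and the helper names below are hypothetical; the sketch only reproduces the arithmetic, assuming `samples_seen` = step × 16, which holds here because every JSON record reports `"batch_size": 16`.

```python
def rounded_save_interval(save_samples: int, batch_size: int) -> int:
    """Hypothetical rule: round the requested interval down to whole optimizer steps."""
    interval = (save_samples // batch_size) * batch_size
    if interval == 0:
        raise ValueError("save_samples is smaller than one batch")
    return interval


def save_steps(save_samples: int, batch_size: int, total_steps: int) -> list:
    """Steps at which samples_seen (step * batch_size) is a multiple of the interval."""
    interval = rounded_save_interval(save_samples, batch_size)
    return [step for step in range(1, total_steps + 1)
            if (step * batch_size) % interval == 0]


print(rounded_save_interval(185, 16))  # 176
print(save_steps(185, 16, 12))         # [11]
```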
Epoch 0:  92%|█████████▏| 11/12 [00:24<00:05,  6.00s/it]
Per-token loss scaled by world size: 0.0032385066151618958
Per-token loss scaled by world size: 0.0007423295173794031
Per-token loss scaled by world size: 0.004228494130074978
Per-token loss scaled by world size: 0.002175833098590374
Per-token loss scaled by world size: 0.0016533531015738845
Per-token loss scaled by world size: 0.0016122134402394295
Per-token loss scaled by world size: 0.0011377736227586865
Epoch: 0, Step: 12, Rank: 2, loss = 0.3493793308734894
Epoch: 0, Step: 12, Rank: 5, loss = 0.06133497506380081
Epoch: 0, Step: 12, Rank: 6, loss = 0.17977821826934814
Epoch: 0, Step: 12, Rank: 0, loss = 0.26758161187171936
Epoch: 0, Step: 12, Rank: 1, loss = 0.13320913910865784
Epoch: 0, Step: 12, Rank: 3, loss = 0.1366083025932312
Epoch: 0, Step: 12, Rank: 7, loss = 0.09400854259729385
Per-token loss scaled by world size: 0.0017149074701592326
Epoch: 0, Step: 12, Rank: 4, loss = 0.14169423282146454
[2024-07-27 20:04:27,462] [INFO] [logging.py:96:log_dist] [Rank 0] step=12, skipped=0, lr=[9.600000000000001e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:04:27,540] [INFO] [timer.py:258:stop] epoch=0/micro_step=12/global_step=12, RunningAvgSamplesPerSec=31.65103378148296, CurrSamplesPerSec=31.038411783233425, MemAllocated=22.0GB, MaxMemAllocated=28.29GB
Epoch 0: 100%|██████████| 12/12 [00:25<00:00,  4.34s/it]
{
    "epoch": 0,
    "step": 12,
    "rank": 0,
    "loss": 0.26758161187171936,
    "overall_throughput": 30.984142462059825,
    "lr": 9.600000000000001e-06,
    "cuda_mem_allocated": 22.001431465148926,
    "cuda_malloc_retries": 0,
    "num_loss_counted_tokens": 661,
    "batch_size": 16,
    "total_loss": 0.17044928669929504,
    "gradnorm": 3.941415309906006,
    "weight_norm": 393.4555969238281,
    "timestamp": "2024-07-27T20:04:27.583911"
}
Epoch 0: 100%|██████████| 12/12 [00:25<00:00,  2.10s/it]
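Across the twelve steps of epoch 0 the learning rate printed by the `log_dist` lines grows by exactly 8e-7 per step (lr = 8e-7 × step), which is consistent with a linear warmup starting from zero. The peak learning rate and warmup length are not visible in this part of the log, so the sketch below only checks the per-step slope; `linear_warmup_slope` is a hypothetical helper, not part of the trainer.

```python
import math
from typing import List, Optional

# lr values copied from the twelve log_dist lines of epoch 0
LOGGED_LR = [8.000000000000001e-07, 1.6000000000000001e-06, 2.4000000000000003e-06,
             3.2000000000000003e-06, 4.000000000000001e-06, 4.800000000000001e-06,
             5.600000000000001e-06, 6.4000000000000006e-06, 7.2000000000000005e-06,
             8.000000000000001e-06, 8.8e-06, 9.600000000000001e-06]


def linear_warmup_slope(lrs: List[float], rel_tol: float = 1e-9) -> Optional[float]:
    """Return the per-step increment if lrs[i] == slope * (i + 1) for every i, else None."""
    slope = lrs[0]  # lr at step 1 equals the slope when warmup starts from zero
    for step, lr in enumerate(lrs, start=1):
        if not math.isclose(lr, slope * step, rel_tol=rel_tol):
            return None
    return slope


print(linear_warmup_slope(LOGGED_LR))
```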
total tokens: 164 num samples: 2 num padding tokens: 22 - rank: 5 max len: 82 min len: 60 avg len: 71.0 num_loss_counted_tokens: 84
total tokens: 186 num samples: 2 num padding tokens: 38 - rank: 5 max len: 93 min len: 55 avg len: 74.0 num_loss_counted_tokens: 82
total tokens: 152 num samples: 2 num padding tokens: 28 - rank: 5 max len: 76 min len: 48 avg len: 62.0 num_loss_counted_tokens: 69
total tokens: 200 num samples: 2 num padding tokens: 36 - rank: 5 max len: 100 min len: 64 avg len: 82.0 num_loss_counted_tokens: 102
total tokens: 196 num samples: 2 num padding tokens: 28 - rank: 5 max len: 98 min len: 70 avg len: 84.0 num_loss_counted_tokens: 101
total tokens: 214 num samples: 2 num padding tokens: 42 - rank: 5 max len: 107 min len: 65 avg len: 86.0 num_loss_counted_tokens: 106
total tokens: 138 num samples: 2 num padding tokens: 12 - rank: 5 max len: 69 min len: 57 avg len: 63.0 num_loss_counted_tokens: 63
total tokens: 202 num samples: 2 num padding tokens: 39 - rank: 5 max len: 101 min len: 62 avg len: 81.5 num_loss_counted_tokens: 105
total tokens: 104 num samples: 2 num padding tokens: 0 - rank: 5 max len: 52 min len: 52 avg len: 52.0 num_loss_counted_tokens: 50
total tokens: 180 num samples: 2 num padding tokens: 31 - rank: 5 max len: 90 min len: 59 avg len: 74.5 num_loss_counted_tokens: 95
total tokens: 174 num samples: 2 num padding tokens: 7 - rank: 5 max len: 87 min len: 80 avg len: 83.5 num_loss_counted_tokens: 97
total tokens: 140 num samples: 2 num padding tokens: 15 - rank: 5 max len: 70 min len: 55 avg len: 62.5 num_loss_counted_tokens: 72
total tokens: 146 num samples: 2 num padding tokens: 16 - rank: 2 max len: 73 min len: 57 avg len: 65.0 num_loss_counted_tokens: 75
total tokens: 214 num samples: 2 num padding tokens: 53 - rank: 2 max len: 107 min len: 54 avg len: 80.5 num_loss_counted_tokens: 108
total tokens: 136 num samples: 2 num padding tokens: 8 - rank: 2 max len: 68 min len: 60 avg len: 64.0 num_loss_counted_tokens: 62
total tokens: 154 num samples: 2 num padding tokens: 13 - rank: 2 max len: 77 min len: 64 avg len: 70.5 num_loss_counted_tokens: 80
total tokens: 180 num samples: 2 num padding tokens: 30 - rank: 7 max len: 90 min len: 60 avg len: 75.0 num_loss_counted_tokens: 119
total tokens: 130 num samples: 2 num padding tokens: 7 - rank: 2 max len: 65 min len: 58 avg len: 61.5 num_loss_counted_tokens: 59
total tokens: 194 num samples: 2 num padding tokens: 23 - rank: 0 max len: 97 min len: 74 avg len: 85.5 num_loss_counted_tokens: 109
total tokens: 116 num samples: 2 num padding tokens: 9 - rank: 2 max len: 58 min len: 49 avg len: 53.5 num_loss_counted_tokens: 57
total tokens: 118 num samples: 2 num padding tokens: 5 - rank: 2 max len: 59 min len: 54 avg len: 56.5 num_loss_counted_tokens: 71
total tokens: 136 num samples: 2 num padding tokens: 13 - rank: 7 max len: 68 min len: 55 avg len: 61.5 num_loss_counted_tokens: 47
total tokens: 120 num samples: 2 num padding tokens: 7 - rank: 7 max len: 60 min len: 53 avg len: 56.5 num_loss_counted_tokens: 62
total tokens: 150 num samples: 2 num padding tokens: 29 - rank: 2 max len: 75 min len: 46 avg len: 60.5 num_loss_counted_tokens: 64
total tokens: 142 num samples: 2 num padding tokens: 13 - rank: 0 max len: 71 min len: 58 avg len: 64.5 num_loss_counted_tokens: 78
total tokens: 140 num samples: 2 num padding tokens: 10 - rank: 2 max len: 70 min len: 60 avg len: 65.0 num_loss_counted_tokens: 73
total tokens: 136 num samples: 2 num padding tokens: 23 - rank: 0 max len: 68 min len: 45 avg len: 56.5 num_loss_counted_tokens: 51
total tokens: 132 num samples: 2 num padding tokens: 14 - rank: 2 max len: 66 min len: 52 avg len: 59.0 num_loss_counted_tokens: 58
total tokens: 128 num samples: 2 num padding tokens: 4 - rank: 2 max len: 64 min len: 60 avg len: 62.0 num_loss_counted_tokens: 70
total tokens: 114 num samples: 2 num padding tokens: 2 - rank: 4 max len: 57 min len: 55 avg len: 56.0 num_loss_counted_tokens: 66
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 4 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 49
total tokens: 166 num samples: 2 num padding tokens: 7 - rank: 0 max len: 83 min len: 76 avg len: 79.5 num_loss_counted_tokens: 90
total tokens: 188 num samples: 2 num padding tokens: 22 - rank: 4 max len: 94 min len: 72 avg len: 83.0 num_loss_counted_tokens: 98
total tokens: 156 num samples: 2 num padding tokens: 27 - rank: 7 max len: 78 min len: 51 avg len: 64.5 num_loss_counted_tokens: 71
total tokens: 118 num samples: 2 num padding tokens: 8 - rank: 0 max len: 59 min len: 51 avg len: 55.0 num_loss_counted_tokens: 60
total tokens: 140 num samples: 2 num padding tokens: 10 - rank: 4 max len: 70 min len: 60 avg len: 65.0 num_loss_counted_tokens: 59
total tokens: 166 num samples: 2 num padding tokens: 16 - rank: 7 max len: 83 min len: 67 avg len: 75.0 num_loss_counted_tokens: 75
total tokens: 168 num samples: 2 num padding tokens: 13 - rank: 4 max len: 84 min len: 71 avg len: 77.5 num_loss_counted_tokens: 88
total tokens: 174 num samples: 2 num padding tokens: 41 - rank: 3 max len: 87 min len: 46 avg len: 66.5 num_loss_counted_tokens: 70
total tokens: 142 num samples: 2 num padding tokens: 5 - rank: 4 max len: 71 min len: 66 avg len: 68.5 num_loss_counted_tokens: 68
total tokens: 152 num samples: 2 num padding tokens: 16 - rank: 4 max len: 76 min len: 60 avg len: 68.0 num_loss_counted_tokens: 84
total tokens: 174 num samples: 2 num padding tokens: 6 - rank: 4 max len: 87 min len: 81 avg len: 84.0 num_loss_counted_tokens: 100
total tokens: 104 num samples: 2 num padding tokens: 8 - rank: 4 max len: 52 min len: 44 avg len: 48.0 num_loss_counted_tokens: 52
total tokens: 152 num samples: 2 num padding tokens: 12 - rank: 3 max len: 76 min len: 64 avg len: 70.0 num_loss_counted_tokens: 81
total tokens: 128 num samples: 2 num padding tokens: 14 - rank: 4 max len: 64 min len: 50 avg len: 57.0 num_loss_counted_tokens: 51
total tokens: 180 num samples: 2 num padding tokens: 9 - rank: 3 max len: 90 min len: 81 avg len: 85.5 num_loss_counted_tokens: 135
total tokens: 154 num samples: 2 num padding tokens: 23 - rank: 2 max len: 77 min len: 54 avg len: 65.5 num_loss_counted_tokens: 75
total tokens: 132 num samples: 2 num padding tokens: 4 - rank: 3 max len: 66 min len: 62 avg len: 64.0 num_loss_counted_tokens: 57
total tokens: 122 num samples: 2 num padding tokens: 3 - rank: 3 max len: 61 min len: 58 avg len: 59.5 num_loss_counted_tokens: 60
total tokens: 142 num samples: 2 num padding tokens: 11 - rank: 3 max len: 71 min len: 60 avg len: 65.5 num_loss_counted_tokens: 80
total tokens: 124 num samples: 2 num padding tokens: 5 - rank: 0 max len: 62 min len: 57 avg len: 59.5 num_loss_counted_tokens: 73
total tokens: 244 num samples: 2 num padding tokens: 34 - rank: 3 max len: 122 min len: 88 avg len: 105.0 num_loss_counted_tokens: 147
total tokens: 186 num samples: 2 num padding tokens: 30 - rank: 7 max len: 93 min len: 63 avg len: 78.0 num_loss_counted_tokens: 117
total tokens: 138 num samples: 2 num padding tokens: 25 - rank: 0 max len: 69 min len: 44 avg len: 56.5 num_loss_counted_tokens: 69
total tokens: 118 num samples: 2 num padding tokens: 6 - rank: 0 max len: 59 min len: 53 avg len: 56.0 num_loss_counted_tokens: 50
total tokens: 148 num samples: 2 num padding tokens: 8 - rank: 0 max len: 74 min len: 66 avg len: 70.0 num_loss_counted_tokens: 76
total tokens: 96 num samples: 2 num padding tokens: 5 - rank: 0 max len: 48 min len: 43 avg len: 45.5 num_loss_counted_tokens: 39
total tokens: 166 num samples: 2 num padding tokens: 3 - rank: 3 max len: 83 min len: 80 avg len: 81.5 num_loss_counted_tokens: 111
total tokens: 164 num samples: 2 num padding tokens: 29 - rank: 0 max len: 82 min len: 53 avg len: 67.5 num_loss_counted_tokens: 85
total tokens: 128 num samples: 2 num padding tokens: 1 - rank: 1 max len: 64 min len: 63 avg len: 63.5 num_loss_counted_tokens: 68
total tokens: 128 num samples: 2 num padding tokens: 7 - rank: 3 max len: 64 min len: 57 avg len: 60.5 num_loss_counted_tokens: 66
total tokens: 140 num samples: 2 num padding tokens: 10 - rank: 3 max len: 70 min len: 60 avg len: 65.0 num_loss_counted_tokens: 73
total tokens: 116 num samples: 2 num padding tokens: 9 - rank: 4 max len: 58 min len: 49 avg len: 53.5 num_loss_counted_tokens: 57
total tokens: 126 num samples: 2 num padding tokens: 13 - rank: 3 max len: 63 min len: 50 avg len: 56.5 num_loss_counted_tokens: 61
total tokens: 122 num samples: 2 num padding tokens: 2 - rank: 7 max len: 61 min len: 59 avg len: 60.0 num_loss_counted_tokens: 61
total tokens: 282 num samples: 2 num padding tokens: 70 - rank: 7 max len: 141 min len: 71 avg len: 106.0 num_loss_counted_tokens: 151
total tokens: 132 num samples: 2 num padding tokens: 15 - rank: 7 max len: 66 min len: 51 avg len: 58.5 num_loss_counted_tokens: 61
total tokens: 186 num samples: 2 num padding tokens: 41 - rank: 1 max len: 93 min len: 52 avg len: 72.5 num_loss_counted_tokens: 99
total tokens: 188 num samples: 2 num padding tokens: 8 - rank: 7 max len: 94 min len: 86 avg len: 90.0 num_loss_counted_tokens: 85
total tokens: 168 num samples: 2 num padding tokens: 39 - rank: 7 max len: 84 min len: 45 avg len: 64.5 num_loss_counted_tokens: 71
total tokens: 174 num samples: 2 num padding tokens: 17 - rank: 7 max len: 87 min len: 70 avg len: 78.5 num_loss_counted_tokens: 88
total tokens: 126 num samples: 2 num padding tokens: 2 - rank: 1 max len: 63 min len: 61 avg len: 62.0 num_loss_counted_tokens: 56
total tokens: 146 num samples: 2 num padding tokens: 11 - rank: 1 max len: 73 min len: 62 avg len: 67.5 num_loss_counted_tokens: 75
total tokens: 208 num samples: 2 num padding tokens: 38 - rank: 0 max len: 104 min len: 66 avg len: 85.0 num_loss_counted_tokens: 110
total tokens: 132 num samples: 2 num padding tokens: 16 - rank: 4 max len: 66 min len: 50 avg len: 58.0 num_loss_counted_tokens: 61
total tokens: 172 num samples: 2 num padding tokens: 28 - rank: 1 max len: 86 min len: 58 avg len: 72.0 num_loss_counted_tokens: 78
total tokens: 184 num samples: 2 num padding tokens: 37 - rank: 1 max len: 92 min len: 55 avg len: 73.5 num_loss_counted_tokens: 89
total tokens: 226 num samples: 2 num padding tokens: 39 - rank: 1 max len: 113 min len: 74 avg len: 93.5 num_loss_counted_tokens: 109
total tokens: 134 num samples: 2 num padding tokens: 23 - rank: 1 max len: 67 min len: 44 avg len: 55.5 num_loss_counted_tokens: 47
total tokens: 126 num samples: 2 num padding tokens: 8 - rank: 1 max len: 63 min len: 55 avg len: 59.0 num_loss_counted_tokens: 62
total tokens: 162 num samples: 2 num padding tokens: 12 - rank: 1 max len: 81 min len: 69 avg len: 75.0 num_loss_counted_tokens: 71
total tokens: 158 num samples: 2 num padding tokens: 12 - rank: 1 max len: 79 min len: 67 avg len: 73.0 num_loss_counted_tokens: 66
total tokens: 172 num samples: 2 num padding tokens: 27 - rank: 6 max len: 86 min len: 59 avg len: 72.5 num_loss_counted_tokens: 76
total tokens: 128 num samples: 2 num padding tokens: 3 - rank: 6 max len: 64 min len: 61 avg len: 62.5 num_loss_counted_tokens: 63
total tokens: 98 num samples: 2 num padding tokens: 4 - rank: 6 max len: 49 min len: 45 avg len: 47.0 num_loss_counted_tokens: 48
total tokens: 158 num samples: 2 num padding tokens: 24 - rank: 6 max len: 79 min len: 55 avg len: 67.0 num_loss_counted_tokens: 76
total tokens: 124 num samples: 2 num padding tokens: 1 - rank: 3 max len: 62 min len: 61 avg len: 61.5 num_loss_counted_tokens: 57
total tokens: 228 num samples: 2 num padding tokens: 46 - rank: 6 max len: 114 min len: 68 avg len: 91.0 num_loss_counted_tokens: 118
total tokens: 122 num samples: 2 num padding tokens: 6 - rank: 6 max len: 61 min len: 55 avg len: 58.0 num_loss_counted_tokens: 68
total tokens: 216 num samples: 2 num padding tokens: 60 - rank: 6 max len: 108 min len: 48 avg len: 78.0 num_loss_counted_tokens: 102
total tokens: 126 num samples: 2 num padding tokens: 0 - rank: 6 max len: 63 min len: 63 avg len: 63.0 num_loss_counted_tokens: 63
total tokens: 120 num samples: 2 num padding tokens: 14 - rank: 6 max len: 60 min len: 46 avg len: 53.0 num_loss_counted_tokens: 57
total tokens: 146 num samples: 2 num padding tokens: 1 - rank: 6 max len: 73 min len: 72 avg len: 72.5 num_loss_counted_tokens: 91
total tokens: 124 num samples: 2 num padding tokens: 11 - rank: 6 max len: 62 min len: 51 avg len: 56.5 num_loss_counted_tokens: 57 | |
total tokens: 134 num samples: 2 num padding tokens: 15 - rank: 1 max len: 67 min len: 52 avg len: 59.5 num_loss_counted_tokens: 66 | |
total tokens: 116 num samples: 2 num padding tokens: 8 - rank: 6 max len: 58 min len: 50 avg len: 54.0 num_loss_counted_tokens: 59 | |
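The per-rank batch lines above can be cross-checked with a little arithmetic. The sketch below assumes (inferred from the numbers, not taken from the instructlab source) that each micro-batch right-pads both samples to the longer one. `batch_stats` is a hypothetical helper, and `num_loss_counted_tokens` is left out because it depends on the loss mask, which the log does not show.

```python
def batch_stats(lengths):
    """Recompute the padded-batch fields from per-sample token lengths.

    Assumption: every sample is right-padded to the longest sample in the batch.
    """
    max_len = max(lengths)
    total_tokens = max_len * len(lengths)   # size of the padded batch in tokens
    padding = total_tokens - sum(lengths)   # tokens added purely by padding
    return {
        "total tokens": total_tokens,
        "num padding tokens": padding,
        "max len": max_len,
        "min len": min(lengths),
        "avg len": sum(lengths) / len(lengths),
    }


# First line of this excerpt: rank 3, sample lengths 63 and 50.
print(batch_stats([63, 50]))
```

With two samples per batch, the padding count is simply `max len - min len`, which is why a batch of two equal-length samples (63 and 63 on rank 6) reports zero padding tokens.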
Per-token loss scaled by world size: 0.0013802563771605492
Per-token loss scaled by world size: 0.0019055134616792202
Per-token loss scaled by world size: 0.003680554451420903
Per-token loss scaled by world size: 0.001587073435075581
Per-token loss scaled by world size: 0.0016849382082000375
Per-token loss scaled by world size: 0.00304134888574481
Per-token loss scaled by world size: 0.00129329867195338
Epoch: 1, Step: 13, Rank: 1, loss = 0.1150788739323616
Epoch: 1, Step: 13, Rank: 3, loss = 0.15887218713760376
Epoch: 1, Step: 13, Rank: 7, loss = 0.30686622858047485
Epoch: 1, Step: 13, Rank: 4, loss = 0.1323222517967224
Epoch: 1, Step: 13, Rank: 0, loss = 0.2535724639892578
Epoch: 1, Step: 13, Rank: 2, loss = 0.14048172533512115
Epoch: 1, Step: 13, Rank: 5, loss = 0.1078287735581398
Per-token loss scaled by world size: 0.000824308895971626
Epoch: 1, Step: 13, Rank: 6, loss = 0.06872675567865372
[2024-07-27 20:04:28,502] [INFO] [logging.py:96:log_dist] [Rank 0] step=13, skipped=0, lr=[1.04e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:28,578] [INFO] [timer.py:258:stop] epoch=0/micro_step=13/global_step=13, RunningAvgSamplesPerSec=31.312819818036353, CurrSamplesPerSec=28.289847056874475, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 1: 8%|▊ | 1/12 [00:00<00:10, 1.05it/s]
{
"epoch": 1,
"step": 13,
"rank": 0,
"loss": 0.2535724639892578,
"overall_throughput": 28.191588421887058,
"lr": 1.04e-05,
"cuda_mem_allocated": 22.006441116333008,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 667,
"batch_size": 16,
"total_loss": 0.16046865284442902,
"gradnorm": 3.25748348236084,
"weight_norm": 393.4556884765625,
"timestamp": "2024-07-27T20:04:28.620919"
}
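The step-13 numbers above are internally consistent under two relations inferred from this log rather than from the training code: each printed `loss` equals one of the `Per-token loss scaled by world size` values multiplied by `num_loss_counted_tokens / world_size` (667 / 8), and the JSON `total_loss` is the mean of the eight per-rank losses. Because the scaled values are printed without a rank label, this sketch compares sorted lists instead of pairing values by rank.

```python
import math

WORLD_SIZE = 8                 # --gpus 8 in the launch command
NUM_LOSS_COUNTED_TOKENS = 667  # "num_loss_counted_tokens" in the step-13 JSON

# Step-13 "Per-token loss scaled by world size" values (printed with no rank label).
scaled = [
    0.0013802563771605492, 0.0019055134616792202, 0.003680554451420903,
    0.001587073435075581, 0.0016849382082000375, 0.00304134888574481,
    0.00129329867195338, 0.000824308895971626,
]

# Step-13 "Epoch: 1, Step: 13, Rank: r, loss = ..." values, indexed by rank 0..7.
rank_loss = [
    0.2535724639892578, 0.1150788739323616, 0.14048172533512115,
    0.15887218713760376, 0.1323222517967224, 0.1078287735581398,
    0.06872675567865372, 0.30686622858047485,
]

# Inferred relation: loss = scaled * num_loss_counted_tokens / world_size.
predicted = sorted(s * NUM_LOSS_COUNTED_TOKENS / WORLD_SIZE for s in scaled)
matches = all(
    math.isclose(p, r, rel_tol=1e-5) for p, r in zip(predicted, sorted(rank_loss))
)

# Inferred relation: "total_loss" is the plain mean of the per-rank losses.
mean_loss = sum(rank_loss) / WORLD_SIZE
print(matches, mean_loss)
```

The mean agrees with the logged `total_loss` of 0.16046865284442902 to about seven significant figures; the remaining difference is consistent with the training loop accumulating in float32.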
Per-token loss scaled by world size: 0.003146873554214835
Per-token loss scaled by world size: 0.0015134336426854134
Per-token loss scaled by world size: 0.0021054281387478113
Per-token loss scaled by world size: 0.005117365624755621
Per-token loss scaled by world size: 0.0010033146245405078
Per-token loss scaled by world size: 0.0036201237235218287
Per-token loss scaled by world size: 0.003257090924307704
Epoch: 1, Step: 14, Rank: 0, loss = 0.2533233165740967
Epoch: 1, Step: 14, Rank: 5, loss = 0.41194793581962585
Epoch: 1, Step: 14, Rank: 2, loss = 0.12183140963315964
Epoch: 1, Step: 14, Rank: 6, loss = 0.16948696970939636
Epoch: 1, Step: 14, Rank: 1, loss = 0.29141995310783386
Epoch: 1, Step: 14, Rank: 3, loss = 0.2621958255767822
Epoch: 1, Step: 14, Rank: 7, loss = 0.08076682686805725
Per-token loss scaled by world size: 0.0011276104487478733
Epoch: 1, Step: 14, Rank: 4, loss = 0.09077264368534088
[2024-07-27 20:04:29,046] [INFO] [logging.py:96:log_dist] [Rank 0] step=14, skipped=0, lr=[1.1200000000000001e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:29,123] [INFO] [timer.py:258:stop] epoch=0/micro_step=14/global_step=14, RunningAvgSamplesPerSec=31.374398887750708, CurrSamplesPerSec=32.06810729498475, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 17%|█▋ | 2/12 [00:01<00:07, 1.40it/s]
{
"epoch": 1,
"step": 14,
"rank": 0,
"loss": 0.2533233165740967,
"overall_throughput": 32.01091375509972,
"lr": 1.1200000000000001e-05,
"cuda_mem_allocated": 22.00023889541626,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 644,
"batch_size": 16,
"total_loss": 0.21021811664104462,
"gradnorm": 6.8222336769104,
"weight_norm": 393.4558410644531,
"timestamp": "2024-07-27T20:04:29.126718"
}
Per-token loss scaled by world size: 0.00125453295186162
Per-token loss scaled by world size: 0.002432552631944418
Per-token loss scaled by world size: 0.0022791901137679815
Per-token loss scaled by world size: 0.0012238719500601292
Per-token loss scaled by world size: 0.0040193116292357445
Per-token loss scaled by world size: 0.002601771615445614
Per-token loss scaled by world size: 0.0017355632735416293
Epoch: 1, Step: 15, Rank: 0, loss = 0.08985592424869537
Epoch: 1, Step: 15, Rank: 1, loss = 0.17423158884048462
Epoch: 1, Step: 15, Rank: 2, loss = 0.1632469892501831
Epoch: 1, Step: 15, Rank: 6, loss = 0.08765982836484909
Epoch: 1, Step: 15, Rank: 4, loss = 0.2878831923007965
Epoch: 1, Step: 15, Rank: 3, loss = 0.18635189533233643
Epoch: 1, Step: 15, Rank: 5, loss = 0.1243097186088562
Per-token loss scaled by world size: 0.0024993098340928555
Epoch: 1, Step: 15, Rank: 7, loss = 0.17901305854320526
[2024-07-27 20:04:29,597] [INFO] [logging.py:96:log_dist] [Rank 0] step=15, skipped=0, lr=[1.2e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:29,674] [INFO] [timer.py:258:stop] epoch=0/micro_step=15/global_step=15, RunningAvgSamplesPerSec=31.399353099680013, CurrSamplesPerSec=31.70192973588364, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 25%|██▌ | 3/12 [00:02<00:05, 1.56it/s]
{
"epoch": 1,
"step": 15,
"rank": 0,
"loss": 0.08985592424869537,
"overall_throughput": 31.64709333197519,
"lr": 1.2e-05,
"cuda_mem_allocated": 21.999523639678955,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 573,
"batch_size": 16,
"total_loss": 0.1615689992904663,
"gradnorm": 3.001760244369507,
"weight_norm": 393.45599365234375,
"timestamp": "2024-07-27T20:04:29.719105"
}
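The learning rate in DeepSpeed's `log_dist` lines grows by the same amount at every step of this excerpt. A minimal sketch, assuming a linear warmup of the form lr = 8e-7 × global_step (the slope is inferred from the logged values; the peak rate and the warmup length are not visible here), reproduces all twelve logged values. `warmup_lr` is a hypothetical helper, not a DeepSpeed or instructlab function.

```python
import math

# (global step, lr) pairs copied from the "log_dist" lines for steps 13-24.
logged = {
    13: 1.04e-05, 14: 1.1200000000000001e-05, 15: 1.2e-05,
    16: 1.2800000000000001e-05, 17: 1.3600000000000002e-05,
    18: 1.4400000000000001e-05, 19: 1.5200000000000002e-05,
    20: 1.6000000000000003e-05, 21: 1.6800000000000002e-05,
    22: 1.76e-05, 23: 1.8400000000000003e-05, 24: 1.9200000000000003e-05,
}

SLOPE = 8e-7  # inferred lr increment per optimizer step; not read from any config


def warmup_lr(step: int) -> float:
    """Linear warmup consistent with the logged values (an assumption)."""
    return SLOPE * step


fits = all(math.isclose(warmup_lr(s), lr, rel_tol=1e-9) for s, lr in logged.items())
print(fits)
```

Trailing digits such as `1.1200000000000001e-05` are ordinary binary floating-point rounding of the same linear schedule, not a change in the schedule.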
Per-token loss scaled by world size: 0.0014579611597582698
Per-token loss scaled by world size: 0.003502971027046442
Per-token loss scaled by world size: 0.0018768769223242998
Per-token loss scaled by world size: 0.0020246703643351793
Per-token loss scaled by world size: 0.001514959498308599
Per-token loss scaled by world size: 0.006437234580516815
Epoch: 1, Step: 16, Rank: 0, loss = 0.10005258768796921
Epoch: 1, Step: 16, Rank: 4, loss = 0.1288006752729416
Epoch: 1, Step: 16, Rank: 2, loss = 0.24039138853549957
Epoch: 1, Step: 16, Rank: 7, loss = 0.4417552351951599
Epoch: 1, Step: 16, Rank: 1, loss = 0.13894300162792206
Epoch: 1, Step: 16, Rank: 3, loss = 0.10396409779787064
Per-token loss scaled by world size: 0.002007455099374056
Per-token loss scaled by world size: 0.0018041662406176329
Epoch: 1, Step: 16, Rank: 5, loss = 0.13776160776615143
Epoch: 1, Step: 16, Rank: 6, loss = 0.12381090968847275
[2024-07-27 20:04:30,141] [INFO] [logging.py:96:log_dist] [Rank 0] step=16, skipped=0, lr=[1.2800000000000001e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:30,218] [INFO] [timer.py:258:stop] epoch=0/micro_step=16/global_step=16, RunningAvgSamplesPerSec=31.464876520188444, CurrSamplesPerSec=32.342260256708684, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 33%|███▎ | 4/12 [00:02<00:04, 1.66it/s]
{
"epoch": 1,
"step": 16,
"rank": 0,
"loss": 0.10005258768796921,
"overall_throughput": 32.288201736596754,
"lr": 1.2800000000000001e-05,
"cuda_mem_allocated": 22.003100872039795,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 549,
"batch_size": 16,
"total_loss": 0.17693495750427246,
"gradnorm": 2.5745155811309814,
"weight_norm": 393.4561462402344,
"timestamp": "2024-07-27T20:04:30.261395"
}
Per-token loss scaled by world size: 0.004123833030462265
Per-token loss scaled by world size: 0.002096434822306037
Per-token loss scaled by world size: 0.002511914586648345
Per-token loss scaled by world size: 0.004808654077351093
Per-token loss scaled by world size: 0.0011069930624216795
Per-token loss scaled by world size: 0.002304441062733531
Epoch: 1, Step: 17, Rank: 1, loss = 0.20597699284553528
Epoch: 1, Step: 17, Rank: 7, loss = 0.17190766334533691
Epoch: 1, Step: 17, Rank: 3, loss = 0.3943096399307251
Epoch: 1, Step: 17, Rank: 6, loss = 0.33815431594848633
Epoch: 1, Step: 17, Rank: 2, loss = 0.09077343344688416
Epoch: 1, Step: 17, Rank: 4, loss = 0.18896417319774628
Per-token loss scaled by world size: 0.0022304877638816833
Per-token loss scaled by world size: 0.0029599058907479048
Epoch: 1, Step: 17, Rank: 0, loss = 0.24271227419376373
Epoch: 1, Step: 17, Rank: 5, loss = 0.18289999663829803
[2024-07-27 20:04:30,680] [INFO] [logging.py:96:log_dist] [Rank 0] step=17, skipped=0, lr=[1.3600000000000002e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:30,758] [INFO] [timer.py:258:stop] epoch=0/micro_step=17/global_step=17, RunningAvgSamplesPerSec=31.51917479452854, CurrSamplesPerSec=32.29951508996705, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 42%|████▏ | 5/12 [00:03<00:04, 1.73it/s]
{
"epoch": 1,
"step": 17,
"rank": 0,
"loss": 0.24271227419376373,
"overall_throughput": 32.21476489448058,
"lr": 1.3600000000000002e-05,
"cuda_mem_allocated": 21.997375965118408,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 656,
"batch_size": 16,
"total_loss": 0.22696229815483093,
"gradnorm": 4.573685169219971,
"weight_norm": 393.4563293457031,
"timestamp": "2024-07-27T20:04:30.804653"
}
Per-token loss scaled by world size: 0.0030115304980427027
Per-token loss scaled by world size: 0.006417228374630213
Per-token loss scaled by world size: 0.007109665311872959
Per-token loss scaled by world size: 0.000538784428499639
Per-token loss scaled by world size: 0.002789800288155675
Per-token loss scaled by world size: 0.003157705068588257
Per-token loss scaled by world size: 0.0017401942750439048
Epoch: 1, Step: 18, Rank: 0, loss = 0.23715803027153015
Epoch: 1, Step: 18, Rank: 7, loss = 0.5053567290306091
Epoch: 1, Step: 18, Rank: 6, loss = 0.5598861575126648
Epoch: 1, Step: 18, Rank: 2, loss = 0.2196967750787735
Epoch: 1, Step: 18, Rank: 1, loss = 0.042429275810718536
Epoch: 1, Step: 18, Rank: 5, loss = 0.24866926670074463
Epoch: 1, Step: 18, Rank: 3, loss = 0.13704030215740204
Per-token loss scaled by world size: 0.0006516931462101638
Epoch: 1, Step: 18, Rank: 4, loss = 0.05132083594799042
[2024-07-27 20:04:31,227] [INFO] [logging.py:96:log_dist] [Rank 0] step=18, skipped=0, lr=[1.4400000000000001e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:31,304] [INFO] [timer.py:258:stop] epoch=0/micro_step=18/global_step=18, RunningAvgSamplesPerSec=31.586850556537332, CurrSamplesPerSec=32.638021628709105, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 50%|█████ | 6/12 [00:03<00:03, 1.76it/s]
{
"epoch": 1,
"step": 18,
"rank": 0,
"loss": 0.23715803027153015,
"overall_throughput": 32.58399904834273,
"lr": 1.4400000000000001e-05,
"cuda_mem_allocated": 21.999762058258057,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 630,
"batch_size": 16,
"total_loss": 0.2501946687698364,
"gradnorm": 4.389204025268555,
"weight_norm": 393.45654296875,
"timestamp": "2024-07-27T20:04:31.348530"
}
Per-token loss scaled by world size: 0.003954596351832151
Per-token loss scaled by world size: 0.0016637382796034217
Per-token loss scaled by world size: 0.003214797005057335
Per-token loss scaled by world size: 0.006215415894985199
Per-token loss scaled by world size: 0.0025190163869410753
Per-token loss scaled by world size: 0.0015009477501735091
Per-token loss scaled by world size: 0.0017105289734899998
Epoch: 1, Step: 19, Rank: 0, loss = 0.34849879145622253
Epoch: 1, Step: 19, Rank: 7, loss = 0.22198832035064697
Epoch: 1, Step: 19, Rank: 1, loss = 0.28330397605895996
Epoch: 1, Step: 19, Rank: 2, loss = 0.14661693572998047
Epoch: 1, Step: 19, Rank: 5, loss = 0.5477335453033447
Epoch: 1, Step: 19, Rank: 3, loss = 0.1507403701543808
Epoch: 1, Step: 19, Rank: 6, loss = 0.13227102160453796
Per-token loss scaled by world size: 0.0012170596746727824
Epoch: 1, Step: 19, Rank: 4, loss = 0.10725338757038116
[2024-07-27 20:04:31,783] [INFO] [logging.py:96:log_dist] [Rank 0] step=19, skipped=0, lr=[1.5200000000000002e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:31,860] [INFO] [timer.py:258:stop] epoch=0/micro_step=19/global_step=19, RunningAvgSamplesPerSec=31.571700427209116, CurrSamplesPerSec=31.331259798479305, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 58%|█████▊ | 7/12 [00:04<00:02, 1.77it/s]
{
"epoch": 1,
"step": 19,
"rank": 0,
"loss": 0.34849879145622253,
"overall_throughput": 31.250591517446562,
"lr": 1.5200000000000002e-05,
"cuda_mem_allocated": 22.002862453460693,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 705,
"batch_size": 16,
"total_loss": 0.24230077862739563,
"gradnorm": 3.7223894596099854,
"weight_norm": 393.456787109375,
"timestamp": "2024-07-27T20:04:31.903580"
}
Per-token loss scaled by world size: 0.003282753750681877
Per-token loss scaled by world size: 0.006474556401371956
Per-token loss scaled by world size: 0.001697456929832697
Per-token loss scaled by world size: 0.0012144312495365739
Per-token loss scaled by world size: 0.0008425723062828183
Per-token loss scaled by world size: 0.0018245026003569365
Per-token loss scaled by world size: 0.0037540853954851627
Epoch: 1, Step: 20, Rank: 6, loss = 0.12476308643817902
Epoch: 1, Step: 20, Rank: 2, loss = 0.4758799076080322
Epoch: 1, Step: 20, Rank: 5, loss = 0.061929065734148026
Epoch: 1, Step: 20, Rank: 4, loss = 0.2412824034690857
Epoch: 1, Step: 20, Rank: 0, loss = 0.08926069736480713
Epoch: 1, Step: 20, Rank: 7, loss = 0.27592527866363525
Epoch: 1, Step: 20, Rank: 1, loss = 0.13410094380378723
Per-token loss scaled by world size: 0.0009085916099138558
Epoch: 1, Step: 20, Rank: 3, loss = 0.06678148359060287
[2024-07-27 20:04:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=0, lr=[1.6000000000000003e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:32,399] [INFO] [timer.py:258:stop] epoch=0/micro_step=20/global_step=20, RunningAvgSamplesPerSec=31.62285860527634, CurrSamplesPerSec=32.51863226575504, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 1: 67%|██████▋ | 8/12 [00:04<00:02, 1.80it/s]
{
"epoch": 1,
"step": 20,
"rank": 0,
"loss": 0.08926069736480713,
"overall_throughput": 32.4628994682786,
"lr": 1.6000000000000003e-05,
"cuda_mem_allocated": 22.00811004638672,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 588,
"batch_size": 16,
"total_loss": 0.18374036252498627,
"gradnorm": 4.014389514923096,
"weight_norm": 393.4570617675781,
"timestamp": "2024-07-27T20:04:32.401998"
}
Per-token loss scaled by world size: 0.0046243746764957905
Per-token loss scaled by world size: 0.0019434256246313453
Per-token loss scaled by world size: 0.0029365788213908672
Per-token loss scaled by world size: 0.0035864808596670628
Per-token loss scaled by world size: 0.003000351833179593
Per-token loss scaled by world size: 0.002845433074980974
Per-token loss scaled by world size: 0.0026900055818259716
Epoch: 1, Step: 21, Rank: 0, loss = 0.33989155292510986
Epoch: 1, Step: 21, Rank: 3, loss = 0.14284178614616394
Epoch: 1, Step: 21, Rank: 2, loss = 0.220525860786438
Epoch: 1, Step: 21, Rank: 5, loss = 0.26360633969306946
Epoch: 1, Step: 21, Rank: 6, loss = 0.21583855152130127
Epoch: 1, Step: 21, Rank: 7, loss = 0.1977154165506363
Epoch: 1, Step: 21, Rank: 4, loss = 0.20913933217525482
Per-token loss scaled by world size: 0.003519931575283408
Epoch: 1, Step: 21, Rank: 1, loss = 0.2587149739265442
[2024-07-27 20:04:32,866] [INFO] [logging.py:96:log_dist] [Rank 0] step=21, skipped=0, lr=[1.6800000000000002e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:32,944] [INFO] [timer.py:258:stop] epoch=0/micro_step=21/global_step=21, RunningAvgSamplesPerSec=31.62433633690537, CurrSamplesPerSec=31.650959142641135, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 75%|███████▌ | 9/12 [00:05<00:01, 1.81it/s]
{
"epoch": 1,
"step": 21,
"rank": 0,
"loss": 0.33989155292510986,
"overall_throughput": 31.58514555251878,
"lr": 1.6800000000000002e-05,
"cuda_mem_allocated": 21.998091220855713,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 588,
"batch_size": 16,
"total_loss": 0.23103423416614532,
"gradnorm": 4.073541164398193,
"weight_norm": 393.4573669433594,
"timestamp": "2024-07-27T20:04:32.946992"
}
Per-token loss scaled by world size: 0.0026941639371216297
Per-token loss scaled by world size: 0.007396150380373001
Per-token loss scaled by world size: 0.0018774037016555667
Per-token loss scaled by world size: 0.0010539990616962314
Per-token loss scaled by world size: 0.0033142913598567247
Per-token loss scaled by world size: 0.0031690315809100866
Per-token loss scaled by world size: 0.00544370012357831
Epoch: 1, Step: 22, Rank: 2, loss = 0.527900218963623
Epoch: 1, Step: 22, Rank: 6, loss = 0.13399969041347504
Epoch: 1, Step: 22, Rank: 5, loss = 0.07522918283939362
Epoch: 1, Step: 22, Rank: 1, loss = 0.19229595363140106
Epoch: 1, Step: 22, Rank: 4, loss = 0.23655754327774048
Epoch: 1, Step: 22, Rank: 7, loss = 0.22618962824344635
Epoch: 1, Step: 22, Rank: 3, loss = 0.38854408264160156
Per-token loss scaled by world size: 0.0007584551349282265
Epoch: 1, Step: 22, Rank: 0, loss = 0.054134733974933624
[2024-07-27 20:04:33,400] [INFO] [logging.py:96:log_dist] [Rank 0] step=22, skipped=0, lr=[1.76e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:33,478] [INFO] [timer.py:258:stop] epoch=0/micro_step=22/global_step=22, RunningAvgSamplesPerSec=31.68561296315261, CurrSamplesPerSec=32.896711596691546, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 352
{
"epoch": 1,
"step": 22,
"rank": 0,
"loss": 0.054134733974933624,
"overall_throughput": 32.789902630272564,
"lr": 1.76e-05,
"cuda_mem_allocated": 21.997375965118408,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 571,
"batch_size": 16,
"total_loss": 0.22935637831687927,
"gradnorm": 3.735788106918335,
"weight_norm": 393.4576721191406,
"timestamp": "2024-07-27T20:04:33.482264"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_352
[20:04:51] INFO saving took 17.896338939666748 seconds utils.py:611
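The checkpoint label `samples_352` follows from the batch settings. Assuming no gradient accumulation (DeepSpeed's `micro_step` equals `global_step` throughout this excerpt), each optimizer step consumes 8 ranks × 2 samples = 16 samples, which matches `--effective-batch-size 16` and `"batch_size": 16`, so 22 steps give 352 samples. The names below are illustrative only, not instructlab APIs.

```python
WORLD_SIZE = 8        # --gpus 8
SAMPLES_PER_RANK = 2  # "num samples: 2" on every per-rank batch line
# Assumes no gradient accumulation, so one micro-batch per rank per optimizer step.
EFFECTIVE_BATCH = WORLD_SIZE * SAMPLES_PER_RANK


def samples_seen(global_step: int) -> int:
    """Samples consumed after `global_step` optimizer steps."""
    return global_step * EFFECTIVE_BATCH


print(EFFECTIVE_BATCH, samples_seen(22))  # 16 352
```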
Per-token loss scaled by world size: 0.001945212366990745
Per-token loss scaled by world size: 0.0022036610171198845
Per-token loss scaled by world size: 0.0031597877386957407
Per-token loss scaled by world size: 0.0022363392636179924
Per-token loss scaled by world size: 0.017282620072364807
Per-token loss scaled by world size: 0.0026094394270330667
Per-token loss scaled by world size: 0.0019479849142953753
Epoch: 1, Step: 23, Rank: 1, loss = 0.1749155968427658
Epoch: 1, Step: 23, Rank: 5, loss = 0.25080814957618713
Epoch: 1, Step: 23, Rank: 4, loss = 0.17750942707061768
Epoch: 1, Step: 23, Rank: 3, loss = 0.207124263048172
Epoch: 1, Step: 23, Rank: 7, loss = 1.3718079328536987
Epoch: 1, Step: 23, Rank: 0, loss = 0.15440122783184052
Epoch: 1, Step: 23, Rank: 2, loss = 0.15462130308151245
Per-token loss scaled by world size: 0.004577424377202988
Epoch: 1, Step: 23, Rank: 6, loss = 0.3633330762386322
[2024-07-27 20:04:51,866] [INFO] [logging.py:96:log_dist] [Rank 0] step=23, skipped=0, lr=[1.8400000000000003e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:51,943] [INFO] [timer.py:258:stop] epoch=0/micro_step=23/global_step=23, RunningAvgSamplesPerSec=31.661565651394987, CurrSamplesPerSec=31.188169951680987, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 1: 92%|█████████▏| 11/12 [00:24<00:04, 4.39s/it]
{
"epoch": 1,
"step": 23,
"rank": 0,
"loss": 0.15440122783184052,
"overall_throughput": 31.12553529081371,
"lr": 1.8400000000000003e-05,
"cuda_mem_allocated": 22.000954627990723,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 635,
"batch_size": 16,
"total_loss": 0.3568150997161865,
"gradnorm": 3.5464768409729004,
"weight_norm": 393.4579772949219,
"timestamp": "2024-07-27T20:04:51.986553"
}
Per-token loss scaled by world size: 0.002529617166146636
Per-token loss scaled by world size: 0.004327333997935057
Per-token loss scaled by world size: 0.002556184073910117
Per-token loss scaled by world size: 0.005085375625640154
Per-token loss scaled by world size: 0.008069510571658611
Per-token loss scaled by world size: 0.002654892858117819
Per-token loss scaled by world size: 0.001250272849574685
Epoch: 1, Step: 24, Rank: 3, loss = 0.3591546416282654
Epoch: 1, Step: 24, Rank: 6, loss = 0.17865420877933502
Epoch: 1, Step: 24, Rank: 5, loss = 0.18750180304050446
Epoch: 1, Step: 24, Rank: 1, loss = 0.30561795830726624
Epoch: 1, Step: 24, Rank: 4, loss = 0.5699091553688049
Epoch: 1, Step: 24, Rank: 0, loss = 0.18053050339221954
Epoch: 1, Step: 24, Rank: 2, loss = 0.08830051869153976
Per-token loss scaled by world size: 0.0018133769044652581
Epoch: 1, Step: 24, Rank: 7, loss = 0.12806974351406097
[2024-07-27 20:04:52,418] [INFO] [logging.py:96:log_dist] [Rank 0] step=24, skipped=0, lr=[1.9200000000000003e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:52,496] [INFO] [timer.py:258:stop] epoch=0/micro_step=24/global_step=24, RunningAvgSamplesPerSec=31.65142326624155, CurrSamplesPerSec=31.439924179355366, MemAllocated=21.99GB, MaxMemAllocated=28.3GB
Epoch 1: 100%|██████████| 12/12 [00:24<00:00, 3.22s/it]
{
"epoch": 1,
"step": 24,
"rank": 0,
"loss": 0.18053050339221954,
"overall_throughput": 31.36277025201868,
"lr": 1.9200000000000003e-05,
"cuda_mem_allocated": 21.994752407073975,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 565,
"batch_size": 16,
"total_loss": 0.2497173249721527,
"gradnorm": 2.950968027114868,
"weight_norm": 393.4583435058594,
"timestamp": "2024-07-27T20:04:52.547761"
}
Epoch 1: 100%|██████████| 12/12 [00:24<00:00, 2.08s/it]
total tokens: 154 num samples: 2 num padding tokens: 26 - rank: 1 max len: 77 min len: 51 avg len: 64.0 num_loss_counted_tokens: 64
total tokens: 102 num samples: 2 num padding tokens: 7 - rank: 1 max len: 51 min len: 44 avg len: 47.5 num_loss_counted_tokens: 51
total tokens: 214 num samples: 2 num padding tokens: 37 - rank: 1 max len: 107 min len: 70 avg len: 88.5 num_loss_counted_tokens: 106
total tokens: 166 num samples: 2 num padding tokens: 22 - rank: 1 max len: 83 min len: 61 avg len: 72.0 num_loss_counted_tokens: 86
total tokens: 174 num samples: 2 num padding tokens: 24 - rank: 1 max len: 87 min len: 63 avg len: 75.0 num_loss_counted_tokens: 76
total tokens: 138 num samples: 2 num padding tokens: 9 - rank: 1 max len: 69 min len: 60 avg len: 64.5 num_loss_counted_tokens: 58
total tokens: 152 num samples: 2 num padding tokens: 8 - rank: 1 max len: 76 min len: 68 avg len: 72.0 num_loss_counted_tokens: 75
total tokens: 120 num samples: 2 num padding tokens: 5 - rank: 1 max len: 60 min len: 55 avg len: 57.5 num_loss_counted_tokens: 64
total tokens: 188 num samples: 2 num padding tokens: 27 - rank: 1 max len: 94 min len: 67 avg len: 80.5 num_loss_counted_tokens: 82
total tokens: 154 num samples: 2 num padding tokens: 17 - rank: 1 max len: 77 min len: 60 avg len: 68.5 num_loss_counted_tokens: 88
total tokens: 120 num samples: 2 num padding tokens: 16 - rank: 5 max len: 60 min len: 44 avg len: 52.0 num_loss_counted_tokens: 57
total tokens: 118 num samples: 2 num padding tokens: 4 - rank: 1 max len: 59 min len: 55 avg len: 57.0 num_loss_counted_tokens: 62
total tokens: 142 num samples: 2 num padding tokens: 5 - rank: 5 max len: 71 min len: 66 avg len: 68.5 num_loss_counted_tokens: 59
total tokens: 120 num samples: 2 num padding tokens: 7 - rank: 5 max len: 60 min len: 53 avg len: 56.5 num_loss_counted_tokens: 54
total tokens: 194 num samples: 2 num padding tokens: 36 - rank: 5 max len: 97 min len: 61 avg len: 79.0 num_loss_counted_tokens: 99
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 0 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 63
total tokens: 120 num samples: 2 num padding tokens: 3 - rank: 5 max len: 60 min len: 57 avg len: 58.5 num_loss_counted_tokens: 80
total tokens: 146 num samples: 2 num padding tokens: 18 - rank: 0 max len: 73 min len: 55 avg len: 64.0 num_loss_counted_tokens: 82
total tokens: 130 num samples: 2 num padding tokens: 8 - rank: 0 max len: 65 min len: 57 avg len: 61.0 num_loss_counted_tokens: 60
total tokens: 174 num samples: 2 num padding tokens: 29 - rank: 0 max len: 87 min len: 58 avg len: 72.5 num_loss_counted_tokens: 80
total tokens: 172 num samples: 2 num padding tokens: 10 - rank: 5 max len: 86 min len: 76 avg len: 81.0 num_loss_counted_tokens: 84
total tokens: 228 num samples: 2 num padding tokens: 44 - rank: 5 max len: 114 min len: 70 avg len: 92.0 num_loss_counted_tokens: 115
total tokens: 200 num samples: 2 num padding tokens: 32 - rank: 0 max len: 100 min len: 68 avg len: 84.0 num_loss_counted_tokens: 95
total tokens: 136 num samples: 2 num padding tokens: 4 - rank: 5 max len: 68 min len: 64 avg len: 66.0 num_loss_counted_tokens: 65
total tokens: 168 num samples: 2 num padding tokens: 32 - rank: 5 max len: 84 min len: 52 avg len: 68.0 num_loss_counted_tokens: 80
total tokens: 226 num samples: 2 num padding tokens: 64 - rank: 5 max len: 113 min len: 49 avg len: 81.0 num_loss_counted_tokens: 93
total tokens: 134 num samples: 2 num padding tokens: 15 - rank: 0 max len: 67 min len: 52 avg len: 59.5 num_loss_counted_tokens: 60
total tokens: 132 num samples: 2 num padding tokens: 11 - rank: 0 max len: 66 min len: 55 avg len: 60.5 num_loss_counted_tokens: 52
total tokens: 172 num samples: 2 num padding tokens: 26 - rank: 0 max len: 86 min len: 60 avg len: 73.0 num_loss_counted_tokens: 78
total tokens: 124 num samples: 2 num padding tokens: 12 - rank: 1 max len: 62 min len: 50 avg len: 56.0 num_loss_counted_tokens: 56
total tokens: 226 num samples: 2 num padding tokens: 55 - rank: 4 max len: 113 min len: 58 avg len: 85.5 num_loss_counted_tokens: 102
total tokens: 132 num samples: 2 num padding tokens: 4 - rank: 0 max len: 66 min len: 62 avg len: 64.0 num_loss_counted_tokens: 74
total tokens: 160 num samples: 2 num padding tokens: 31 - rank: 0 max len: 80 min len: 49 avg len: 64.5 num_loss_counted_tokens: 78
total tokens: 174 num samples: 2 num padding tokens: 13 - rank: 0 max len: 87 min len: 74 avg len: 80.5 num_loss_counted_tokens: 90
total tokens: 140 num samples: 2 num padding tokens: 18 - rank: 2 max len: 70 min len: 52 avg len: 61.0 num_loss_counted_tokens: 62
total tokens: 188 num samples: 2 num padding tokens: 28 - rank: 4 max len: 94 min len: 66 avg len: 80.0 num_loss_counted_tokens: 99
total tokens: 180 num samples: 2 num padding tokens: 28 - rank: 2 max len: 90 min len: 62 avg len: 76.0 num_loss_counted_tokens: 93
total tokens: 120 num samples: 2 num padding tokens: 12 - rank: 5 max len: 60 min len: 48 avg len: 54.0 num_loss_counted_tokens: 58
total tokens: 110 num samples: 2 num padding tokens: 7 - rank: 2 max len: 55 min len: 48 avg len: 51.5 num_loss_counted_tokens: 53
total tokens: 142 num samples: 2 num padding tokens: 27 - rank: 6 max len: 71 min len: 44 avg len: 57.5 num_loss_counted_tokens: 66
total tokens: 208 num samples: 2 num padding tokens: 39 - rank: 2 max len: 104 min len: 65 avg len: 84.5 num_loss_counted_tokens: 111
total tokens: 166 num samples: 2 num padding tokens: 1 - rank: 5 max len: 83 min len: 82 avg len: 82.5 num_loss_counted_tokens: 96
total tokens: 168 num samples: 2 num padding tokens: 20 - rank: 6 max len: 84 min len: 64 avg len: 74.0 num_loss_counted_tokens: 100
total tokens: 166 num samples: 2 num padding tokens: 33 - rank: 6 max len: 83 min len: 50 avg len: 66.5 num_loss_counted_tokens: 86
total tokens: 134 num samples: 2 num padding tokens: 12 - rank: 2 max len: 67 min len: 55 avg len: 61.0 num_loss_counted_tokens: 67
total tokens: 160 num samples: 2 num padding tokens: 16 - rank: 6 max len: 80 min len: 64 avg len: 72.0 num_loss_counted_tokens: 72
total tokens: 180 num samples: 2 num padding tokens: 36 - rank: 6 max len: 90 min len: 54 avg len: 72.0 num_loss_counted_tokens: 95
total tokens: 184 num samples: 2 num padding tokens: 26 - rank: 6 max len: 92 min len: 66 avg len: 79.0 num_loss_counted_tokens: 96
total tokens: 150 num samples: 2 num padding tokens: 30 - rank: 2 max len: 75 min len: 45 avg len: 60.0 num_loss_counted_tokens: 65
total tokens: 154 num samples: 2 num padding tokens: 23 - rank: 2 max len: 77 min len: 54 avg len: 65.5 num_loss_counted_tokens: 74
total tokens: 152 num samples: 2 num padding tokens: 25 - rank: 2 max len: 76 min len: 51 avg len: 63.5 num_loss_counted_tokens: 70
total tokens: 116 num samples: 2 num padding tokens: 12 - rank: 2 max len: 58 min len: 46 avg len: 52.0 num_loss_counted_tokens: 59
total tokens: 126 num samples: 2 num padding tokens: 9 - rank: 6 max len: 63 min len: 54 avg len: 58.5 num_loss_counted_tokens: 65
total tokens: 180 num samples: 2 num padding tokens: 19 - rank: 6 max len: 90 min len: 71 avg len: 80.5 num_loss_counted_tokens: 131
total tokens: 142 num samples: 2 num padding tokens: 12 - rank: 2 max len: 71 min len: 59 avg len: 65.0 num_loss_counted_tokens: 66
total tokens: 196 num samples: 2 num padding tokens: 29 - rank: 6 max len: 98 min len: 69 avg len: 83.5 num_loss_counted_tokens: 118
total tokens: 120 num samples: 2 num padding tokens: 14 - rank: 6 max len: 60 min len: 46 avg len: 53.0 num_loss_counted_tokens: 59
total tokens: 142 num samples: 2 num padding tokens: 5 - rank: 2 max len: 71 min len: 66 avg len: 68.5 num_loss_counted_tokens: 61
total tokens: 126 num samples: 2 num padding tokens: 10 - rank: 7 max len: 63 min len: 53 avg len: 58.0 num_loss_counted_tokens: 56
total tokens: 122 num samples: 2 num padding tokens: 16 - rank: 7 max len: 61 min len: 45 avg len: 53.0 num_loss_counted_tokens: 53
total tokens: 214 num samples: 2 num padding tokens: 55 - rank: 4 max len: 107 min len: 52 avg len: 79.5 num_loss_counted_tokens: 104
total tokens: 118 num samples: 2 num padding tokens: 1 - rank: 7 max len: 59 min len: 58 avg len: 58.5 num_loss_counted_tokens: 72
total tokens: 148 num samples: 2 num padding tokens: 10 - rank: 4 max len: 74 min len: 64 avg len: 69.0 num_loss_counted_tokens: 73
total tokens: 244 num samples: 2 num padding tokens: 29 - rank: 4 max len: 122 min len: 93 avg len: 107.5 num_loss_counted_tokens: 144
total tokens: 138 num samples: 2 num padding tokens: 9 - rank: 7 max len: 69 min len: 60 avg len: 64.5 num_loss_counted_tokens: 62
total tokens: 110 num samples: 2 num padding tokens: 3 - rank: 3 max len: 55 min len: 52 avg len: 53.5 num_loss_counted_tokens: 53
total tokens: 152 num samples: 2 num padding tokens: 13 - rank: 3 max len: 76 min len: 63 avg len: 69.5 num_loss_counted_tokens: 71
total tokens: 158 num samples: 2 num padding tokens: 7 - rank: 7 max len: 79 min len: 72 avg len: 75.5 num_loss_counted_tokens: 91
total tokens: 116 num samples: 2 num padding tokens: 15 - rank: 0 max len: 58 min len: 43 avg len: 50.5 num_loss_counted_tokens: 49
total tokens: 282 num samples: 2 num padding tokens: 60 - rank: 3 max len: 141 min len: 81 avg len: 111.0 num_loss_counted_tokens: 169
total tokens: 124 num samples: 2 num padding tokens: 7 - rank: 3 max len: 62 min len: 55 avg len: 58.5 num_loss_counted_tokens: 62
total tokens: 134 num samples: 2 num padding tokens: 8 - rank: 4 max len: 67 min len: 59 avg len: 63.0 num_loss_counted_tokens: 56
total tokens: 216 num samples: 2 num padding tokens: 27 - rank: 4 max len: 108 min len: 81 avg len: 94.5 num_loss_counted_tokens: 123
total tokens: 126 num samples: 2 num padding tokens: 14 - rank: 7 max len: 63 min len: 49 avg len: 56.0 num_loss_counted_tokens: 58 | |
total tokens: 156 num samples: 2 num padding tokens: 28 - rank: 3 max len: 78 min len: 50 avg len: 64.0 num_loss_counted_tokens: 70 | |
total tokens: 162 num samples: 2 num padding tokens: 31 - rank: 4 max len: 81 min len: 50 avg len: 65.5 num_loss_counted_tokens: 75 | |
total tokens: 144 num samples: 2 num padding tokens: 17 - rank: 4 max len: 72 min len: 55 avg len: 63.5 num_loss_counted_tokens: 68 | |
total tokens: 164 num samples: 2 num padding tokens: 20 - rank: 7 max len: 82 min len: 62 avg len: 72.0 num_loss_counted_tokens: 79 | |
total tokens: 128 num samples: 2 num padding tokens: 6 - rank: 2 max len: 64 min len: 58 avg len: 61.0 num_loss_counted_tokens: 55 total tokens: 128 num samples: 2 num padding tokens: 19 - rank: 3 max len: 64 min len: 45 avg len: 54.5 num_loss_counted_tokens: 63 | |
total tokens: 110 num samples: 2 num padding tokens: 7 - rank: 4 max len: 55 min len: 48 avg len: 51.5 num_loss_counted_tokens: 45 | |
total tokens: 176 num samples: 2 num padding tokens: 18 - rank: 6 max len: 88 min len: 70 avg len: 79.0 num_loss_counted_tokens: 90 | |
total tokens: 146 num samples: 2 num padding tokens: 21 - rank: 6 max len: 73 min len: 52 avg len: 62.5 num_loss_counted_tokens: 71 | |
total tokens: 118 num samples: 2 num padding tokens: 6 - rank: 7 max len: 59 min len: 53 avg len: 56.0 num_loss_counted_tokens: 57 | |
total tokens: 126 num samples: 2 num padding tokens: 1 - rank: 7 max len: 63 min len: 62 avg len: 62.5 num_loss_counted_tokens: 57 | |
total tokens: 116 num samples: 2 num padding tokens: 12 - rank: 4 max len: 58 min len: 46 avg len: 52.0 num_loss_counted_tokens: 54 | |
total tokens: 142 num samples: 2 num padding tokens: 26 - rank: 7 max len: 71 min len: 45 avg len: 58.0 num_loss_counted_tokens: 67 | |
total tokens: 146 num samples: 2 num padding tokens: 10 - rank: 4 max len: 73 min len: 63 avg len: 68.0 num_loss_counted_tokens: 79 | |
total tokens: 122 num samples: 2 num padding tokens: 9 - rank: 3 max len: 61 min len: 52 avg len: 56.5 num_loss_counted_tokens: 59 | |
total tokens: 140 num samples: 2 num padding tokens: 2 - rank: 3 max len: 70 min len: 68 avg len: 69.0 num_loss_counted_tokens: 66 | |
total tokens: 128 num samples: 2 num padding tokens: 4 - rank: 3 max len: 64 min len: 60 avg len: 62.0 num_loss_counted_tokens: 76 | |
total tokens: 122 num samples: 2 num padding tokens: 4 - rank: 3 max len: 61 min len: 57 avg len: 59.0 num_loss_counted_tokens: 64 | |
total tokens: 172 num samples: 2 num padding tokens: 16 - rank: 3 max len: 86 min len: 70 avg len: 78.0 num_loss_counted_tokens: 93 | |
total tokens: 186 num samples: 2 num padding tokens: 42 - rank: 7 max len: 93 min len: 51 avg len: 72.0 num_loss_counted_tokens: 94 | |
total tokens: 186 num samples: 2 num padding tokens: 14 - rank: 7 max len: 93 min len: 79 avg len: 86.0 num_loss_counted_tokens: 131 | |
total tokens: 202 num samples: 2 num padding tokens: 40 - rank: 3 max len: 101 min len: 61 avg len: 81.0 num_loss_counted_tokens: 104 | |
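Each `total tokens ... num padding tokens` record above is consistent with a two-sample micro-batch padded to the length of its longest sample. For example, max len 84 and min len 64 give 2 × 84 = 168 total tokens and 168 − (84 + 64) = 20 padding tokens. The sketch below recomputes these fields from the raw sample lengths. It assumes that padding rule and is not taken from the trainer's source. It leaves out `num_loss_counted_tokens`, which presumably depends on which tokens are masked out of the loss.

```python
def batch_stats(lengths):
    """Recompute the per-micro-batch fields printed in the log, assuming
    every sample is padded to the longest sample in the batch."""
    max_len = max(lengths)
    total_tokens = max_len * len(lengths)   # padded batch size in tokens
    padding = total_tokens - sum(lengths)   # tokens added only as padding
    return {
        "total tokens": total_tokens,
        "num samples": len(lengths),
        "num padding tokens": padding,
        "max len": max_len,
        "min len": min(lengths),
        "avg len": sum(lengths) / len(lengths),
    }


# First record above: rank 6, max len 84, min len 64.
print(batch_stats([84, 64]))
```

Under this padding rule, the record with max len 59 and min len 58 wastes only 1 padding token. The record with max len 141 and min len 81 wastes 60.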
Per-token loss scaled by world size: 0.000771758146584034
Per-token loss scaled by world size: 0.0032502268441021442
Per-token loss scaled by world size: 0.001562815043143928
Per-token loss scaled by world size: 0.004182006698101759
Per-token loss scaled by world size: 0.0015922324964776635
Per-token loss scaled by world size: 0.0030361246317625046
Per-token loss scaled by world size: 0.0017774869920685887
Epoch: 2, Step: 25, Rank: 3, loss = 0.05045368894934654
Epoch: 2, Step: 25, Rank: 1, loss = 0.2733986973762512
Epoch: 2, Step: 25, Rank: 4, loss = 0.21248358488082886
Epoch: 2, Step: 25, Rank: 2, loss = 0.10216903686523438
Epoch: 2, Step: 25, Rank: 0, loss = 0.19848664104938507
Epoch: 2, Step: 25, Rank: 7, loss = 0.10409220308065414
Epoch: 2, Step: 25, Rank: 5, loss = 0.11620321124792099
Per-token loss scaled by world size: 0.0023637553676962852
Epoch: 2, Step: 25, Rank: 6, loss = 0.15453051030635834
[2024-07-27 20:04:53,438] [INFO] [logging.py:96:log_dist] [Rank 0] step=25, skipped=0, lr=[2e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:53,514] [INFO] [timer.py:258:stop] epoch=0/micro_step=25/global_step=25, RunningAvgSamplesPerSec=31.498630726223958, CurrSamplesPerSec=28.474580988875164, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 8%|▊ | 1/12 [00:00<00:10, 1.09it/s]
{
  "epoch": 2,
  "step": 25,
  "rank": 0,
  "loss": 0.19848664104938507,
  "overall_throughput": 28.3802930607719,
  "lr": 2e-05,
  "cuda_mem_allocated": 21.999046802520752,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 523,
  "batch_size": 16,
  "total_loss": 0.15147720277309418,
  "gradnorm": 2.086557149887085,
  "weight_norm": 393.458740234375,
  "timestamp": "2024-07-27T20:04:53.559486"
}
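The step-25 numbers fit two simple relations, inferred here from the logged values and not from the training code. First, each rank's `loss` equals its "Per-token loss scaled by world size" times `num_loss_counted_tokens / world_size`, which for step 25 is 523 / 8 = 65.375. Second, `total_loss` in the JSON record equals the mean of the eight per-rank losses. The per-token lines carry no rank label, so pairing 0.000771758146584034 with rank 3 is itself inferred from that ratio. The same relations also hold for step 26, where 606 / 8 = 75.75. Small differences in the last digits are expected from float32 arithmetic.

```python
WORLD_SIZE = 8  # --gpus 8 in the training command


def rank_loss(per_token_scaled, num_loss_counted_tokens, world_size=WORLD_SIZE):
    """Inferred relation: logged per-rank loss = per-token loss (scaled by
    world size) * num_loss_counted_tokens / world_size."""
    return per_token_scaled * num_loss_counted_tokens / world_size


# Step 25 per-rank losses as printed above, keyed by rank.
step25_losses = {
    0: 0.19848664104938507,
    1: 0.2733986973762512,
    2: 0.10216903686523438,
    3: 0.05045368894934654,
    4: 0.21248358488082886,
    5: 0.11620321124792099,
    6: 0.15453051030635834,
    7: 0.10409220308065414,
}

# Rank 3: per-token value 0.000771758146584034, 523 loss-counted tokens.
print(rank_loss(0.000771758146584034, 523))  # close to 0.05045368894934654

# total_loss in the JSON record appears to be the mean over ranks.
total_loss = sum(step25_losses.values()) / WORLD_SIZE
print(total_loss)  # close to 0.15147720277309418
```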
Per-token loss scaled by world size: 0.004568464122712612
Per-token loss scaled by world size: 0.000755587185267359
Per-token loss scaled by world size: 0.0012551499530673027
Per-token loss scaled by world size: 0.0036310378927737474
Per-token loss scaled by world size: 0.0022255314979702234
Per-token loss scaled by world size: 0.003098478075116873
Per-token loss scaled by world size: 0.003910013008862734
Epoch: 2, Step: 26, Rank: 1, loss = 0.057235728949308395
Epoch: 2, Step: 26, Rank: 3, loss = 0.09507761150598526
Epoch: 2, Step: 26, Rank: 0, loss = 0.3460611402988434
Epoch: 2, Step: 26, Rank: 6, loss = 0.2750511169433594
Epoch: 2, Step: 26, Rank: 2, loss = 0.2347097098827362
Epoch: 2, Step: 26, Rank: 5, loss = 0.16858400404453278
Epoch: 2, Step: 26, Rank: 4, loss = 0.2961834967136383
Per-token loss scaled by world size: 0.000992890098132193
Epoch: 2, Step: 26, Rank: 7, loss = 0.07521142810583115
[2024-07-27 20:04:53,996] [INFO] [logging.py:96:log_dist] [Rank 0] step=26, skipped=0, lr=[1.999453257340926e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:54,074] [INFO] [timer.py:258:stop] epoch=0/micro_step=26/global_step=26, RunningAvgSamplesPerSec=31.501866831712768, CurrSamplesPerSec=31.576481216592637, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 17%|█▋ | 2/12 [00:01<00:07, 1.42it/s]
{
  "epoch": 2,
  "step": 26,
  "rank": 0,
  "loss": 0.3460611402988434,
  "overall_throughput": 31.521633384656862,
  "lr": 1.999453257340926e-05,
  "cuda_mem_allocated": 22.0040545463562,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 606,
  "batch_size": 16,
  "total_loss": 0.19351428747177124,
  "gradnorm": 2.7967944145202637,
  "weight_norm": 393.45916748046875,
  "timestamp": "2024-07-27T20:04:54.115593"
}
Per-token loss scaled by world size: 0.001778947887942195
Per-token loss scaled by world size: 0.0023961372207850218
Per-token loss scaled by world size: 0.0019206402357667685
Per-token loss scaled by world size: 0.0016144964611157775
Per-token loss scaled by world size: 0.0014130653580650687
Per-token loss scaled by world size: 0.0023006140254437923
Per-token loss scaled by world size: 0.0029887459240853786
Epoch: 2, Step: 27, Rank: 3, loss = 0.132994145154953
Epoch: 2, Step: 27, Rank: 2, loss = 0.15821273624897003
Epoch: 2, Step: 27, Rank: 5, loss = 0.19738179445266724
Epoch: 2, Step: 27, Rank: 6, loss = 0.11640125513076782
Epoch: 2, Step: 27, Rank: 1, loss = 0.1465408354997635
Epoch: 2, Step: 27, Rank: 4, loss = 0.24619793891906738
Epoch: 2, Step: 27, Rank: 0, loss = 0.18951308727264404
Per-token loss scaled by world size: 0.0023933870252221823
Epoch: 2, Step: 27, Rank: 7, loss = 0.19715525209903717
[2024-07-27 20:04:54,557] [INFO] [logging.py:96:log_dist] [Rank 0] step=27, skipped=0, lr=[1.9978136272187745e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:54,634] [INFO] [timer.py:258:stop] epoch=0/micro_step=27/global_step=27, RunningAvgSamplesPerSec=31.47828132560415, CurrSamplesPerSec=30.922637265012085, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 25%|██▌ | 3/12 [00:02<00:05, 1.56it/s]
{
  "epoch": 2,
  "step": 27,
  "rank": 0,
  "loss": 0.18951308727264404,
  "overall_throughput": 30.84563042714866,
  "lr": 1.9978136272187745e-05,
  "cuda_mem_allocated": 22.00071620941162,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 659,
  "batch_size": 16,
  "total_loss": 0.17304962873458862,
  "gradnorm": 2.572604179382324,
  "weight_norm": 393.4596252441406,
  "timestamp": "2024-07-27T20:04:54.678862"
}
Per-token loss scaled by world size: 0.0017004203982651234
Per-token loss scaled by world size: 0.003374457824975252
Per-token loss scaled by world size: 0.00338700320571661
Per-token loss scaled by world size: 0.0010560491355136037
Per-token loss scaled by world size: 0.0003863103629555553
Per-token loss scaled by world size: 0.0015272889286279678
Per-token loss scaled by world size: 0.00571776507422328
Epoch: 2, Step: 28, Rank: 5, loss = 0.32557567954063416
Epoch: 2, Step: 28, Rank: 6, loss = 0.10151272267103195
Epoch: 2, Step: 28, Rank: 0, loss = 0.1634529083967209
Epoch: 2, Step: 28, Rank: 4, loss = 0.32436975836753845
Epoch: 2, Step: 28, Rank: 2, loss = 0.1468106508255005
Epoch: 2, Step: 28, Rank: 3, loss = 0.5496201515197754
Epoch: 2, Step: 28, Rank: 1, loss = 0.0371340848505497
Per-token loss scaled by world size: 0.00041141020483337343
Epoch: 2, Step: 28, Rank: 7, loss = 0.03954680636525154
[2024-07-27 20:04:55,096] [INFO] [logging.py:96:log_dist] [Rank 0] step=28, skipped=0, lr=[1.9950829025450116e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:55,174] [INFO] [timer.py:258:stop] epoch=0/micro_step=28/global_step=28, RunningAvgSamplesPerSec=31.512036329377096, CurrSamplesPerSec=32.38008718791239, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 33%|███▎ | 4/12 [00:02<00:04, 1.67it/s]
{
  "epoch": 2,
  "step": 28,
  "rank": 0,
  "loss": 0.1634529083967209,
  "overall_throughput": 32.299561727331444,
  "lr": 1.9950829025450116e-05,
  "cuda_mem_allocated": 21.99880838394165,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 769,
  "batch_size": 16,
  "total_loss": 0.21100284159183502,
  "gradnorm": 2.3949108123779297,
  "weight_norm": 393.4600830078125,
  "timestamp": "2024-07-27T20:04:55.217257"
}
Per-token loss scaled by world size: 0.0007814434356987476
Per-token loss scaled by world size: 0.00027669736300595105
Per-token loss scaled by world size: 0.0012405101442709565
Per-token loss scaled by world size: 0.0030604854691773653
Per-token loss scaled by world size: 0.0021558511070907116
Per-token loss scaled by world size: 0.0016599269583821297
Per-token loss scaled by world size: 0.0017815420869737864
Epoch: 2, Step: 29, Rank: 3, loss = 0.023380927741527557
Epoch: 2, Step: 29, Rank: 6, loss = 0.2586110234260559
Epoch: 2, Step: 29, Rank: 1, loss = 0.10482310503721237
Epoch: 2, Step: 29, Rank: 2, loss = 0.18216942250728607
Epoch: 2, Step: 29, Rank: 4, loss = 0.06603197008371353
Epoch: 2, Step: 29, Rank: 0, loss = 0.14026382565498352
Epoch: 2, Step: 29, Rank: 5, loss = 0.1505403071641922
Per-token loss scaled by world size: 0.004114873707294464
Epoch: 2, Step: 29, Rank: 7, loss = 0.3477068245410919
[2024-07-27 20:04:55,637] [INFO] [logging.py:96:log_dist] [Rank 0] step=29, skipped=0, lr=[1.9912640693269754e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:55,714] [INFO] [timer.py:258:stop] epoch=0/micro_step=29/global_step=29, RunningAvgSamplesPerSec=31.539610429600238, CurrSamplesPerSec=32.27386940956719, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 2: 42%|████▏ | 5/12 [00:03<00:04, 1.73it/s]
{
  "epoch": 2,
  "step": 29,
  "rank": 0,
  "loss": 0.14026382565498352,
  "overall_throughput": 32.193407411564976,
  "lr": 1.9912640693269754e-05,
  "cuda_mem_allocated": 22.007156372070312,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 676,
  "batch_size": 16,
  "total_loss": 0.15919092297554016,
  "gradnorm": 2.4766182899475098,
  "weight_norm": 393.4605407714844,
  "timestamp": "2024-07-27T20:04:55.749943"
}
Per-token loss scaled by world size: 0.0029893971513956785
Per-token loss scaled by world size: 0.005299379117786884
Per-token loss scaled by world size: 0.0023671372327953577
Per-token loss scaled by world size: 0.004149050917476416
Per-token loss scaled by world size: 0.008750627748668194
Per-token loss scaled by world size: 0.006499007809907198
Epoch: 2, Step: 30, Rank: 0, loss = 0.1615571230649948
Epoch: 2, Step: 30, Rank: 3, loss = 0.36168262362480164
Epoch: 2, Step: 30, Rank: 6, loss = 0.20402635633945465
Epoch: 2, Step: 30, Rank: 5, loss = 0.5972303748130798
Per-token loss scaled by world size: 0.0007520572980865836
Epoch: 2, Step: 30, Rank: 4, loss = 0.28317272663116455
Epoch: 2, Step: 30, Rank: 7, loss = 0.4435572922229767
Epoch: 2, Step: 30, Rank: 1, loss = 0.0513279102742672
Per-token loss scaled by world size: 0.0032296415884047747
Epoch: 2, Step: 30, Rank: 2, loss = 0.22042304277420044
[2024-07-27 20:04:56,178] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=0, lr=[1.9863613034027224e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:56,256] [INFO] [timer.py:258:stop] epoch=0/micro_step=30/global_step=30, RunningAvgSamplesPerSec=31.5444892468786, CurrSamplesPerSec=31.676790257487433, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 50%|█████ | 6/12 [00:03<00:03, 1.77it/s]
{
  "epoch": 2,
  "step": 30,
  "rank": 0,
  "loss": 0.1615571230649948,
  "overall_throughput": 31.593056094983204,
  "lr": 1.9863613034027224e-05,
  "cuda_mem_allocated": 21.999285221099854,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 546,
  "batch_size": 16,
  "total_loss": 0.29037219285964966,
  "gradnorm": 6.375105857849121,
  "weight_norm": 393.4609375,
  "timestamp": "2024-07-27T20:04:56.301457"
}
Per-token loss scaled by world size: 0.002896310528740287
Per-token loss scaled by world size: 0.0031822924502193928
Per-token loss scaled by world size: 0.0018208534456789494
Per-token loss scaled by world size: 0.0022670035250484943
Per-token loss scaled by world size: 0.008491733111441135
Per-token loss scaled by world size: 0.003121417947113514
Epoch: 2, Step: 31, Rank: 6, loss = 0.2474232316017151
Epoch: 2, Step: 31, Rank: 2, loss = 0.14157135784626007
Per-token loss scaled by world size: 0.0018668599659577012
Epoch: 2, Step: 31, Rank: 1, loss = 0.22518813610076904
Epoch: 2, Step: 31, Rank: 0, loss = 0.17625951766967773
Epoch: 2, Step: 31, Rank: 7, loss = 0.6602322459220886
Epoch: 2, Step: 31, Rank: 4, loss = 0.24269025027751923
Epoch: 2, Step: 31, Rank: 5, loss = 0.145148366689682
Per-token loss scaled by world size: 0.0017641705926507711
Epoch: 2, Step: 31, Rank: 3, loss = 0.13716426491737366
[2024-07-27 20:04:56,723] [INFO] [logging.py:96:log_dist] [Rank 0] step=31, skipped=0, lr=[1.9803799658748096e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:56,801] [INFO] [timer.py:258:stop] epoch=0/micro_step=31/global_step=31, RunningAvgSamplesPerSec=31.57118154258231, CurrSamplesPerSec=32.3373511160454, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 58%|█████▊ | 7/12 [00:04<00:02, 1.79it/s]
{
  "epoch": 2,
  "step": 31,
  "rank": 0,
  "loss": 0.17625951766967773,
  "overall_throughput": 32.279737447919125,
  "lr": 1.9803799658748096e-05,
  "cuda_mem_allocated": 22.0040545463562,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 622,
  "batch_size": 16,
  "total_loss": 0.24695967137813568,
  "gradnorm": 4.441169261932373,
  "weight_norm": 393.4613952636719,
  "timestamp": "2024-07-27T20:04:56.844460"
}
Per-token loss scaled by world size: 0.0011196950217708945
Per-token loss scaled by world size: 0.000792959937825799
Per-token loss scaled by world size: 0.0029141369741410017
Per-token loss scaled by world size: 0.004119256976991892
Per-token loss scaled by world size: 0.0033213391434401274
Per-token loss scaled by world size: 0.004044802393764257
Per-token loss scaled by world size: 0.0030391488689929247
Epoch: 2, Step: 32, Rank: 5, loss = 0.2269384115934372
Epoch: 2, Step: 32, Rank: 2, loss = 0.06175175681710243
Epoch: 2, Step: 32, Rank: 0, loss = 0.08719625324010849
Epoch: 2, Step: 32, Rank: 3, loss = 0.2586492896080017
Epoch: 2, Step: 32, Rank: 6, loss = 0.32078713178634644
Epoch: 2, Step: 32, Rank: 4, loss = 0.23667371273040771
Epoch: 2, Step: 32, Rank: 7, loss = 0.31498900055885315
Per-token loss scaled by world size: 0.0011721396585926414
Epoch: 2, Step: 32, Rank: 1, loss = 0.09128037840127945
[2024-07-27 20:04:57,268] [INFO] [logging.py:96:log_dist] [Rank 0] step=32, skipped=0, lr=[1.973326597248006e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:57,345] [INFO] [timer.py:258:stop] epoch=0/micro_step=32/global_step=32, RunningAvgSamplesPerSec=31.5894998258649, CurrSamplesPerSec=32.130135235160566, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 67%|██████▋ | 8/12 [00:04<00:02, 1.80it/s]
{
  "epoch": 2,
  "step": 32,
  "rank": 0,
  "loss": 0.08719625324010849,
  "overall_throughput": 32.07842353700362,
  "lr": 1.973326597248006e-05,
  "cuda_mem_allocated": 21.997137546539307,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 623,
  "batch_size": 16,
  "total_loss": 0.19978323578834534,
  "gradnorm": 2.9166290760040283,
  "weight_norm": 393.46185302734375,
  "timestamp": "2024-07-27T20:04:57.393136"
}
Per-token loss scaled by world size: 0.0031772709917277098
Per-token loss scaled by world size: 0.002447927137836814
Per-token loss scaled by world size: 0.0009199947817251086
Per-token loss scaled by world size: 0.004487441387027502
Per-token loss scaled by world size: 0.0024654706940054893
Per-token loss scaled by world size: 0.00025754657690413296
Per-token loss scaled by world size: 0.004304614849388599
Epoch: 2, Step: 33, Rank: 4, loss = 0.06888461112976074
Epoch: 2, Step: 33, Rank: 0, loss = 0.23789817094802856
Epoch: 2, Step: 33, Rank: 5, loss = 0.33599716424942017
Epoch: 2, Step: 33, Rank: 1, loss = 0.1832885444164276
Epoch: 2, Step: 33, Rank: 2, loss = 0.18460211157798767
Epoch: 2, Step: 33, Rank: 3, loss = 0.019283799454569817
Epoch: 2, Step: 33, Rank: 6, loss = 0.3223080337047577
Per-token loss scaled by world size: 0.0012406171299517155
Epoch: 2, Step: 33, Rank: 7, loss = 0.09289120882749557
[2024-07-27 20:04:57,818] [INFO] [logging.py:96:log_dist] [Rank 0] step=33, skipped=0, lr=[1.9652089102773487e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:04:57,896] [INFO] [timer.py:258:stop] epoch=0/micro_step=33/global_step=33, RunningAvgSamplesPerSec=31.59934389707566, CurrSamplesPerSec=31.897545876966834, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 528
{
  "epoch": 2,
  "step": 33,
  "rank": 0,
  "loss": 0.23789817094802856,
  "overall_throughput": 31.819460032023883,
  "lr": 1.9652089102773487e-05,
  "cuda_mem_allocated": 22.002385139465332,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 599,
  "batch_size": 16,
  "total_loss": 0.1806442141532898,
  "gradnorm": 2.8334124088287354,
  "weight_norm": 393.4622802734375,
  "timestamp": "2024-07-27T20:04:57.899380"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_528
[20:05:15] INFO saving took 17.999591588974 seconds utils.py:611
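The checkpoint name follows from simple arithmetic. Every step in this excerpt logs `"batch_size": 16`, matching `--effective-batch-size 16`. If every earlier step also consumed a full batch, which is an assumption for steps not shown here, then after global step 33 the trainer has seen 33 × 16 = 528 samples. That matches `samples_528`. The helper names below are hypothetical, and the directory layout simply mirrors the path printed above.

```python
import os

EFFECTIVE_BATCH_SIZE = 16  # --effective-batch-size 16; each step logs "batch_size": 16
CKPT_OUTPUT_DIR = "/var/instructlabbigdisk/instructlab/skillscheckpoints/"  # --ckpt-output-dir


def samples_seen(global_step, batch_size=EFFECTIVE_BATCH_SIZE):
    # Assumes every optimizer step so far consumed a full effective batch.
    return global_step * batch_size


def checkpoint_dir(global_step):
    # Hypothetical helper mirroring the "Model saved in ..." path in the log.
    return os.path.join(CKPT_OUTPUT_DIR, "hf_format", f"samples_{samples_seen(global_step)}")


print(samples_seen(33))    # 528
print(checkpoint_dir(33))
```

The save is also visible in the progress bar: it took about 18 s, and the next update shows 6.18 s/it instead of roughly 0.55 s per step.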
Epoch 2: 75%|███████▌ | 9/12 [00:23<00:18, 6.18s/it]
Per-token loss scaled by world size: 0.0037983357906341553
Per-token loss scaled by world size: 0.004671666771173477
Per-token loss scaled by world size: 0.001915755565278232
Per-token loss scaled by world size: 0.001806297223083675
Per-token loss scaled by world size: 0.00687358109280467
Per-token loss scaled by world size: 0.0015339453238993883
Per-token loss scaled by world size: 0.0011989163467660546
Epoch: 2, Step: 34, Rank: 0, loss = 0.33425354957580566
Epoch: 2, Step: 34, Rank: 4, loss = 0.411106675863266
Epoch: 2, Step: 34, Rank: 1, loss = 0.16858649253845215
Epoch: 2, Step: 34, Rank: 6, loss = 0.6048751473426819
Epoch: 2, Step: 34, Rank: 2, loss = 0.15895415842533112
Epoch: 2, Step: 34, Rank: 5, loss = 0.10550463944673538
Epoch: 2, Step: 34, Rank: 3, loss = 0.13498719036579132
Per-token loss scaled by world size: 0.0012863841839134693
Epoch: 2, Step: 34, Rank: 7, loss = 0.113201804459095
[2024-07-27 20:05:16,383] [INFO] [logging.py:96:log_dist] [Rank 0] step=34, skipped=0, lr=[1.9560357815343577e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:16,461] [INFO] [timer.py:258:stop] epoch=0/micro_step=34/global_step=34, RunningAvgSamplesPerSec=31.575999746037336, CurrSamplesPerSec=30.869055674257183, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 83%|████████▎ | 10/12 [00:23<00:08, 4.45s/it]
{
  "epoch": 2,
  "step": 34,
  "rank": 0,
  "loss": 0.33425354957580566,
  "overall_throughput": 30.809901389200697,
  "lr": 1.9560357815343577e-05,
  "cuda_mem_allocated": 21.999046802520752,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 704,
  "batch_size": 16,
  "total_loss": 0.25393369793891907,
  "gradnorm": 3.9217264652252197,
  "weight_norm": 393.4627685546875,
  "timestamp": "2024-07-27T20:05:16.505094"
}
Per-token loss scaled by world size: 0.0034846733324229717
Per-token loss scaled by world size: 0.0033776769414544106
Per-token loss scaled by world size: 0.0047375233843922615
Per-token loss scaled by world size: 0.002200631657615304
Per-token loss scaled by world size: 0.0065323468297719955
Per-token loss scaled by world size: 0.000672308262437582
Per-token loss scaled by world size: 0.0031945251394063234
Epoch: 2, Step: 35, Rank: 3, loss = 0.22757098078727722
Epoch: 2, Step: 35, Rank: 1, loss = 0.14826755225658417
Epoch: 2, Step: 35, Rank: 4, loss = 0.31919065117836
Epoch: 2, Step: 35, Rank: 5, loss = 0.44011685252189636
Epoch: 2, Step: 35, Rank: 6, loss = 0.21523113548755646
Epoch: 2, Step: 35, Rank: 2, loss = 0.045296769589185715
Epoch: 2, Step: 35, Rank: 0, loss = 0.23477986454963684
Per-token loss scaled by world size: 0.0005250901449471712
Epoch: 2, Step: 35, Rank: 7, loss = 0.035377949476242065
[2024-07-27 20:05:16,936] [INFO] [logging.py:96:log_dist] [Rank 0] step=35, skipped=0, lr=[1.9458172417006347e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:17,013] [INFO] [timer.py:258:stop] epoch=0/micro_step=35/global_step=35, RunningAvgSamplesPerSec=31.574778541658905, CurrSamplesPerSec=31.53574981496928, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 92%|█████████▏| 11/12 [00:24<00:03, 3.25s/it]
{
  "epoch": 2,
  "step": 35,
  "rank": 0,
  "loss": 0.23477986454963684,
  "overall_throughput": 31.455751447078118,
  "lr": 1.9458172417006347e-05,
  "cuda_mem_allocated": 22.0038161277771,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 539,
  "batch_size": 16,
  "total_loss": 0.2082289606332779,
  "gradnorm": 3.2071847915649414,
  "weight_norm": 393.4632263183594,
  "timestamp": "2024-07-27T20:05:17.057249"
}
Per-token loss scaled by world size: 0.005230115260928869
Per-token loss scaled by world size: 0.0020361002534627914
Per-token loss scaled by world size: 0.0016461815685033798
Per-token loss scaled by world size: 0.0032435867469757795
Per-token loss scaled by world size: 0.0013154539046809077
Per-token loss scaled by world size: 0.007001029327511787
Per-token loss scaled by world size: 0.0026217142585664988
Epoch: 2, Step: 36, Rank: 3, loss = 0.2177257537841797
Epoch: 2, Step: 36, Rank: 5, loss = 0.1366732269525528
Epoch: 2, Step: 36, Rank: 0, loss = 0.08829984068870544
Epoch: 2, Step: 36, Rank: 6, loss = 0.35107147693634033
Epoch: 2, Step: 36, Rank: 2, loss = 0.11049994081258774
Epoch: 2, Step: 36, Rank: 1, loss = 0.4699440896511078
Epoch: 2, Step: 36, Rank: 7, loss = 0.17598256468772888
Per-token loss scaled by world size: 0.00114376877900213
Epoch: 2, Step: 36, Rank: 4, loss = 0.07677547633647919
[2024-07-27 20:05:17,490] [INFO] [logging.py:96:log_dist] [Rank 0] step=36, skipped=0, lr=[1.934564464599461e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:17,568] [INFO] [timer.py:258:stop] epoch=0/micro_step=36/global_step=36, RunningAvgSamplesPerSec=31.570395158941494, CurrSamplesPerSec=31.426423180739413, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 2: 100%|██████████| 12/12 [00:24<00:00, 2.43s/it]
{
  "epoch": 2,
  "step": 36,
  "rank": 0,
  "loss": 0.08829984068870544,
  "overall_throughput": 31.342732108746315,
  "lr": 1.934564464599461e-05,
  "cuda_mem_allocated": 21.999046802520752,
  "cuda_malloc_retries": 0,
  "num_loss_counted_tokens": 537,
  "batch_size": 16,
  "total_loss": 0.2033715397119522,
  "gradnorm": 3.2520809173583984,
  "weight_norm": 393.4635925292969,
  "timestamp": "2024-07-27T20:05:17.613214"
}
Epoch 2: 100%|██████████| 12/12 [00:25<00:00, 2.09s/it]
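The `lr` values logged for steps 25 to 36 follow a cosine decay that starts at 2e-5 at step 25 and would reach zero at step 120. Step 120 is 10 epochs (`--num-epochs 10`) × 12 steps per epoch. The 25-step warmup and the 120-step total are inferred from these numbers, not read from the configuration. The linear-warmup branch below is a further assumption, because no warmup steps appear in this excerpt. The function is a sketch that reproduces the logged values, not the scheduler the trainer actually instantiates.

```python
import math

BASE_LR = 2e-5          # lr logged at step 25
WARMUP_STEPS = 25       # assumption: inferred from the logged lr values
TOTAL_STEPS = 10 * 12   # assumption: --num-epochs 10 x 12 steps per epoch


def lr_at(step):
    """Cosine decay after a warmup phase, evaluated at a logged global step."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup; not observable in this excerpt.
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (25, 26, 30, 36):
    print(step, lr_at(step))
```

At this rate the learning rate has decayed by only about 3% by the end of epoch 2: step 36 gives about 1.9346e-05.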
total tokens: 118 num samples: 2 num padding tokens: 8 - rank: 2 max len: 59 min len: 51 avg len: 55.0 num_loss_counted_tokens: 60
total tokens: 132 num samples: 2 num padding tokens: 4 - rank: 2 max len: 66 min len: 62 avg len: 64.0 num_loss_counted_tokens: 64
total tokens: 166 num samples: 2 num padding tokens: 19 - rank: 2 max len: 83 min len: 64 avg len: 73.5 num_loss_counted_tokens: 70
total tokens: 180 num samples: 2 num padding tokens: 33 - rank: 2 max len: 90 min len: 57 avg len: 73.5 num_loss_counted_tokens: 99
total tokens: 214 num samples: 2 num padding tokens: 40 - rank: 2 max len: 107 min len: 67 avg len: 87.0 num_loss_counted_tokens: 103
total tokens: 126 num samples: 2 num padding tokens: 17 - rank: 2 max len: 63 min len: 46 avg len: 54.5 num_loss_counted_tokens: 54
total tokens: 124 num samples: 2 num padding tokens: 4 - rank: 2 max len: 62 min len: 58 avg len: 60.0 num_loss_counted_tokens: 72
total tokens: 136 num samples: 2 num padding tokens: 7 - rank: 2 max len: 68 min len: 61 avg len: 64.5 num_loss_counted_tokens: 59
total tokens: 142 num samples: 2 num padding tokens: 12 - rank: 2 max len: 71 min len: 59 avg len: 65.0 num_loss_counted_tokens: 72
total tokens: 154 num samples: 2 num padding tokens: 22 - rank: 2 max len: 77 min len: 55 avg len: 66.0 num_loss_counted_tokens: 75
total tokens: 128 num samples: 2 num padding tokens: 21 - rank: 2 max len: 64 min len: 43 avg len: 53.5 num_loss_counted_tokens: 49
total tokens: 126 num samples: 2 num padding tokens: 11 - rank: 2 max len: 63 min len: 52 avg len: 57.5 num_loss_counted_tokens: 55
total tokens: 152 num samples: 2 num padding tokens: 17 - rank: 4 max len: 76 min len: 59 avg len: 67.5 num_loss_counted_tokens: 71
total tokens: 152 num samples: 2 num padding tokens: 7 - rank: 4 max len: 76 min len: 69 avg len: 72.5 num_loss_counted_tokens: 84
total tokens: 146 num samples: 2 num padding tokens: 21 - rank: 7 max len: 73 min len: 52 avg len: 62.5 num_loss_counted_tokens: 71
total tokens: 184 num samples: 2 num padding tokens: 32 - rank: 5 max len: 92 min len: 60 avg len: 76.0 num_loss_counted_tokens: 89
total tokens: 168 num samples: 2 num padding tokens: 33 - rank: 7 max len: 84 min len: 51 avg len: 67.5 num_loss_counted_tokens: 85
total tokens: 142 num samples: 2 num padding tokens: 13 - rank: 4 max len: 71 min len: 58 avg len: 64.5 num_loss_counted_tokens: 58
total tokens: 166 num samples: 2 num padding tokens: 19 - rank: 7 max len: 83 min len: 64 avg len: 73.5 num_loss_counted_tokens: 83
total tokens: 128 num samples: 2 num padding tokens: 9 - rank: 5 max len: 64 min len: 55 avg len: 59.5 num_loss_counted_tokens: 73
total tokens: 138 num samples: 2 num padding tokens: 12 - rank: 5 max len: 69 min len: 57 avg len: 63.0 num_loss_counted_tokens: 53
total tokens: 136 num samples: 2 num padding tokens: 6 - rank: 4 max len: 68 min len: 62 avg len: 65.0 num_loss_counted_tokens: 54
total tokens: 146 num samples: 2 num padding tokens: 10 - rank: 5 max len: 73 min len: 63 avg len: 68.0 num_loss_counted_tokens: 79
total tokens: 152 num samples: 2 num padding tokens: 9 - rank: 5 max len: 76 min len: 67 avg len: 71.5 num_loss_counted_tokens: 81
total tokens: 138 num samples: 2 num padding tokens: 14 - rank: 5 max len: 69 min len: 55 avg len: 62.0 num_loss_counted_tokens: 78
total tokens: 96 num samples: 2 num padding tokens: 0 - rank: 5 max len: 48 min len: 48 avg len: 48.0 num_loss_counted_tokens: 46
total tokens: 174 num samples: 2 num padding tokens: 27 - rank: 4 max len: 87 min len: 60 avg len: 73.5 num_loss_counted_tokens: 85
total tokens: 136 num samples: 2 num padding tokens: 4 - rank: 7 max len: 68 min len: 64 avg len: 66.0 num_loss_counted_tokens: 65
total tokens: 148 num samples: 2 num padding tokens: 2 - rank: 7 max len: 74 min len: 72 avg len: 73.0 num_loss_counted_tokens: 75
total tokens: 186 num samples: 2 num padding tokens: 43 - rank: 7 max len: 93 min len: 50 avg len: 71.5 num_loss_counted_tokens: 120
total tokens: 188 num samples: 2 num padding tokens: 23 - rank: 7 max len: 94 min len: 71 avg len: 82.5 num_loss_counted_tokens: 97
total tokens: 152 num samples: 2 num padding tokens: 15 - rank: 5 max len: 76 min len: 61 avg len: 68.5 num_loss_counted_tokens: 71
total tokens: 140 num samples: 2 num padding tokens: 24 - rank: 3 max len: 70 min len: 46 avg len: 58.0 num_loss_counted_tokens: 67
total tokens: 166 num samples: 2 num padding tokens: 34 - rank: 7 max len: 83 min len: 49 avg len: 66.0 num_loss_counted_tokens: 74
total tokens: 140 num samples: 2 num padding tokens: 4 - rank: 7 max len: 70 min len: 66 avg len: 68.0 num_loss_counted_tokens: 52
total tokens: 142 num samples: 2 num padding tokens: 22 - rank: 7 max len: 71 min len: 49 avg len: 60.0 num_loss_counted_tokens: 70
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 3 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 69
total tokens: 202 num samples: 2 num padding tokens: 46 - rank: 7 max len: 101 min len: 55 avg len: 78.0 num_loss_counted_tokens: 96
total tokens: 180 num samples: 2 num padding tokens: 45 - rank: 3 max len: 90 min len: 45 avg len: 67.5 num_loss_counted_tokens: 86
total tokens: 172 num samples: 2 num padding tokens: 23 - rank: 3 max len: 86 min len: 63 avg len: 74.5 num_loss_counted_tokens: 81
total tokens: 186 num samples: 2 num padding tokens: 13 - rank: 3 max len: 93 min len: 80 avg len: 86.5 num_loss_counted_tokens: 122
total tokens: 140 num samples: 2 num padding tokens: 15 - rank: 6 max len: 70 min len: 55 avg len: 62.5 num_loss_counted_tokens: 59
total tokens: 132 num samples: 2 num padding tokens: 21 - rank: 6 max len: 66 min len: 45 avg len: 55.5 num_loss_counted_tokens: 56
total tokens: 134 num samples: 2 num padding tokens: 23 - rank: 6 max len: 67 min len: 44 avg len: 55.5 num_loss_counted_tokens: 47
total tokens: 118 num samples: 2 num padding tokens: 1 - rank: 3 max len: 59 min len: 58 avg len: 58.5 num_loss_counted_tokens: 59
total tokens: 128 num samples: 2 num padding tokens: 11 - rank: 6 max len: 64 min len: 53 avg len: 58.5 num_loss_counted_tokens: 59
total tokens: 186 num samples: 2 num padding tokens: 41 - rank: 6 max len: 93 min len: 52 avg len: 72.5 num_loss_counted_tokens: 81
total tokens: 106 num samples: 2 num padding tokens: 4 - rank: 4 max len: 53 min len: 49 avg len: 51.0 num_loss_counted_tokens: 57
total tokens: 124 num samples: 2 num padding tokens: 2 - rank: 3 max len: 62 min len: 60 avg len: 61.0 num_loss_counted_tokens: 67
total tokens: 140 num samples: 2 num padding tokens: 10 - rank: 4 max len: 70 min len: 60 avg len: 65.0 num_loss_counted_tokens: 62
total tokens: 140 num samples: 2 num padding tokens: 12 - rank: 4 max len: 70 min len: 58 avg len: 64.0 num_loss_counted_tokens: 73
total tokens: 282 num samples: 2 num padding tokens: 86 - rank: 4 max len: 141 min len: 55 avg len: 98.0 num_loss_counted_tokens: 139 | |
total tokens: 174 num samples: 2 num padding tokens: 25 - rank: 5 max len: 87 min len: 62 avg len: 74.5 num_loss_counted_tokens: 74 | |
total tokens: 132 num samples: 2 num padding tokens: 8 - rank: 4 max len: 66 min len: 58 avg len: 62.0 num_loss_counted_tokens: 70 | |
total tokens: 208 num samples: 2 num padding tokens: 44 - rank: 4 max len: 104 min len: 60 avg len: 82.0 num_loss_counted_tokens: 109 | |
total tokens: 172 num samples: 2 num padding tokens: 29 - rank: 6 max len: 86 min len: 57 avg len: 71.5 num_loss_counted_tokens: 54 | |
total tokens: 162 num samples: 2 num padding tokens: 21 - rank: 6 max len: 81 min len: 60 avg len: 70.5 num_loss_counted_tokens: 98 | |
total tokens: 188 num samples: 2 num padding tokens: 39 - rank: 6 max len: 94 min len: 55 avg len: 74.5 num_loss_counted_tokens: 86 | |
total tokens: 148 num samples: 2 num padding tokens: 23 - rank: 6 max len: 74 min len: 51 avg len: 62.5 num_loss_counted_tokens: 62 | |
total tokens: 172 num samples: 2 num padding tokens: 21 - rank: 6 max len: 86 min len: 65 avg len: 75.5 num_loss_counted_tokens: 71 | |
total tokens: 132 num samples: 2 num padding tokens: 8 - rank: 5 max len: 66 min len: 58 avg len: 62.0 num_loss_counted_tokens: 65 | |
total tokens: 110 num samples: 2 num padding tokens: 9 - rank: 5 max len: 55 min len: 46 avg len: 50.5 num_loss_counted_tokens: 53 | |
total tokens: 164 num samples: 2 num padding tokens: 24 - rank: 3 max len: 82 min len: 58 avg len: 70.0 num_loss_counted_tokens: 96 | |
total tokens: 118 num samples: 2 num padding tokens: 15 - rank: 3 max len: 59 min len: 44 avg len: 51.5 num_loss_counted_tokens: 51 | |
total tokens: 132 num samples: 2 num padding tokens: 2 - rank: 5 max len: 66 min len: 64 avg len: 65.0 num_loss_counted_tokens: 65 | |
total tokens: 130 num samples: 2 num padding tokens: 14 - rank: 3 max len: 65 min len: 51 avg len: 58.0 num_loss_counted_tokens: 61 | |
total tokens: 122 num samples: 2 num padding tokens: 2 - rank: 1 max len: 61 min len: 59 avg len: 60.0 num_loss_counted_tokens: 60 | |
total tokens: 228 num samples: 2 num padding tokens: 34 - rank: 0 max len: 114 min len: 80 avg len: 97.0 num_loss_counted_tokens: 135 | |
total tokens: 164 num samples: 2 num padding tokens: 1 - rank: 3 max len: 82 min len: 81 avg len: 81.5 num_loss_counted_tokens: 105 | |
total tokens: 216 num samples: 2 num padding tokens: 47 - rank: 4 max len: 108 min len: 61 avg len: 84.5 num_loss_counted_tokens: 109 | |
total tokens: 194 num samples: 2 num padding tokens: 19 - rank: 6 max len: 97 min len: 78 avg len: 87.5 num_loss_counted_tokens: 114 | |
total tokens: 122 num samples: 2 num padding tokens: 6 - rank: 0 max len: 61 min len: 55 avg len: 58.0 num_loss_counted_tokens: 62 | |
total tokens: 140 num samples: 2 num padding tokens: 10 - rank: 1 max len: 70 min len: 60 avg len: 65.0 num_loss_counted_tokens: 62 | |
total tokens: 180 num samples: 2 num padding tokens: 3 - rank: 1 max len: 90 min len: 87 avg len: 88.5 num_loss_counted_tokens: 136 | |
total tokens: 244 num samples: 2 num padding tokens: 70 - rank: 1 max len: 122 min len: 52 avg len: 87.0 num_loss_counted_tokens: 125 | |
total tokens: 124 num samples: 2 num padding tokens: 12 - rank: 1 max len: 62 min len: 50 avg len: 56.0 num_loss_counted_tokens: 56 | |
total tokens: 106 num samples: 2 num padding tokens: 9 - rank: 1 max len: 53 min len: 44 avg len: 48.5 num_loss_counted_tokens: 47 | |
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 1 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 54 | |
total tokens: 158 num samples: 2 num padding tokens: 25 - rank: 1 max len: 79 min len: 54 avg len: 66.5 num_loss_counted_tokens: 74 | |
total tokens: 146 num samples: 2 num padding tokens: 1 - rank: 0 max len: 73 min len: 72 avg len: 72.5 num_loss_counted_tokens: 101 | |
total tokens: 140 num samples: 2 num padding tokens: 3 - rank: 7 max len: 70 min len: 67 avg len: 68.5 num_loss_counted_tokens: 81 | |
total tokens: 196 num samples: 2 num padding tokens: 53 - rank: 1 max len: 98 min len: 45 avg len: 71.5 num_loss_counted_tokens: 94 | |
total tokens: 228 num samples: 2 num padding tokens: 60 - rank: 1 max len: 114 min len: 54 avg len: 84.0 num_loss_counted_tokens: 122 | |
total tokens: 114 num samples: 2 num padding tokens: 5 - rank: 6 max len: 57 min len: 52 avg len: 54.5 num_loss_counted_tokens: 53 | |
total tokens: 226 num samples: 2 num padding tokens: 6 - rank: 0 max len: 113 min len: 107 avg len: 110.0 num_loss_counted_tokens: 142 | |
total tokens: 152 num samples: 2 num padding tokens: 5 - rank: 0 max len: 76 min len: 71 avg len: 73.5 num_loss_counted_tokens: 85 | |
total tokens: 158 num samples: 2 num padding tokens: 29 - rank: 1 max len: 79 min len: 50 avg len: 64.5 num_loss_counted_tokens: 69 | |
total tokens: 154 num samples: 2 num padding tokens: 14 - rank: 0 max len: 77 min len: 63 avg len: 70.0 num_loss_counted_tokens: 86 | |
total tokens: 150 num samples: 2 num padding tokens: 21 - rank: 0 max len: 75 min len: 54 avg len: 64.5 num_loss_counted_tokens: 75 | |
total tokens: 122 num samples: 2 num padding tokens: 1 - rank: 0 max len: 61 min len: 60 avg len: 60.5 num_loss_counted_tokens: 59 | |
total tokens: 200 num samples: 2 num padding tokens: 52 - rank: 0 max len: 100 min len: 48 avg len: 74.0 num_loss_counted_tokens: 95 | |
total tokens: 168 num samples: 2 num padding tokens: 32 - rank: 0 max len: 84 min len: 52 avg len: 68.0 num_loss_counted_tokens: 80 | |
total tokens: 162 num samples: 2 num padding tokens: 21 - rank: 0 max len: 81 min len: 60 avg len: 70.5 num_loss_counted_tokens: 84 | |
total tokens: 136 num samples: 2 num padding tokens: 5 - rank: 3 max len: 68 min len: 63 avg len: 65.5 num_loss_counted_tokens: 60 | |
total tokens: 176 num samples: 2 num padding tokens: 27 - rank: 1 max len: 88 min len: 61 avg len: 74.5 num_loss_counted_tokens: 83 | |
total tokens: 126 num samples: 2 num padding tokens: 13 - rank: 0 max len: 63 min len: 50 avg len: 56.5 num_loss_counted_tokens: 57 | |
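Each statistics line above describes one rank's micro-batch. The numbers are consistent with every sample being padded to the longest sample in its micro-batch: total tokens = num samples × max len, and num padding tokens = total tokens - (num samples × avg len). For example, the rank-4 line with max len 141 gives 2 × 141 = 282 total tokens and 282 - 196 = 86 padding tokens. That rule is inferred from the logged numbers, not taken from the trainer source. The following is a minimal sketch that re-checks one line under that assumption. `check_padding` is a hypothetical helper.

```python
import re

# Pattern for the per-micro-batch statistics lines printed in this log.
STATS = re.compile(
    r"total tokens: (\d+) num samples: (\d+) num padding tokens: (\d+) - "
    r"rank: (\d+) max len: (\d+) min len: (\d+) avg len: ([\d.]+)"
)

def check_padding(line: str) -> dict:
    """Recompute the padding figures of one stats line, assuming every
    sample is padded to the longest sample in the micro-batch."""
    total, n, pad, rank, max_len, min_len, avg = STATS.search(line).groups()
    total, n, pad, max_len, min_len = map(int, (total, n, pad, max_len, min_len))
    avg = float(avg)
    real_tokens = round(avg * n)   # sum of the unpadded sample lengths
    padded_total = n * max_len     # n samples, each padded to max_len
    return {
        "rank": int(rank),
        "padding": padded_total - real_tokens,
        "padding_fraction": pad / total,
        "consistent": (
            total == padded_total
            and pad == padded_total - real_tokens
            # with two samples the average is simply (max + min) / 2
            and (n != 2 or avg == (max_len + min_len) / 2)
        ),
    }

line = ("total tokens: 282 num samples: 2 num padding tokens: 86 - rank: 4 "
        "max len: 141 min len: 55 avg len: 98.0 num_loss_counted_tokens: 139")
print(check_padding(line))
```

For this line, roughly 30% of the micro-batch (86 of 282 tokens) is padding. Pairs of similar-length samples, such as the 48/48 line with 0 padding tokens, waste nothing.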
Per-token loss scaled by world size: 0.0011273464187979698
Per-token loss scaled by world size: 0.011045603081583977
Per-token loss scaled by world size: 0.0008964258013293147
Per-token loss scaled by world size: 0.005275155883282423
Per-token loss scaled by world size: 0.0004854030557908118
Per-token loss scaled by world size: 0.001362472539767623
Per-token loss scaled by world size: 0.0016097185434773564
Epoch: 3, Step: 37, Rank: 5, loss = 0.08314179629087448
Epoch: 3, Step: 37, Rank: 3, loss = 0.06611140072345734
Epoch: 3, Step: 37, Rank: 0, loss = 0.8146132230758667
Epoch: 3, Step: 37, Rank: 7, loss = 0.3890427350997925
Epoch: 3, Step: 37, Rank: 6, loss = 0.03579847514629364
Epoch: 3, Step: 37, Rank: 4, loss = 0.10048235207796097
Epoch: 3, Step: 37, Rank: 2, loss = 0.11871673911809921
Per-token loss scaled by world size: 0.0010947352275252342
Epoch: 3, Step: 37, Rank: 1, loss = 0.08073671907186508
[2024-07-27 20:05:18,523] [INFO] [logging.py:96:log_dist] [Rank 0] step=37, skipped=0, lr=[1.922289754977385e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:18,600] [INFO] [timer.py:258:stop] epoch=0/micro_step=37/global_step=37, RunningAvgSamplesPerSec=31.510911148754047, CurrSamplesPerSec=29.613797942311468, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 37, | |
"rank": 0, | |
"loss": 0.8146132230758667, | |
"overall_throughput": 29.510024167760136, | |
"lr": 1.922289754977385e-05, | |
"cuda_mem_allocated": 22.01049518585205, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 590, | |
"batch_size": 16, | |
"total_loss": 0.21108044683933258, | |
"gradnorm": 3.543294906616211, | |
"weight_norm": 393.4640197753906, | |
"timestamp": "2024-07-27T20:05:18.643289" | |
} | |
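Two relations can be read off the step-37 numbers. First, each rank's `loss` equals its "Per-token loss scaled by world size" value multiplied by num_loss_counted_tokens / world size. Here that factor is 590 / 8 = 73.75, so 0.011045603 × 73.75 ≈ 0.81461 for Rank 0. Second, `total_loss` equals the mean of the eight rank losses. Both relations are inferred from the logged values and are not taken from the training code. The sketch below only re-checks the arithmetic. `rank_loss` is a hypothetical name.

```python
import math

WORLD_SIZE = 8                 # --gpus 8
NUM_LOSS_COUNTED_TOKENS = 590  # "num_loss_counted_tokens" in the step-37 record

def rank_loss(per_token_loss_scaled: float,
              num_loss_counted_tokens: int = NUM_LOSS_COUNTED_TOKENS,
              world_size: int = WORLD_SIZE) -> float:
    """Inferred relation between the two per-rank values printed in the log."""
    return per_token_loss_scaled * num_loss_counted_tokens / world_size

# "Epoch: 3, Step: 37, Rank: r, loss = ..." values, ordered by rank 0..7.
step37_losses = [
    0.8146132230758667,   # rank 0
    0.08073671907186508,  # rank 1
    0.11871673911809921,  # rank 2
    0.06611140072345734,  # rank 3
    0.10048235207796097,  # rank 4
    0.08314179629087448,  # rank 5
    0.03579847514629364,  # rank 6
    0.3890427350997925,   # rank 7
]

# Mean over ranks; compare with "total_loss": 0.21108044683933258.
total_loss = sum(step37_losses) / len(step37_losses)

print(rank_loss(0.011045603081583977), step37_losses[0])
print(math.isclose(total_loss, 0.21108044683933258, rel_tol=1e-6))
```

The agreement holds only to about seven significant digits, which is what you would expect if the trainer computes in float32 while this sketch uses Python floats.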
Per-token loss scaled by world size: 0.0009516195859760046
Per-token loss scaled by world size: 0.005429701413959265
Per-token loss scaled by world size: 0.000703756813891232
Per-token loss scaled by world size: 0.0006457434501498938
Per-token loss scaled by world size: 0.0020593185909092426
Per-token loss scaled by world size: 0.0010209670290350914
Per-token loss scaled by world size: 0.0010425481013953686
Epoch: 3, Step: 38, Rank: 1, loss = 0.4642394781112671
Epoch: 3, Step: 38, Rank: 6, loss = 0.060171205550432205
Epoch: 3, Step: 38, Rank: 5, loss = 0.05521106347441673
Epoch: 3, Step: 38, Rank: 7, loss = 0.17607174813747406
Epoch: 3, Step: 38, Rank: 4, loss = 0.08729267865419388
Epoch: 3, Step: 38, Rank: 2, loss = 0.08913786709308624
Epoch: 3, Step: 38, Rank: 3, loss = 0.08136347681283951
Per-token loss scaled by world size: 0.0009524038759991527
Epoch: 3, Step: 38, Rank: 0, loss = 0.08143053203821182
[2024-07-27 20:05:19,063] [INFO] [logging.py:96:log_dist] [Rank 0] step=38, skipped=0, lr=[1.909006535049163e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:19,141] [INFO] [timer.py:258:stop] epoch=0/micro_step=38/global_step=38, RunningAvgSamplesPerSec=31.53368842638859, CurrSamplesPerSec=32.35217658966323, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 38, | |
"rank": 0, | |
"loss": 0.08143053203821182, | |
"overall_throughput": 32.29991928493285, | |
"lr": 1.909006535049163e-05, | |
"cuda_mem_allocated": 21.99785280227661, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 684, | |
"batch_size": 16, | |
"total_loss": 0.13686475157737732, | |
"gradnorm": 7.216549396514893, | |
"weight_norm": 393.4645080566406, | |
"timestamp": "2024-07-27T20:05:19.186626" | |
} | |
Per-token loss scaled by world size: 0.0020264529157429934
Per-token loss scaled by world size: 0.0008862476679496467
Per-token loss scaled by world size: 0.0010139633668586612
Per-token loss scaled by world size: 0.00098150665871799
Per-token loss scaled by world size: 0.0027166178915649652
Per-token loss scaled by world size: 0.0003891867527272552
Per-token loss scaled by world size: 0.0019367473432794213
Epoch: 3, Step: 39, Rank: 0, loss = 0.14565131068229675
Epoch: 3, Step: 39, Rank: 2, loss = 0.07287861406803131
Epoch: 3, Step: 39, Rank: 4, loss = 0.06369905173778534
Epoch: 3, Step: 39, Rank: 5, loss = 0.19525690376758575
Epoch: 3, Step: 39, Rank: 3, loss = 0.13920371234416962
Epoch: 3, Step: 39, Rank: 7, loss = 0.07054579257965088
Epoch: 3, Step: 39, Rank: 1, loss = 0.02797279693186283
Per-token loss scaled by world size: 0.0002103938313666731
Epoch: 3, Step: 39, Rank: 6, loss = 0.015122056938707829
[2024-07-27 20:05:19,596] [INFO] [logging.py:96:log_dist] [Rank 0] step=39, skipped=0, lr=[1.8947293298207637e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:19,674] [INFO] [timer.py:258:stop] epoch=0/micro_step=39/global_step=39, RunningAvgSamplesPerSec=31.57390143166257, CurrSamplesPerSec=33.093162948245876, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 39, | |
"rank": 0, | |
"loss": 0.14565131068229675, | |
"overall_throughput": 33.040161880332626, | |
"lr": 1.8947293298207637e-05, | |
"cuda_mem_allocated": 22.00071620941162, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 575, | |
"batch_size": 16, | |
"total_loss": 0.09129128605127335, | |
"gradnorm": 1.8486279249191284, | |
"weight_norm": 393.4649658203125, | |
"timestamp": "2024-07-27T20:05:19.677166" | |
} | |
Per-token loss scaled by world size: 0.0030472958460450172
Per-token loss scaled by world size: 0.0027008713223040104
Per-token loss scaled by world size: 0.0013511140132322907
Per-token loss scaled by world size: 0.00032714917324483395
Per-token loss scaled by world size: 0.0010533123277127743
Per-token loss scaled by world size: 0.0011340089840814471
Per-token loss scaled by world size: 0.000277617946267128
Epoch: 3, Step: 40, Rank: 0, loss = 0.2605437934398651
Epoch: 3, Step: 40, Rank: 5, loss = 0.11552024632692337
Epoch: 3, Step: 40, Rank: 3, loss = 0.027971254661679268
Epoch: 3, Step: 40, Rank: 2, loss = 0.23092450201511383
Epoch: 3, Step: 40, Rank: 7, loss = 0.09695777297019958
Epoch: 3, Step: 40, Rank: 1, loss = 0.09005820006132126
Epoch: 3, Step: 40, Rank: 4, loss = 0.023736335337162018
Per-token loss scaled by world size: 0.0006618773913942277
Epoch: 3, Step: 40, Rank: 6, loss = 0.05659051612019539
[2024-07-27 20:05:20,133] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=0, lr=[1.879473751206489e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:20,211] [INFO] [timer.py:258:stop] epoch=0/micro_step=40/global_step=40, RunningAvgSamplesPerSec=31.60944150873908, CurrSamplesPerSec=32.98311497397824, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 40, | |
"rank": 0, | |
"loss": 0.2605437934398651, | |
"overall_throughput": 32.92414856200329, | |
"lr": 1.879473751206489e-05, | |
"cuda_mem_allocated": 22.01025676727295, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 684, | |
"batch_size": 16, | |
"total_loss": 0.11278782784938812, | |
"gradnorm": 1.686691403388977, | |
"weight_norm": 393.46551513671875, | |
"timestamp": "2024-07-27T20:05:20.214384" | |
} | |
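The logged `lr` values fit a linear-warmup plus cosine-decay schedule. The sketch below assumes a peak learning rate of 2e-5, 25 warmup steps and 120 total steps (10 epochs of 12 steps each). These three numbers were fitted to this log and were not read from the run configuration. With those values, the sketch reproduces the lr printed at steps 37, 40, 44 and 48. At step 44 the decay progress is exactly (44 - 25) / 95 = 0.2, which is why that step's lr is the closed-form value 2e-5 × (1 + cos 0.2π) / 2 ≈ 1.809e-05. `cosine_lr` is a hypothetical helper, not the trainer's scheduler object.

```python
import math

# Assumed schedule parameters, fitted to the logged lr values (hypothetical).
PEAK_LR = 2e-5
WARMUP_STEPS = 25
TOTAL_STEPS = 120  # --num-epochs 10 x 12 steps per epoch (assumed constant)

def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# lr values copied from the "[Rank 0] step=N ... lr=[...]" lines in this log.
logged = {
    37: 1.922289754977385e-05,
    40: 1.879473751206489e-05,
    44: 1.8090169943749477e-05,
    48: 1.7244252047910893e-05,
}

for step, lr in logged.items():
    print(step, cosine_lr(step), lr)
```

If this fit is right, the learning rate would reach zero at step 120, the last step of epoch 9.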
Per-token loss scaled by world size: 0.0014961636625230312
Per-token loss scaled by world size: 0.0015210546553134918
Per-token loss scaled by world size: 0.0010414053685963154
Per-token loss scaled by world size: 0.001773377531208098
Per-token loss scaled by world size: 0.0008040367974899709
Per-token loss scaled by world size: 0.0032368989195674658
Per-token loss scaled by world size: 0.0005034140776842833
Epoch: 3, Step: 41, Rank: 6, loss = 0.12187450379133224
Epoch: 3, Step: 41, Rank: 3, loss = 0.11988011747598648
Epoch: 3, Step: 41, Rank: 0, loss = 0.08344260603189468
Epoch: 3, Step: 41, Rank: 1, loss = 0.14209187030792236
Epoch: 3, Step: 41, Rank: 7, loss = 0.2593565285205841
Epoch: 3, Step: 41, Rank: 5, loss = 0.06442344933748245
Epoch: 3, Step: 41, Rank: 4, loss = 0.04033605381846428
Per-token loss scaled by world size: 0.0004881576751358807
Epoch: 3, Step: 41, Rank: 2, loss = 0.03911363333463669
[2024-07-27 20:05:20,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=41, skipped=0, lr=[1.863256480957574e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:20,751] [INFO] [timer.py:258:stop] epoch=0/micro_step=41/global_step=41, RunningAvgSamplesPerSec=31.626163332071105, CurrSamplesPerSec=32.274971444510975, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 41, | |
"rank": 0, | |
"loss": 0.08344260603189468, | |
"overall_throughput": 32.196172088118544, | |
"lr": 1.863256480957574e-05, | |
"cuda_mem_allocated": 22.001431465148926, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 641, | |
"batch_size": 16, | |
"total_loss": 0.10881484299898148, | |
"gradnorm": 1.5060667991638184, | |
"weight_norm": 393.46600341796875, | |
"timestamp": "2024-07-27T20:05:20.793861" | |
} | |
Per-token loss scaled by world size: 0.001642214716412127
Per-token loss scaled by world size: 0.0013352860696613789
Per-token loss scaled by world size: 0.001149832969531417
Per-token loss scaled by world size: 0.0007547381101176143
Per-token loss scaled by world size: 0.002172367414459586
Per-token loss scaled by world size: 0.0017613072413951159
Per-token loss scaled by world size: 0.0012277569621801376
Epoch: 3, Step: 42, Rank: 0, loss = 0.10571756958961487
Epoch: 3, Step: 42, Rank: 5, loss = 0.0859590396285057
Epoch: 3, Step: 42, Rank: 6, loss = 0.04858626425266266
Epoch: 3, Step: 42, Rank: 7, loss = 0.13984614610671997
Epoch: 3, Step: 42, Rank: 2, loss = 0.07402049750089645
Epoch: 3, Step: 42, Rank: 3, loss = 0.11338414996862411
Epoch: 3, Step: 42, Rank: 4, loss = 0.07903685420751572
Per-token loss scaled by world size: 0.0004327835631556809
Epoch: 3, Step: 42, Rank: 1, loss = 0.02786044217646122
[2024-07-27 20:05:21,216] [INFO] [logging.py:96:log_dist] [Rank 0] step=42, skipped=0, lr=[1.8460952524209355e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:21,294] [INFO] [timer.py:258:stop] epoch=0/micro_step=42/global_step=42, RunningAvgSamplesPerSec=31.636401598459667, CurrSamplesPerSec=32.040930582537946, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 42, | |
"rank": 0, | |
"loss": 0.10571756958961487, | |
"overall_throughput": 31.961989783039616, | |
"lr": 1.8460952524209355e-05, | |
"cuda_mem_allocated": 22.001193046569824, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 515, | |
"batch_size": 16, | |
"total_loss": 0.08430136740207672, | |
"gradnorm": 2.007233142852783, | |
"weight_norm": 393.46649169921875, | |
"timestamp": "2024-07-27T20:05:21.336433" | |
} | |
Per-token loss scaled by world size: 0.001508427201770246
Per-token loss scaled by world size: 0.0023370874114334583
Per-token loss scaled by world size: 0.0023123989813029766
Per-token loss scaled by world size: 0.0019151513697579503
Per-token loss scaled by world size: 0.0032233409583568573
Per-token loss scaled by world size: 0.0005486609297804534
Per-token loss scaled by world size: 0.0025942232459783554
Epoch: 3, Step: 43, Rank: 1, loss = 0.20537155866622925
Epoch: 3, Step: 43, Rank: 7, loss = 0.20320206880569458
Epoch: 3, Step: 43, Rank: 6, loss = 0.16829392313957214
Epoch: 3, Step: 43, Rank: 4, loss = 0.13255304098129272
Epoch: 3, Step: 43, Rank: 5, loss = 0.04821357876062393
Epoch: 3, Step: 43, Rank: 3, loss = 0.2832510769367218
Epoch: 3, Step: 43, Rank: 0, loss = 0.22796736657619476
Per-token loss scaled by world size: 0.0013315769610926509
Epoch: 3, Step: 43, Rank: 2, loss = 0.11701232939958572
[2024-07-27 20:05:21,753] [INFO] [logging.py:96:log_dist] [Rank 0] step=43, skipped=0, lr=[1.8280088311480203e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:21,830] [INFO] [timer.py:258:stop] epoch=0/micro_step=43/global_step=43, RunningAvgSamplesPerSec=31.65571342945945, CurrSamplesPerSec=32.44800374432416, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 43, | |
"rank": 0, | |
"loss": 0.22796736657619476, | |
"overall_throughput": 32.36756205093528, | |
"lr": 1.8280088311480203e-05, | |
"cuda_mem_allocated": 22.007156372070312, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 703, | |
"batch_size": 16, | |
"total_loss": 0.17323312163352966, | |
"gradnorm": 3.137274742126465, | |
"weight_norm": 393.4668884277344, | |
"timestamp": "2024-07-27T20:05:21.873356" | |
} | |
Per-token loss scaled by world size: 0.0031055721919983625
Per-token loss scaled by world size: 0.005434826016426086
Per-token loss scaled by world size: 0.0022665681317448616
Per-token loss scaled by world size: 0.0013299736892804503
Per-token loss scaled by world size: 0.002249634126201272
Per-token loss scaled by world size: 0.002406098647043109
Per-token loss scaled by world size: 0.0007422761409543455
Epoch: 3, Step: 44, Rank: 4, loss = 0.382475882768631
Epoch: 3, Step: 44, Rank: 6, loss = 0.09359689801931381
Epoch: 3, Step: 44, Rank: 5, loss = 0.15950973331928253
Epoch: 3, Step: 44, Rank: 1, loss = 0.15831799805164337
Epoch: 3, Step: 44, Rank: 7, loss = 0.1693291962146759
Epoch: 3, Step: 44, Rank: 0, loss = 0.21855464577674866
Epoch: 3, Step: 44, Rank: 3, loss = 0.05223768204450607
Per-token loss scaled by world size: 0.000683724822010845
Epoch: 3, Step: 44, Rank: 2, loss = 0.04811713472008705
[2024-07-27 20:05:22,307] [INFO] [logging.py:96:log_dist] [Rank 0] step=44, skipped=0, lr=[1.8090169943749477e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:22,384] [INFO] [timer.py:258:stop] epoch=0/micro_step=44/global_step=44, RunningAvgSamplesPerSec=31.653925419020233, CurrSamplesPerSec=31.580790497837636, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 704
{ | |
"epoch": 3, | |
"step": 44, | |
"rank": 0, | |
"loss": 0.21855464577674866, | |
"overall_throughput": 31.53099355086062, | |
"lr": 1.8090169943749477e-05, | |
"cuda_mem_allocated": 21.998329639434814, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 563, | |
"batch_size": 16, | |
"total_loss": 0.1602673977613449, | |
"gradnorm": 2.924142599105835, | |
"weight_norm": 393.46728515625, | |
"timestamp": "2024-07-27T20:05:22.387393" | |
} | |
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_704
[20:05:40] INFO saving took 17.951613903045654 seconds utils.py:611
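The checkpoint directory name tracks samples consumed rather than optimizer steps. Every step record in this section reports "batch_size": 16, matching --effective-batch-size 16. Assuming all earlier steps were also full batches, global step 44 corresponds to 44 × 16 = 704 samples. That agrees with `samples_seen: 704` and the `samples_704` directory. The following is a minimal sketch of that bookkeeping. `samples_seen` and `hf_checkpoint_dir` are hypothetical helper names.

```python
EFFECTIVE_BATCH_SIZE = 16  # --effective-batch-size 16
CKPT_ROOT = "/var/instructlabbigdisk/instructlab/skillscheckpoints"  # --ckpt-output-dir

def samples_seen(global_step: int, batch_size: int = EFFECTIVE_BATCH_SIZE) -> int:
    """Samples consumed after `global_step` optimizer steps, assuming full batches."""
    return global_step * batch_size

def hf_checkpoint_dir(global_step: int, root: str = CKPT_ROOT) -> str:
    """Directory layout as it appears in the "Model saved in ..." line."""
    return f"{root}/hf_format/samples_{samples_seen(global_step)}"

print(samples_seen(44))
print(hf_checkpoint_dir(44))
```

The same arithmetic explains the batch size itself: 8 GPUs × 2 samples per micro-batch (every stats line reports "num samples: 2") gives 16 samples per step. This assumes one micro-batch per rank per step, with no gradient accumulation.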
Per-token loss scaled by world size: 0.002263416536152363
Per-token loss scaled by world size: 0.0010527893900871277
Per-token loss scaled by world size: 0.004291217308491468
Per-token loss scaled by world size: 0.002417604671791196
Per-token loss scaled by world size: 0.002519844565540552
Per-token loss scaled by world size: 0.0017610156210139394
Per-token loss scaled by world size: 0.003203270025551319
Epoch: 3, Step: 45, Rank: 0, loss = 0.16494648158550262
Epoch: 3, Step: 45, Rank: 2, loss = 0.31272247433662415
Epoch: 3, Step: 45, Rank: 6, loss = 0.07672202587127686
Epoch: 3, Step: 45, Rank: 7, loss = 0.17618294060230255
Epoch: 3, Step: 45, Rank: 3, loss = 0.18363367021083832
Epoch: 3, Step: 45, Rank: 1, loss = 0.12833401560783386
Epoch: 3, Step: 45, Rank: 4, loss = 0.23343829810619354
Per-token loss scaled by world size: 0.0014321228954941034
Epoch: 3, Step: 45, Rank: 5, loss = 0.10436595976352692
[2024-07-27 20:05:40,811] [INFO] [logging.py:96:log_dist] [Rank 0] step=45, skipped=0, lr=[1.789140509396394e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:40,889] [INFO] [timer.py:258:stop] epoch=0/micro_step=45/global_step=45, RunningAvgSamplesPerSec=31.661507538235384, CurrSamplesPerSec=31.983269859773554, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 45, | |
"rank": 0, | |
"loss": 0.16494648158550262, | |
"overall_throughput": 31.91971183779508, | |
"lr": 1.789140509396394e-05, | |
"cuda_mem_allocated": 22.001669883728027, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 583, | |
"batch_size": 16, | |
"total_loss": 0.1725432276725769, | |
"gradnorm": 4.5381975173950195, | |
"weight_norm": 393.46771240234375, | |
"timestamp": "2024-07-27T20:05:40.931934" | |
} | |
Per-token loss scaled by world size: 0.005079582799226046
Per-token loss scaled by world size: 0.003907069563865662
Per-token loss scaled by world size: 0.0016345379408448935
Per-token loss scaled by world size: 0.00215306063182652
Per-token loss scaled by world size: 0.005338544957339764
Per-token loss scaled by world size: 0.0032401932403445244
Per-token loss scaled by world size: 0.0003386051394045353
Epoch: 3, Step: 46, Rank: 5, loss = 0.18193362653255463
Epoch: 3, Step: 46, Rank: 6, loss = 0.1381184607744217
Epoch: 3, Step: 46, Rank: 2, loss = 0.330147385597229
Epoch: 3, Step: 46, Rank: 7, loss = 0.028612133115530014
Epoch: 3, Step: 46, Rank: 3, loss = 0.42922475934028625
Epoch: 3, Step: 46, Rank: 1, loss = 0.27379631996154785
Epoch: 3, Step: 46, Rank: 4, loss = 0.45110705494880676
Per-token loss scaled by world size: 0.0007393484702333808
Epoch: 3, Step: 46, Rank: 0, loss = 0.062474943697452545
[2024-07-27 20:05:41,368] [INFO] [logging.py:96:log_dist] [Rank 0] step=46, skipped=0, lr=[1.7684011108568593e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:41,446] [INFO] [timer.py:258:stop] epoch=0/micro_step=46/global_step=46, RunningAvgSamplesPerSec=31.649546499309086, CurrSamplesPerSec=31.143634404390532, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 46, | |
"rank": 0, | |
"loss": 0.062474943697452545, | |
"overall_throughput": 31.079148738311545, | |
"lr": 1.7684011108568593e-05, | |
"cuda_mem_allocated": 21.99785280227661, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 676, | |
"batch_size": 16, | |
"total_loss": 0.2369268238544464, | |
"gradnorm": 3.1312427520751953, | |
"weight_norm": 393.46807861328125, | |
"timestamp": "2024-07-27T20:05:41.491963" | |
} | |
Per-token loss scaled by world size: 0.003072180086746812
Per-token loss scaled by world size: 0.0009212895529344678
Per-token loss scaled by world size: 0.001889892853796482
Per-token loss scaled by world size: 0.005953831598162651
Per-token loss scaled by world size: 0.0021448375191539526
Per-token loss scaled by world size: 0.0008049519965425134
Per-token loss scaled by world size: 0.005051196087151766
Epoch: 3, Step: 47, Rank: 3, loss = 0.06748446077108383
Epoch: 3, Step: 47, Rank: 2, loss = 0.13843464851379395
Epoch: 3, Step: 47, Rank: 0, loss = 0.22503718733787537
Epoch: 3, Step: 47, Rank: 7, loss = 0.43611815571784973
Epoch: 3, Step: 47, Rank: 5, loss = 0.1571093499660492
Epoch: 3, Step: 47, Rank: 4, loss = 0.058962732553482056
Epoch: 3, Step: 47, Rank: 6, loss = 0.37000012397766113
Per-token loss scaled by world size: 0.002140582073479891
Epoch: 3, Step: 47, Rank: 1, loss = 0.1567976325750351
[2024-07-27 20:05:41,903] [INFO] [logging.py:96:log_dist] [Rank 0] step=47, skipped=0, lr=[1.7468214769841542e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:41,981] [INFO] [timer.py:258:stop] epoch=0/micro_step=47/global_step=47, RunningAvgSamplesPerSec=31.673656937700265, CurrSamplesPerSec=32.7721445241366, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 47, | |
"rank": 0, | |
"loss": 0.22503718733787537, | |
"overall_throughput": 32.68064947498726, | |
"lr": 1.7468214769841542e-05, | |
"cuda_mem_allocated": 22.002624034881592, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 586, | |
"batch_size": 16, | |
"total_loss": 0.20124304294586182, | |
"gradnorm": 2.9233510494232178, | |
"weight_norm": 393.46844482421875, | |
"timestamp": "2024-07-27T20:05:41.984708" | |
} | |
Per-token loss scaled by world size: 0.0007923523080535233
Per-token loss scaled by world size: 0.005117433145642281
Per-token loss scaled by world size: 0.001681540277786553
Per-token loss scaled by world size: 0.0009754404309205711
Per-token loss scaled by world size: 0.0007582003017887473
Per-token loss scaled by world size: 0.0003797741374000907
Per-token loss scaled by world size: 0.0006013160455040634
Epoch: 3, Step: 48, Rank: 5, loss = 0.12737667560577393
Epoch: 3, Step: 48, Rank: 1, loss = 0.38764557242393494
Epoch: 3, Step: 48, Rank: 2, loss = 0.06002068519592285
Epoch: 3, Step: 48, Rank: 6, loss = 0.028767891228199005
Epoch: 3, Step: 48, Rank: 7, loss = 0.07388961315155029
Epoch: 3, Step: 48, Rank: 0, loss = 0.05743367224931717
Epoch: 3, Step: 48, Rank: 4, loss = 0.04554969072341919
Per-token loss scaled by world size: 0.001107058022171259
Epoch: 3, Step: 48, Rank: 3, loss = 0.0838596448302269
[2024-07-27 20:05:42,468] [INFO] [logging.py:96:log_dist] [Rank 0] step=48, skipped=0, lr=[1.7244252047910893e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:42,546] [INFO] [timer.py:258:stop] epoch=0/micro_step=48/global_step=48, RunningAvgSamplesPerSec=31.653254285775276, CurrSamplesPerSec=30.761573372705392, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 3,
"step": 48, | |
"rank": 0, | |
"loss": 0.05743367224931717, | |
"overall_throughput": 30.690515825250888, | |
"lr": 1.7244252047910893e-05, | |
"cuda_mem_allocated": 22.003339290618896, | |
"cuda_malloc_retries": 0, | |
"num_loss_counted_tokens": 606, | |
"batch_size": 16, | |
"total_loss": 0.10806792974472046, | |
"gradnorm": 1.4744030237197876, | |
"weight_norm": 393.4688415527344, | |
"timestamp": "2024-07-27T20:05:42.548994" | |
} | |
Epoch 3: 100%|██████████| 12/12 [00:24<00:00, 2.08s/it]
total tokens: 196 num samples: 2 num padding tokens: 44 - rank: 6 max len: 98 min len: 54 avg len: 76.0 num_loss_counted_tokens: 104
total tokens: 102 num samples: 2 num padding tokens: 7 - rank: 0 max len: 51 min len: 44 avg len: 47.5 num_loss_counted_tokens: 51
total tokens: 136 num samples: 2 num padding tokens: 0 - rank: 0 max len: 68 min len: 68 avg len: 68.0 num_loss_counted_tokens: 53
total tokens: 152 num samples: 2 num padding tokens: 16 - rank: 6 max len: 76 min len: 60 avg len: 68.0 num_loss_counted_tokens: 75
total tokens: 130 num samples: 2 num padding tokens: 15 - rank: 2 max len: 65 min len: 50 avg len: 57.5 num_loss_counted_tokens: 65
total tokens: 110 num samples: 2 num padding tokens: 11 - rank: 7 max len: 55 min len: 44 avg len: 49.5 num_loss_counted_tokens: 53
total tokens: 142 num samples: 2 num padding tokens: 0 - rank: 7 max len: 71 min len: 71 avg len: 71.0 num_loss_counted_tokens: 75
total tokens: 138 num samples: 2 num padding tokens: 10 - rank: 7 max len: 69 min len: 59 avg len: 64.0 num_loss_counted_tokens: 67
total tokens: 120 num samples: 2 num padding tokens: 15 - rank: 6 max len: 60 min len: 45 avg len: 52.5 num_loss_counted_tokens: 59
total tokens: 132 num samples: 2 num padding tokens: 5 - rank: 6 max len: 66 min len: 61 avg len: 63.5 num_loss_counted_tokens: 70
total tokens: 166 num samples: 2 num padding tokens: 28 - rank: 7 max len: 83 min len: 55 avg len: 69.0 num_loss_counted_tokens: 89
total tokens: 126 num samples: 2 num padding tokens: 6 - rank: 7 max len: 63 min len: 57 avg len: 60.0 num_loss_counted_tokens: 57
total tokens: 154 num samples: 2 num padding tokens: 22 - rank: 6 max len: 77 min len: 55 avg len: 66.0 num_loss_counted_tokens: 75
total tokens: 152 num samples: 2 num padding tokens: 15 - rank: 7 max len: 76 min len: 61 avg len: 68.5 num_loss_counted_tokens: 70
total tokens: 282 num samples: 2 num padding tokens: 81 - rank: 6 max len: 141 min len: 60 avg len: 100.5 num_loss_counted_tokens: 156
total tokens: 194 num samples: 2 num padding tokens: 10 - rank: 7 max len: 97 min len: 87 avg len: 92.0 num_loss_counted_tokens: 116
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 7 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 62
total tokens: 172 num samples: 2 num padding tokens: 32 - rank: 7 max len: 86 min len: 54 avg len: 70.0 num_loss_counted_tokens: 61
total tokens: 164 num samples: 2 num padding tokens: 30 - rank: 0 max len: 82 min len: 52 avg len: 67.0 num_loss_counted_tokens: 81
total tokens: 150 num samples: 2 num padding tokens: 13 - rank: 7 max len: 75 min len: 62 avg len: 68.5 num_loss_counted_tokens: 70
total tokens: 180 num samples: 2 num padding tokens: 6 - rank: 0 max len: 90 min len: 84 avg len: 87.0 num_loss_counted_tokens: 114
total tokens: 134 num samples: 2 num padding tokens: 15 - rank: 7 max len: 67 min len: 52 avg len: 59.5 num_loss_counted_tokens: 72
total tokens: 106 num samples: 2 num padding tokens: 8 - rank: 6 max len: 53 min len: 45 avg len: 49.0 num_loss_counted_tokens: 45
total tokens: 186 num samples: 2 num padding tokens: 16 - rank: 6 max len: 93 min len: 77 avg len: 85.0 num_loss_counted_tokens: 122
total tokens: 144 num samples: 2 num padding tokens: 17 - rank: 4 max len: 72 min len: 55 avg len: 63.5 num_loss_counted_tokens: 59
total tokens: 172 num samples: 2 num padding tokens: 26 - rank: 6 max len: 86 min len: 60 avg len: 73.0 num_loss_counted_tokens: 80
total tokens: 118 num samples: 2 num padding tokens: 14 - rank: 4 max len: 59 min len: 45 avg len: 52.0 num_loss_counted_tokens: 52
total tokens: 140 num samples: 2 num padding tokens: 12 - rank: 6 max len: 70 min len: 58 avg len: 64.0 num_loss_counted_tokens: 59
total tokens: 186 num samples: 2 num padding tokens: 30 - rank: 5 max len: 93 min len: 63 avg len: 78.0 num_loss_counted_tokens: 100
total tokens: 128 num samples: 2 num padding tokens: 6 - rank: 2 max len: 64 min len: 58 avg len: 61.0 num_loss_counted_tokens: 70
total tokens: 114 num samples: 2 num padding tokens: 7 - rank: 5 max len: 57 min len: 50 avg len: 53.5 num_loss_counted_tokens: 59 | |
total tokens: 148 num samples: 2 num padding tokens: 16 - rank: 4 max len: 74 min len: 58 avg len: 66.0 num_loss_counted_tokens: 68 | |
total tokens: 146 num samples: 2 num padding tokens: 22 - rank: 4 max len: 73 min len: 51 avg len: 62.0 num_loss_counted_tokens: 72 | |
total tokens: 186 num samples: 2 num padding tokens: 45 - rank: 0 max len: 93 min len: 48 avg len: 70.5 num_loss_counted_tokens: 111 | |
total tokens: 180 num samples: 2 num padding tokens: 26 - rank: 0 max len: 90 min len: 64 avg len: 77.0 num_loss_counted_tokens: 118 | |
total tokens: 166 num samples: 2 num padding tokens: 16 - rank: 4 max len: 83 min len: 67 avg len: 75.0 num_loss_counted_tokens: 85 | |
total tokens: 158 num samples: 2 num padding tokens: 20 - rank: 1 max len: 79 min len: 59 avg len: 69.0 num_loss_counted_tokens: 66 | |
total tokens: 140 num samples: 2 num padding tokens: 2 - rank: 0 max len: 70 min len: 68 avg len: 69.0 num_loss_counted_tokens: 66 | |
total tokens: 132 num samples: 2 num padding tokens: 23 - rank: 0 max len: 66 min len: 43 avg len: 54.5 num_loss_counted_tokens: 45 | |
total tokens: 172 num samples: 2 num padding tokens: 32 - rank: 4 max len: 86 min len: 54 avg len: 70.0 num_loss_counted_tokens: 78 | |
total tokens: 114 num samples: 2 num padding tokens: 11 - rank: 6 max len: 57 min len: 46 avg len: 51.5 num_loss_counted_tokens: 59 | |
total tokens: 100 num samples: 2 num padding tokens: 1 - rank: 4 max len: 50 min len: 49 avg len: 49.5 num_loss_counted_tokens: 49 | |
total tokens: 126 num samples: 2 num padding tokens: 9 - rank: 1 max len: 63 min len: 54 avg len: 58.5 num_loss_counted_tokens: 63 | |
total tokens: 180 num samples: 2 num padding tokens: 30 - rank: 4 max len: 90 min len: 60 avg len: 75.0 num_loss_counted_tokens: 100 | |
total tokens: 134 num samples: 2 num padding tokens: 15 - rank: 0 max len: 67 min len: 52 avg len: 59.5 num_loss_counted_tokens: 57 | |
total tokens: 140 num samples: 2 num padding tokens: 12 - rank: 2 max len: 70 min len: 58 avg len: 64.0 num_loss_counted_tokens: 73 | |
total tokens: 184 num samples: 2 num padding tokens: 37 - rank: 1 max len: 92 min len: 55 avg len: 73.5 num_loss_counted_tokens: 87 | |
total tokens: 202 num samples: 2 num padding tokens: 46 - rank: 4 max len: 101 min len: 55 avg len: 78.0 num_loss_counted_tokens: 106 | |
total tokens: 162 num samples: 2 num padding tokens: 2 - rank: 0 max len: 81 min len: 79 avg len: 80.0 num_loss_counted_tokens: 93 | |
total tokens: 174 num samples: 2 num padding tokens: 32 - rank: 2 max len: 87 min len: 55 avg len: 71.0 num_loss_counted_tokens: 73 | |
total tokens: 146 num samples: 2 num padding tokens: 10 - rank: 2 max len: 73 min len: 63 avg len: 68.0 num_loss_counted_tokens: 73 | |
total tokens: 152 num samples: 2 num padding tokens: 13 - rank: 0 max len: 76 min len: 63 avg len: 69.5 num_loss_counted_tokens: 87 | |
total tokens: 138 num samples: 2 num padding tokens: 7 - rank: 4 max len: 69 min len: 62 avg len: 65.5 num_loss_counted_tokens: 77 | |
total tokens: 208 num samples: 2 num padding tokens: 43 - rank: 1 max len: 104 min len: 61 avg len: 82.5 num_loss_counted_tokens: 108 | |
total tokens: 124 num samples: 2 num padding tokens: 2 - rank: 4 max len: 62 min len: 60 avg len: 61.0 num_loss_counted_tokens: 65 | |
total tokens: 146 num samples: 2 num padding tokens: 6 - rank: 2 max len: 73 min len: 67 avg len: 70.0 num_loss_counted_tokens: 66 | |
total tokens: 124 num samples: 2 num padding tokens: 13 - rank: 1 max len: 62 min len: 49 avg len: 55.5 num_loss_counted_tokens: 54 | |
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 0 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 69 | |
total tokens: 228 num samples: 2 num padding tokens: 38 - rank: 4 max len: 114 min len: 76 avg len: 95.0 num_loss_counted_tokens: 129 | |
total tokens: 132 num samples: 2 num padding tokens: 1 - rank: 2 max len: 66 min len: 65 avg len: 65.5 num_loss_counted_tokens: 52 | |
total tokens: 138 num samples: 2 num padding tokens: 10 - rank: 2 max len: 69 min len: 59 avg len: 64.0 num_loss_counted_tokens: 60 | |
total tokens: 126 num samples: 2 num padding tokens: 3 - rank: 2 max len: 63 min len: 60 avg len: 61.5 num_loss_counted_tokens: 61 | |
total tokens: 216 num samples: 2 num padding tokens: 49 - rank: 2 max len: 108 min len: 59 avg len: 83.5 num_loss_counted_tokens: 103 | |
total tokens: 142 num samples: 2 num padding tokens: 8 - rank: 2 max len: 71 min len: 63 avg len: 67.0 num_loss_counted_tokens: 64 | |
total tokens: 142 num samples: 2 num padding tokens: 9 - rank: 7 max len: 71 min len: 62 avg len: 66.5 num_loss_counted_tokens: 64 | |
total tokens: 244 num samples: 2 num padding tokens: 78 - rank: 6 max len: 122 min len: 44 avg len: 83.0 num_loss_counted_tokens: 115 | |
total tokens: 214 num samples: 2 num padding tokens: 62 - rank: 5 max len: 107 min len: 45 avg len: 76.0 num_loss_counted_tokens: 99 | |
total tokens: 200 num samples: 2 num padding tokens: 47 - rank: 1 max len: 100 min len: 53 avg len: 76.5 num_loss_counted_tokens: 90 | |
total tokens: 144 num samples: 2 num padding tokens: 14 - rank: 1 max len: 72 min len: 58 avg len: 65.0 num_loss_counted_tokens: 84 | |
total tokens: 128 num samples: 2 num padding tokens: 6 - rank: 1 max len: 64 min len: 58 avg len: 61.0 num_loss_counted_tokens: 69 | |
total tokens: 96 num samples: 2 num padding tokens: 4 - rank: 1 max len: 48 min len: 44 avg len: 46.0 num_loss_counted_tokens: 46 | |
total tokens: 104 num samples: 2 num padding tokens: 4 - rank: 1 max len: 52 min len: 48 avg len: 50.0 num_loss_counted_tokens: 60 | |
total tokens: 166 num samples: 2 num padding tokens: 21 - rank: 5 max len: 83 min len: 62 avg len: 72.5 num_loss_counted_tokens: 73 total tokens: 164 num samples: 2 num padding tokens: 21 - rank: 5 max len: 82 min len: 61 avg len: 71.5 num_loss_counted_tokens: 82 | |
total tokens: 226 num samples: 2 num padding tokens: 6 - rank: 5 max len: 113 min len: 107 avg len: 110.0 num_loss_counted_tokens: 142 | |
total tokens: 148 num samples: 2 num padding tokens: 28 - rank: 5 max len: 74 min len: 46 avg len: 60.0 num_loss_counted_tokens: 63 | |
total tokens: 168 num samples: 2 num padding tokens: 33 - rank: 5 max len: 84 min len: 51 avg len: 67.5 num_loss_counted_tokens: 89 | |
total tokens: 162 num samples: 2 num padding tokens: 29 - rank: 5 max len: 81 min len: 52 avg len: 66.5 num_loss_counted_tokens: 72 | |
total tokens: 186 num samples: 2 num padding tokens: 29 - rank: 1 max len: 93 min len: 64 avg len: 78.5 num_loss_counted_tokens: 76 | |
total tokens: 140 num samples: 2 num padding tokens: 6 - rank: 5 max len: 70 min len: 64 avg len: 67.0 num_loss_counted_tokens: 59 | |
total tokens: 156 num samples: 2 num padding tokens: 6 - rank: 5 max len: 78 min len: 72 avg len: 75.0 num_loss_counted_tokens: 81 | |
total tokens: 174 num samples: 2 num padding tokens: 29 - rank: 1 max len: 87 min len: 58 avg len: 72.5 num_loss_counted_tokens: 84 | |
total tokens: 124 num samples: 2 num padding tokens: 13 - rank: 3 max len: 62 min len: 49 avg len: 55.5 num_loss_counted_tokens: 55 | |
total tokens: 140 num samples: 2 num padding tokens: 15 - rank: 3 max len: 70 min len: 55 avg len: 62.5 num_loss_counted_tokens: 69 | |
total tokens: 132 num samples: 2 num padding tokens: 16 - rank: 3 max len: 66 min len: 50 avg len: 58.0 num_loss_counted_tokens: 61 | |
total tokens: 140 num samples: 2 num padding tokens: 19 - rank: 3 max len: 70 min len: 51 avg len: 60.5 num_loss_counted_tokens: 55 | |
total tokens: 128 num samples: 2 num padding tokens: 3 - rank: 3 max len: 64 min len: 61 avg len: 62.5 num_loss_counted_tokens: 71 | |
total tokens: 104 num samples: 2 num padding tokens: 6 - rank: 5 max len: 52 min len: 46 avg len: 49.0 num_loss_counted_tokens: 52 | |
total tokens: 142 num samples: 2 num padding tokens: 11 - rank: 3 max len: 71 min len: 60 avg len: 65.5 num_loss_counted_tokens: 100 | |
total tokens: 160 num samples: 2 num padding tokens: 14 - rank: 3 max len: 80 min len: 66 avg len: 73.0 num_loss_counted_tokens: 83 | |
total tokens: 130 num samples: 2 num padding tokens: 12 - rank: 3 max len: 65 min len: 53 avg len: 59.0 num_loss_counted_tokens: 64 | |
total tokens: 128 num samples: 2 num padding tokens: 3 - rank: 3 max len: 64 min len: 61 avg len: 62.5 num_loss_counted_tokens: 77 | |
total tokens: 188 num samples: 2 num padding tokens: 26 - rank: 3 max len: 94 min len: 68 avg len: 81.0 num_loss_counted_tokens: 85 | |
total tokens: 188 num samples: 2 num padding tokens: 14 - rank: 2 max len: 94 min len: 80 avg len: 87.0 num_loss_counted_tokens: 116 | |
total tokens: 176 num samples: 2 num padding tokens: 25 - rank: 3 max len: 88 min len: 63 avg len: 75.5 num_loss_counted_tokens: 82 | |
total tokens: 162 num samples: 2 num padding tokens: 26 - rank: 3 max len: 81 min len: 55 avg len: 68.0 num_loss_counted_tokens: 85 | |
Per-token loss scaled by world size: 0.0007222609710879624
Per-token loss scaled by world size: 0.0011062632547691464
Per-token loss scaled by world size: 0.0014970493502914906
Per-token loss scaled by world size: 0.0006512971594929695
Per-token loss scaled by world size: 0.005336429923772812
Per-token loss scaled by world size: 0.0008554465603083372
Per-token loss scaled by world size: 0.002156679518520832
Epoch: 4, Step: 49, Rank: 5, loss = 0.36687955260276794
Epoch: 4, Step: 49, Rank: 1, loss = 0.10292214155197144
Epoch: 4, Step: 49, Rank: 4, loss = 0.0760556012392044
Epoch: 4, Step: 49, Rank: 6, loss = 0.14827170968055725
Epoch: 4, Step: 49, Rank: 0, loss = 0.04965544119477272
Epoch: 4, Step: 49, Rank: 7, loss = 0.05881195142865181
Epoch: 4, Step: 49, Rank: 2, loss = 0.04477667808532715
Per-token loss scaled by world size: 0.0012891377555206418
Epoch: 4, Step: 49, Rank: 3, loss = 0.08862821757793427
[2024-07-27 20:05:43,480] [INFO] [logging.py:96:log_dist] [Rank 0] step=49, skipped=0, lr=[1.7012367842724887e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:43,557] [INFO] [timer.py:258:stop] epoch=0/micro_step=49/global_step=49, RunningAvgSamplesPerSec=31.626881887830034, CurrSamplesPerSec=30.459502835854497, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 8%|▊ | 1/12 [00:00<00:10, 1.09it/s]
{
"epoch": 4,
"step": 49,
"rank": 0,
"loss": 0.04965544119477272,
"overall_throughput": 30.35780108269395,
"lr": 1.7012367842724887e-05,
"cuda_mem_allocated": 21.996244430541992,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 550,
"batch_size": 16,
"total_loss": 0.11700016260147095,
"gradnorm": 1.7901870012283325,
"weight_norm": 393.46917724609375,
"timestamp": "2024-07-27T20:05:43.608424"
}
Per-token loss scaled by world size: 0.0014621953014284372
Per-token loss scaled by world size: 0.0015464453026652336
Per-token loss scaled by world size: 0.0014793629525229335
Per-token loss scaled by world size: 0.0010739548597484827
Per-token loss scaled by world size: 0.002221300033852458
Per-token loss scaled by world size: 0.001030008657835424
Per-token loss scaled by world size: 0.0031245022546499968
Epoch: 4, Step: 50, Rank: 4, loss = 0.0959736704826355
Epoch: 4, Step: 50, Rank: 5, loss = 0.10032563656568527
Epoch: 4, Step: 50, Rank: 0, loss = 0.14410683512687683
Epoch: 4, Step: 50, Rank: 1, loss = 0.06682181358337402
Epoch: 4, Step: 50, Rank: 3, loss = 0.06967282295227051
Epoch: 4, Step: 50, Rank: 6, loss = 0.09485992044210434
Epoch: 4, Step: 50, Rank: 7, loss = 0.2027020901441574
Per-token loss scaled by world size: 0.0011166962794959545
Epoch: 4, Step: 50, Rank: 2, loss = 0.07244566828012466
[2024-07-27 20:05:44,023] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=0, lr=[1.6772815716257414e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:44,101] [INFO] [timer.py:258:stop] epoch=0/micro_step=50/global_step=50, RunningAvgSamplesPerSec=31.64645160925237, CurrSamplesPerSec=32.594364979528, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 17%|█▋ | 2/12 [00:01<00:06, 1.43it/s]
{
"epoch": 4,
"step": 50,
"rank": 0,
"loss": 0.14410683512687683,
"overall_throughput": 32.48167592990112,
"lr": 1.6772815716257414e-05,
"cuda_mem_allocated": 21.999523639678955,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 519,
"batch_size": 16,
"total_loss": 0.105863556265831,
"gradnorm": 2.59075927734375,
"weight_norm": 393.4695129394531,
"timestamp": "2024-07-27T20:05:44.148907"
}
Per-token loss scaled by world size: 0.0009595813462510705
Per-token loss scaled by world size: 0.0007476079626940191
Per-token loss scaled by world size: 0.002177697606384754
Per-token loss scaled by world size: 0.00161154440138489
Per-token loss scaled by world size: 0.002184153301641345
Per-token loss scaled by world size: 0.0022782967425882816
Per-token loss scaled by world size: 0.002697640098631382
Epoch: 4, Step: 51, Rank: 1, loss = 0.12066438794136047
Epoch: 4, Step: 51, Rank: 7, loss = 0.16305510699748993
Epoch: 4, Step: 51, Rank: 3, loss = 0.055977147072553635
Epoch: 4, Step: 51, Rank: 2, loss = 0.16353848576545715
Epoch: 4, Step: 51, Rank: 0, loss = 0.07184865325689316
Epoch: 4, Step: 51, Rank: 6, loss = 0.20198580622673035
Epoch: 4, Step: 51, Rank: 5, loss = 0.1705874651670456
Per-token loss scaled by world size: 0.001197479316033423
Epoch: 4, Step: 51, Rank: 4, loss = 0.08966126292943954
[2024-07-27 20:05:44,569] [INFO] [logging.py:96:log_dist] [Rank 0] step=51, skipped=0, lr=[1.6525857615241686e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:44,647] [INFO] [timer.py:258:stop] epoch=0/micro_step=51/global_step=51, RunningAvgSamplesPerSec=31.66021704434534, CurrSamplesPerSec=32.33534113615524, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 25%|██▌ | 3/12 [00:02<00:05, 1.59it/s]
{
"epoch": 4,
"step": 51,
"rank": 0,
"loss": 0.07184865325689316,
"overall_throughput": 32.279271652634336,
"lr": 1.6525857615241686e-05,
"cuda_mem_allocated": 22.002862453460693,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 599,
"batch_size": 16,
"total_loss": 0.12966477870941162,
"gradnorm": 2.6400396823883057,
"weight_norm": 393.4698486328125,
"timestamp": "2024-07-27T20:05:44.689190"
}
Per-token loss scaled by world size: 0.001371016027405858
Per-token loss scaled by world size: 0.0010015949374064803
Per-token loss scaled by world size: 0.0019693197682499886
Per-token loss scaled by world size: 0.00044975956552661955
Per-token loss scaled by world size: 0.0015600252663716674
Per-token loss scaled by world size: 0.0014032198814675212
Per-token loss scaled by world size: 0.0006641225190833211
Epoch: 4, Step: 52, Rank: 4, loss = 0.08450957387685776
Epoch: 4, Step: 52, Rank: 0, loss = 0.11567948013544083
Epoch: 4, Step: 52, Rank: 6, loss = 0.16616135835647583
Epoch: 4, Step: 52, Rank: 5, loss = 0.1316271275281906
Epoch: 4, Step: 52, Rank: 2, loss = 0.03794846311211586
Epoch: 4, Step: 52, Rank: 1, loss = 0.11839667707681656
Epoch: 4, Step: 52, Rank: 7, loss = 0.05603533610701561
Per-token loss scaled by world size: 0.0015486030606552958
Epoch: 4, Step: 52, Rank: 3, loss = 0.13066338002681732
[2024-07-27 20:05:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] step=52, skipped=0, lr=[1.6271763584735373e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:45,202] [INFO] [timer.py:258:stop] epoch=0/micro_step=52/global_step=52, RunningAvgSamplesPerSec=31.653990957078555, CurrSamplesPerSec=31.351883784434047, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 33%|███▎ | 4/12 [00:02<00:04, 1.67it/s]
{
"epoch": 4,
"step": 52,
"rank": 0,
"loss": 0.11567948013544083,
"overall_throughput": 31.298703164249382,
"lr": 1.6271763584735373e-05,
"cuda_mem_allocated": 22.004770278930664,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 675,
"batch_size": 16,
"total_loss": 0.10512767732143402,
"gradnorm": 2.028604745864868,
"weight_norm": 393.47015380859375,
"timestamp": "2024-07-27T20:05:45.245073"
}
Per-token loss scaled by world size: 0.0023022103123366833
Per-token loss scaled by world size: 0.002660317113623023
Per-token loss scaled by world size: 0.0015502030728384852
Per-token loss scaled by world size: 0.001655052648857236
Per-token loss scaled by world size: 0.0008553997613489628
Per-token loss scaled by world size: 0.002113129710778594
Per-token loss scaled by world size: 0.0022639036178588867
Epoch: 4, Step: 53, Rank: 6, loss = 0.19387060403823853
Epoch: 4, Step: 53, Rank: 0, loss = 0.16777357459068298
Epoch: 4, Step: 53, Rank: 1, loss = 0.06233725696802139
Epoch: 4, Step: 53, Rank: 2, loss = 0.11297105252742767
Epoch: 4, Step: 53, Rank: 5, loss = 0.12061195820569992
Epoch: 4, Step: 53, Rank: 3, loss = 0.16498197615146637
Epoch: 4, Step: 53, Rank: 4, loss = 0.15399432182312012
Per-token loss scaled by world size: 0.0017511562909930944
Epoch: 4, Step: 53, Rank: 7, loss = 0.12761551141738892
[2024-07-27 20:05:45,671] [INFO] [logging.py:96:log_dist] [Rank 0] step=53, skipped=0, lr=[1.6010811472830253e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:45,749] [INFO] [timer.py:258:stop] epoch=0/micro_step=53/global_step=53, RunningAvgSamplesPerSec=31.66602722465284, CurrSamplesPerSec=32.279737447919125, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 4: 42%|████▏ | 5/12 [00:03<00:04, 1.72it/s]
{
"epoch": 4,
"step": 53,
"rank": 0,
"loss": 0.16777357459068298,
"overall_throughput": 32.227156557192046,
"lr": 1.6010811472830253e-05,
"cuda_mem_allocated": 22.00548553466797,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 583,
"batch_size": 16,
"total_loss": 0.13801953196525574,
"gradnorm": 2.3451616764068604,
"weight_norm": 393.4704895019531,
"timestamp": "2024-07-27T20:05:45.792018"
}
Per-token loss scaled by world size: 0.0016146524576470256
Per-token loss scaled by world size: 0.00018823673599399626
Per-token loss scaled by world size: 0.00015145067300181836
Per-token loss scaled by world size: 0.002239079447463155
Per-token loss scaled by world size: 0.0013640215620398521
Per-token loss scaled by world size: 0.0009340652031823993
Per-token loss scaled by world size: 0.002048594644293189
Epoch: 4, Step: 54, Rank: 4, loss = 0.017576605081558228
Epoch: 4, Step: 54, Rank: 0, loss = 0.15076817572116852
Epoch: 4, Step: 54, Rank: 6, loss = 0.12736551463603973
Epoch: 4, Step: 54, Rank: 5, loss = 0.20907405018806458
Epoch: 4, Step: 54, Rank: 2, loss = 0.014141706749796867
Epoch: 4, Step: 54, Rank: 7, loss = 0.08721833676099777
Epoch: 4, Step: 54, Rank: 3, loss = 0.19128753244876862
Per-token loss scaled by world size: 0.001376173458993435
Epoch: 4, Step: 54, Rank: 1, loss = 0.12850019335746765
[2024-07-27 20:05:46,212] [INFO] [logging.py:96:log_dist] [Rank 0] step=54, skipped=0, lr=[1.5743286626829437e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:46,290] [INFO] [timer.py:258:stop] epoch=0/micro_step=54/global_step=54, RunningAvgSamplesPerSec=31.675406097739614, CurrSamplesPerSec=32.16120844994824, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 50%|█████ | 6/12 [00:03<00:03, 1.76it/s]
{
"epoch": 4,
"step": 54,
"rank": 0,
"loss": 0.15076817572116852,
"overall_throughput": 32.07902156132976,
"lr": 1.5743286626829437e-05,
"cuda_mem_allocated": 22.004770278930664,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 747,
"batch_size": 16,
"total_loss": 0.11574152112007141,
"gradnorm": 1.6529176235198975,
"weight_norm": 393.4708251953125,
"timestamp": "2024-07-27T20:05:46.332858"
}
Per-token loss scaled by world size: 0.0008495299844071269
Per-token loss scaled by world size: 0.002507910830900073
Per-token loss scaled by world size: 0.0028947019018232822
Per-token loss scaled by world size: 0.001476020785048604
Per-token loss scaled by world size: 0.001191351911984384
Per-token loss scaled by world size: 0.0018832029309123755
Per-token loss scaled by world size: 0.0013536742189899087
Epoch: 4, Step: 55, Rank: 1, loss = 0.22397755086421967
Epoch: 4, Step: 55, Rank: 6, loss = 0.19404959678649902
Epoch: 4, Step: 55, Rank: 7, loss = 0.09218085557222366
Epoch: 4, Step: 55, Rank: 0, loss = 0.06573238223791122
Epoch: 4, Step: 55, Rank: 2, loss = 0.14571282267570496
Epoch: 4, Step: 55, Rank: 3, loss = 0.11420710384845734
Epoch: 4, Step: 55, Rank: 4, loss = 0.10474054515361786
Per-token loss scaled by world size: 0.0010037233587354422
Epoch: 4, Step: 55, Rank: 5, loss = 0.07766309380531311
[2024-07-27 20:05:46,748] [INFO] [logging.py:96:log_dist] [Rank 0] step=55, skipped=0, lr=[1.5469481581224274e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:05:46,825] [INFO] [timer.py:258:stop] epoch=0/micro_step=55/global_step=55, RunningAvgSamplesPerSec=31.69138798284937, CurrSamplesPerSec=32.545268319935445, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 880
{
"epoch": 4,
"step": 55,
"rank": 0,
"loss": 0.06573238223791122,
"overall_throughput": 32.46340198605274,
"lr": 1.5469481581224274e-05,
"cuda_mem_allocated": 22.000000476837158,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 619,
"batch_size": 16,
"total_loss": 0.1272830069065094,
"gradnorm": 1.8899047374725342,
"weight_norm": 393.4711608886719,
"timestamp": "2024-07-27T20:05:46.828726"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_880
[20:06:04] INFO saving took 17.93557572364807 seconds utils.py:611
Epoch 4: 58%|█████▊ | 7/12 [00:22<00:32, 6.42s/it]
Per-token loss scaled by world size: 0.0008630760130472481
Per-token loss scaled by world size: 0.0010983350221067667
Per-token loss scaled by world size: 0.0021769509185105562
Per-token loss scaled by world size: 0.0004714219248853624
Per-token loss scaled by world size: 0.0017523688729852438
Per-token loss scaled by world size: 0.0024742181412875652
Epoch: 4, Step: 56, Rank: 0, loss = 0.042015478014945984
Epoch: 4, Step: 56, Rank: 4, loss = 0.09788911044597626
Per-token loss scaled by world size: 0.00015522913599852473
Epoch: 4, Step: 56, Rank: 1, loss = 0.2205146849155426
Epoch: 4, Step: 56, Rank: 2, loss = 0.19402074813842773
Epoch: 4, Step: 56, Rank: 7, loss = 0.07692164927721024
Epoch: 4, Step: 56, Rank: 3, loss = 0.15617987513542175
Epoch: 4, Step: 56, Rank: 5, loss = 0.013834796845912933
Per-token loss scaled by world size: 0.0007872319547459483
Epoch: 4, Step: 56, Rank: 6, loss = 0.07016205042600632
[2024-07-27 20:06:05,226] [INFO] [logging.py:96:log_dist] [Rank 0] step=56, skipped=0, lr=[1.5189695737812153e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:05,304] [INFO] [timer.py:258:stop] epoch=0/micro_step=56/global_step=56, RunningAvgSamplesPerSec=31.708933325115336, CurrSamplesPerSec=32.66747732319786, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 67%|██████▋ | 8/12 [00:22<00:18, 4.55s/it]
{
"epoch": 4,
"step": 56,
"rank": 0,
"loss": 0.042015478014945984,
"overall_throughput": 32.5988932413107,
"lr": 1.5189695737812153e-05,
"cuda_mem_allocated": 21.999046802520752,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 713,
"batch_size": 16,
"total_loss": 0.10894230008125305,
"gradnorm": 1.9275243282318115,
"weight_norm": 393.47149658203125,
"timestamp": "2024-07-27T20:06:05.346935"
}
Per-token loss scaled by world size: 0.0021620304323732853
Per-token loss scaled by world size: 0.0008936038357205689
Per-token loss scaled by world size: 0.0009625108214095235
Per-token loss scaled by world size: 0.0029216075781732798
Per-token loss scaled by world size: 0.0011535611702129245
Per-token loss scaled by world size: 0.0010705487802624702
Per-token loss scaled by world size: 0.001004268298856914
Epoch: 4, Step: 57, Rank: 0, loss = 0.060765061527490616
Epoch: 4, Step: 57, Rank: 1, loss = 0.07844215631484985
Epoch: 4, Step: 57, Rank: 4, loss = 0.14701807498931885
Epoch: 4, Step: 57, Rank: 6, loss = 0.06545073539018631
Epoch: 4, Step: 57, Rank: 5, loss = 0.07279732078313828
Epoch: 4, Step: 57, Rank: 2, loss = 0.19866931438446045
Epoch: 4, Step: 57, Rank: 7, loss = 0.06829024106264114
Per-token loss scaled by world size: 0.004088845103979111
Epoch: 4, Step: 57, Rank: 3, loss = 0.27804145216941833
[2024-07-27 20:06:05,760] [INFO] [logging.py:96:log_dist] [Rank 0] step=57, skipped=0, lr=[1.4904235038305084e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:05,837] [INFO] [timer.py:258:stop] epoch=0/micro_step=57/global_step=57, RunningAvgSamplesPerSec=31.726304642319306, CurrSamplesPerSec=32.69348184898873, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 75%|███████▌ | 9/12 [00:23<00:09, 3.29s/it]
{
"epoch": 4,
"step": 57,
"rank": 0,
"loss": 0.060765061527490616,
"overall_throughput": 32.6126440646996,
"lr": 1.4904235038305084e-05,
"cuda_mem_allocated": 21.999046802520752,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 544,
"batch_size": 16,
"total_loss": 0.12118428945541382,
"gradnorm": 1.7047128677368164,
"weight_norm": 393.4718322753906,
"timestamp": "2024-07-27T20:06:05.881923"
}
Per-token loss scaled by world size: 0.0010402144398540258
Per-token loss scaled by world size: 0.001066502882167697
Per-token loss scaled by world size: 0.0007541680242866278
Per-token loss scaled by world size: 0.001964687602594495
Per-token loss scaled by world size: 0.00040768564213067293
Per-token loss scaled by world size: 0.002232564380392432
Per-token loss scaled by world size: 0.004068903159350157
Epoch: 4, Step: 58, Rank: 7, loss = 0.08758654445409775
Epoch: 4, Step: 58, Rank: 1, loss = 0.06193605065345764
Epoch: 4, Step: 58, Rank: 0, loss = 0.08542761206626892
Epoch: 4, Step: 58, Rank: 3, loss = 0.16134996712207794
Epoch: 4, Step: 58, Rank: 5, loss = 0.33415865898132324
Epoch: 4, Step: 58, Rank: 4, loss = 0.1833493560552597
Epoch: 4, Step: 58, Rank: 2, loss = 0.033481184393167496
Per-token loss scaled by world size: 0.000749716826248914
Epoch: 4, Step: 58, Rank: 6, loss = 0.06157049536705017
[2024-07-27 20:06:06,308] [INFO] [logging.py:96:log_dist] [Rank 0] step=58, skipped=0, lr=[1.461341162978688e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:06,388] [INFO] [timer.py:258:stop] epoch=0/micro_step=58/global_step=58, RunningAvgSamplesPerSec=31.725638447536326, CurrSamplesPerSec=31.68904077052279, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 83%|████████▎ | 10/12 [00:23<00:04, 2.45s/it]
{
"epoch": 4,
"step": 58,
"rank": 0,
"loss": 0.08542761206626892,
"overall_throughput": 31.617913823118545,
"lr": 1.461341162978688e-05,
"cuda_mem_allocated": 22.002624034881592,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 657,
"batch_size": 16,
"total_loss": 0.12610748410224915,
"gradnorm": 2.7373974323272705,
"weight_norm": 393.47216796875,
"timestamp": "2024-07-27T20:06:06.428516"
}
Per-token loss scaled by world size: 0.0009614942828193307
Per-token loss scaled by world size: 0.0012739634839817882
Per-token loss scaled by world size: 0.0009918607538565993
Per-token loss scaled by world size: 0.001772751216776669
Per-token loss scaled by world size: 0.0035334480926394463
Per-token loss scaled by world size: 0.0007813825504854321
Per-token loss scaled by world size: 0.00042521810973994434
Epoch: 4, Step: 59, Rank: 0, loss = 0.07290176302194595
Epoch: 4, Step: 59, Rank: 6, loss = 0.09363631904125214
Epoch: 4, Step: 59, Rank: 4, loss = 0.1302972137928009
Epoch: 4, Step: 59, Rank: 7, loss = 0.07066982984542847
Epoch: 4, Step: 59, Rank: 5, loss = 0.259708434343338
Epoch: 4, Step: 59, Rank: 3, loss = 0.05743161588907242
Epoch: 4, Step: 59, Rank: 2, loss = 0.03125353157520294
Per-token loss scaled by world size: 0.0010259401751682162
Epoch: 4, Step: 59, Rank: 1, loss = 0.07540660351514816
[2024-07-27 20:06:06,837] [INFO] [logging.py:96:log_dist] [Rank 0] step=59, skipped=0, lr=[1.4317543523384928e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:06,915] [INFO] [timer.py:258:stop] epoch=0/micro_step=59/global_step=59, RunningAvgSamplesPerSec=31.746165669393985, CurrSamplesPerSec=32.93967877502177, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 92%|█████████▏| 11/12 [00:24<00:01, 1.86s/it]
{
"epoch": 4,
"step": 59,
"rank": 0,
"loss": 0.07290176302194595,
"overall_throughput": 32.85643000105753,
"lr": 1.4317543523384928e-05,
"cuda_mem_allocated": 21.999285221099854,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 588,
"batch_size": 16,
"total_loss": 0.09891317039728165,
"gradnorm": 1.6408778429031372,
"weight_norm": 393.47247314453125,
"timestamp": "2024-07-27T20:06:06.958893"
}
Per-token loss scaled by world size: 0.0008049748139455914
Per-token loss scaled by world size: 0.0006352822529152036
Per-token loss scaled by world size: 0.004135269671678543
Per-token loss scaled by world size: 0.0009236105252057314
Per-token loss scaled by world size: 0.0006417850963771343
Per-token loss scaled by world size: 0.0002449562889523804
Per-token loss scaled by world size: 0.002001277171075344
Epoch: 4, Step: 60, Rank: 3, loss = 0.04661383479833603
Epoch: 4, Step: 60, Rank: 4, loss = 0.05906502529978752
Epoch: 4, Step: 60, Rank: 1, loss = 0.047090981155633926
Epoch: 4, Step: 60, Rank: 7, loss = 0.06776992231607437
Epoch: 4, Step: 60, Rank: 5, loss = 0.01797366701066494
Epoch: 4, Step: 60, Rank: 0, loss = 0.3034254014492035
Epoch: 4, Step: 60, Rank: 6, loss = 0.14684371650218964
Per-token loss scaled by world size: 0.0006847438053227961
Epoch: 4, Step: 60, Rank: 2, loss = 0.0502430759370327
[2024-07-27 20:06:07,374] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=0, lr=[1.4016954246529697e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:07,452] [INFO] [timer.py:258:stop] epoch=0/micro_step=60/global_step=60, RunningAvgSamplesPerSec=31.759205664862847, CurrSamplesPerSec=32.52061781981693, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 4: 100%|██████████| 12/12 [00:24<00:00, 1.46s/it]
{
"epoch": 4,
"step": 60,
"rank": 0,
"loss": 0.3034254014492035,
"overall_throughput": 32.436209882408896,
"lr": 1.4016954246529697e-05,
"cuda_mem_allocated": 22.001431465148926,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 587,
"batch_size": 16,
"total_loss": 0.09237820655107498,
"gradnorm": 1.7326298952102661,
"weight_norm": 393.4727478027344,
"timestamp": "2024-07-27T20:06:07.493757"
}
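The `total_loss` field in the rank-0 JSON record appears to be the mean of the eight per-rank `loss` values printed for the same step. The sketch below checks this for step 60 using values copied from the lines above; it is an observation about the logged numbers, not a statement about how the training code computes the field.

```python
import math

# Per-rank losses printed for "Epoch: 4, Step: 60" (copied from the log above).
step60_rank_losses = {
    0: 0.3034254014492035,
    1: 0.047090981155633926,
    2: 0.0502430759370327,
    3: 0.04661383479833603,
    4: 0.05906502529978752,
    5: 0.01797366701066494,
    6: 0.14684371650218964,
    7: 0.06776992231607437,
}
logged_total_loss = 0.09237820655107498  # "total_loss" from the step-60 JSON record

# Plain arithmetic mean over the 8 ranks.
mean_loss = sum(step60_rank_losses.values()) / len(step60_rank_losses)
print(f"mean of rank losses: {mean_loss:.8f}")
print("matches total_loss:", math.isclose(mean_loss, logged_total_loss, rel_tol=1e-6))
```

The tolerance allows for the rank losses having been computed in lower precision before being printed.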
Epoch 4: 100%|██████████| 12/12 [00:24<00:00, 2.08s/it]
total tokens: 122 num samples: 2 num padding tokens: 15 - rank: 1 max len: 61 min len: 46 avg len: 53.5 num_loss_counted_tokens: 55
total tokens: 144 num samples: 2 num padding tokens: 12 - rank: 4 max len: 72 min len: 60 avg len: 66.0 num_loss_counted_tokens: 69
total tokens: 132 num samples: 2 num padding tokens: 3 - rank: 7 max len: 66 min len: 63 avg len: 64.5 num_loss_counted_tokens: 61
total tokens: 152 num samples: 2 num padding tokens: 11 - rank: 1 max len: 76 min len: 65 avg len: 70.5 num_loss_counted_tokens: 66
total tokens: 200 num samples: 2 num padding tokens: 54 - rank: 7 max len: 100 min len: 46 avg len: 73.0 num_loss_counted_tokens: 89
total tokens: 168 num samples: 2 num padding tokens: 11 - rank: 7 max len: 84 min len: 73 avg len: 78.5 num_loss_counted_tokens: 96
total tokens: 90 num samples: 2 num padding tokens: 2 - rank: 7 max len: 45 min len: 43 avg len: 44.0 num_loss_counted_tokens: 38
total tokens: 154 num samples: 2 num padding tokens: 11 - rank: 4 max len: 77 min len: 66 avg len: 71.5 num_loss_counted_tokens: 80
total tokens: 144 num samples: 2 num padding tokens: 14 - rank: 7 max len: 72 min len: 58 avg len: 65.0 num_loss_counted_tokens: 84
total tokens: 148 num samples: 2 num padding tokens: 25 - rank: 7 max len: 74 min len: 49 avg len: 61.5 num_loss_counted_tokens: 64
total tokens: 138 num samples: 2 num padding tokens: 15 - rank: 7 max len: 69 min len: 54 avg len: 61.5 num_loss_counted_tokens: 79
total tokens: 160 num samples: 2 num padding tokens: 10 - rank: 1 max len: 80 min len: 70 avg len: 75.0 num_loss_counted_tokens: 94
total tokens: 134 num samples: 2 num padding tokens: 7 - rank: 7 max len: 67 min len: 60 avg len: 63.5 num_loss_counted_tokens: 75
total tokens: 140 num samples: 2 num padding tokens: 17 - rank: 1 max len: 70 min len: 53 avg len: 61.5 num_loss_counted_tokens: 68
total tokens: 136 num samples: 2 num padding tokens: 13 - rank: 7 max len: 68 min len: 55 avg len: 61.5 num_loss_counted_tokens: 51
total tokens: 166 num samples: 2 num padding tokens: 14 - rank: 7 max len: 83 min len: 69 avg len: 76.0 num_loss_counted_tokens: 85
total tokens: 156 num samples: 2 num padding tokens: 14 - rank: 7 max len: 78 min len: 64 avg len: 71.0 num_loss_counted_tokens: 86
total tokens: 166 num samples: 2 num padding tokens: 21 - rank: 4 max len: 83 min len: 62 avg len: 72.5 num_loss_counted_tokens: 65
total tokens: 158 num samples: 2 num padding tokens: 19 - rank: 4 max len: 79 min len: 60 avg len: 69.5 num_loss_counted_tokens: 69
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 1 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 54
total tokens: 132 num samples: 2 num padding tokens: 2 - rank: 4 max len: 66 min len: 64 avg len: 65.0 num_loss_counted_tokens: 75
total tokens: 118 num samples: 2 num padding tokens: 7 - rank: 4 max len: 59 min len: 52 avg len: 55.5 num_loss_counted_tokens: 60
total tokens: 162 num samples: 2 num padding tokens: 21 - rank: 1 max len: 81 min len: 60 avg len: 70.5 num_loss_counted_tokens: 75
total tokens: 142 num samples: 2 num padding tokens: 9 - rank: 1 max len: 71 min len: 62 avg len: 66.5 num_loss_counted_tokens: 58
total tokens: 134 num samples: 2 num padding tokens: 7 - rank: 4 max len: 67 min len: 60 avg len: 63.5 num_loss_counted_tokens: 55
total tokens: 128 num samples: 2 num padding tokens: 16 - rank: 4 max len: 64 min len: 48 avg len: 56.0 num_loss_counted_tokens: 49
total tokens: 110 num samples: 2 num padding tokens: 4 - rank: 4 max len: 55 min len: 51 avg len: 53.0 num_loss_counted_tokens: 62
total tokens: 140 num samples: 2 num padding tokens: 7 - rank: 4 max len: 70 min len: 63 avg len: 66.5 num_loss_counted_tokens: 58
total tokens: 162 num samples: 2 num padding tokens: 24 - rank: 4 max len: 81 min len: 57 avg len: 69.0 num_loss_counted_tokens: 87
total tokens: 136 num samples: 2 num padding tokens: 7 - rank: 5 max len: 68 min len: 61 avg len: 64.5 num_loss_counted_tokens: 60
total tokens: 174 num samples: 2 num padding tokens: 20 - rank: 4 max len: 87 min len: 67 avg len: 77.0 num_loss_counted_tokens: 73
total tokens: 158 num samples: 2 num padding tokens: 13 - rank: 7 max len: 79 min len: 66 avg len: 72.5 num_loss_counted_tokens: 72
total tokens: 180 num samples: 2 num padding tokens: 32 - rank: 0 max len: 90 min len: 58 avg len: 74.0 num_loss_counted_tokens: 118
total tokens: 152 num samples: 2 num padding tokens: 24 - rank: 0 max len: 76 min len: 52 avg len: 64.0 num_loss_counted_tokens: 71
total tokens: 172 num samples: 2 num padding tokens: 32 - rank: 2 max len: 86 min len: 54 avg len: 70.0 num_loss_counted_tokens: 61
total tokens: 172 num samples: 2 num padding tokens: 42 - rank: 2 max len: 86 min len: 44 avg len: 65.0 num_loss_counted_tokens: 70
total tokens: 188 num samples: 2 num padding tokens: 42 - rank: 0 max len: 94 min len: 52 avg len: 73.0 num_loss_counted_tokens: 87
total tokens: 124 num samples: 2 num padding tokens: 10 - rank: 5 max len: 62 min len: 52 avg len: 57.0 num_loss_counted_tokens: 62
total tokens: 214 num samples: 2 num padding tokens: 47 - rank: 5 max len: 107 min len: 60 avg len: 83.5 num_loss_counted_tokens: 128
total tokens: 214 num samples: 2 num padding tokens: 59 - rank: 5 max len: 107 min len: 48 avg len: 77.5 num_loss_counted_tokens: 106
total tokens: 208 num samples: 2 num padding tokens: 58 - rank: 5 max len: 104 min len: 46 avg len: 75.0 num_loss_counted_tokens: 99
total tokens: 186 num samples: 2 num padding tokens: 43 - rank: 1 max len: 93 min len: 50 avg len: 71.5 num_loss_counted_tokens: 79
total tokens: 120 num samples: 2 num padding tokens: 1 - rank: 1 max len: 60 min len: 59 avg len: 59.5 num_loss_counted_tokens: 64
total tokens: 116 num samples: 2 num padding tokens: 8 - rank: 5 max len: 58 min len: 50 avg len: 54.0 num_loss_counted_tokens: 58
total tokens: 164 num samples: 2 num padding tokens: 24 - rank: 1 max len: 82 min len: 58 avg len: 70.0 num_loss_counted_tokens: 95
total tokens: 180 num samples: 2 num padding tokens: 15 - rank: 1 max len: 90 min len: 75 avg len: 82.5 num_loss_counted_tokens: 107
total tokens: 118 num samples: 2 num padding tokens: 4 - rank: 2 max len: 59 min len: 55 avg len: 57.0 num_loss_counted_tokens: 71
total tokens: 132 num samples: 2 num padding tokens: 17 - rank: 2 max len: 66 min len: 49 avg len: 57.5 num_loss_counted_tokens: 61
total tokens: 140 num samples: 2 num padding tokens: 12 - rank: 2 max len: 70 min len: 58 avg len: 64.0 num_loss_counted_tokens: 78
total tokens: 142 num samples: 2 num padding tokens: 12 - rank: 2 max len: 71 min len: 59 avg len: 65.0 num_loss_counted_tokens: 61
total tokens: 160 num samples: 2 num padding tokens: 22 - rank: 5 max len: 80 min len: 58 avg len: 69.0 num_loss_counted_tokens: 75
total tokens: 228 num samples: 2 num padding tokens: 54 - rank: 5 max len: 114 min len: 60 avg len: 87.0 num_loss_counted_tokens: 122
total tokens: 118 num samples: 2 num padding tokens: 10 - rank: 5 max len: 59 min len: 49 avg len: 54.0 num_loss_counted_tokens: 54
total tokens: 140 num samples: 2 num padding tokens: 18 - rank: 5 max len: 70 min len: 52 avg len: 61.0 num_loss_counted_tokens: 64
total tokens: 130 num samples: 2 num padding tokens: 10 - rank: 0 max len: 65 min len: 55 avg len: 60.0 num_loss_counted_tokens: 63
total tokens: 188 num samples: 2 num padding tokens: 46 - rank: 0 max len: 94 min len: 48 avg len: 71.0 num_loss_counted_tokens: 85
total tokens: 186 num samples: 2 num padding tokens: 33 - rank: 5 max len: 93 min len: 60 avg len: 76.5 num_loss_counted_tokens: 99
total tokens: 126 num samples: 2 num padding tokens: 8 - rank: 0 max len: 63 min len: 55 avg len: 59.0 num_loss_counted_tokens: 52
total tokens: 138 num samples: 2 num padding tokens: 24 - rank: 0 max len: 69 min len: 45 avg len: 57.0 num_loss_counted_tokens: 56
total tokens: 136 num samples: 2 num padding tokens: 18 - rank: 0 max len: 68 min len: 50 avg len: 59.0 num_loss_counted_tokens: 61
total tokens: 216 num samples: 2 num padding tokens: 45 - rank: 0 max len: 108 min len: 63 avg len: 85.5 num_loss_counted_tokens: 104
total tokens: 122 num samples: 2 num padding tokens: 4 - rank: 3 max len: 61 min len: 57 avg len: 59.0 num_loss_counted_tokens: 56
total tokens: 180 num samples: 2 num padding tokens: 24 - rank: 0 max len: 90 min len: 66 avg len: 78.0 num_loss_counted_tokens: 97
total tokens: 174 num samples: 2 num padding tokens: 17 - rank: 3 max len: 87 min len: 70 avg len: 78.5 num_loss_counted_tokens: 80
total tokens: 282 num samples: 2 num padding tokens: 48 - rank: 3 max len: 141 min len: 93 avg len: 117.0 num_loss_counted_tokens: 204
total tokens: 134 num samples: 2 num padding tokens: 16 - rank: 6 max len: 67 min len: 51 avg len: 59.0 num_loss_counted_tokens: 67
total tokens: 168 num samples: 2 num padding tokens: 24 - rank: 3 max len: 84 min len: 60 avg len: 72.0 num_loss_counted_tokens: 95
total tokens: 226 num samples: 2 num padding tokens: 32 - rank: 0 max len: 113 min len: 81 avg len: 97.0 num_loss_counted_tokens: 119
total tokens: 174 num samples: 2 num padding tokens: 32 - rank: 3 max len: 87 min len: 55 avg len: 71.0 num_loss_counted_tokens: 83
total tokens: 172 num samples: 2 num padding tokens: 4 - rank: 3 max len: 86 min len: 82 avg len: 84.0 num_loss_counted_tokens: 81
total tokens: 122 num samples: 2 num padding tokens: 8 - rank: 3 max len: 61 min len: 53 avg len: 57.0 num_loss_counted_tokens: 54
total tokens: 136 num samples: 2 num padding tokens: 2 - rank: 3 max len: 68 min len: 66 avg len: 67.0 num_loss_counted_tokens: 63
total tokens: 122 num samples: 2 num padding tokens: 8 - rank: 3 max len: 61 min len: 53 avg len: 57.0 num_loss_counted_tokens: 53
total tokens: 116 num samples: 2 num padding tokens: 1 - rank: 6 max len: 58 min len: 57 avg len: 57.5 num_loss_counted_tokens: 67
total tokens: 154 num samples: 2 num padding tokens: 4 - rank: 6 max len: 77 min len: 73 avg len: 75.0 num_loss_counted_tokens: 92
total tokens: 194 num samples: 2 num padding tokens: 45 - rank: 2 max len: 97 min len: 52 avg len: 74.5 num_loss_counted_tokens: 88
total tokens: 152 num samples: 2 num padding tokens: 5 - rank: 2 max len: 76 min len: 71 avg len: 73.5 num_loss_counted_tokens: 79
total tokens: 122 num samples: 2 num padding tokens: 6 - rank: 3 max len: 61 min len: 55 avg len: 58.0 num_loss_counted_tokens: 62
total tokens: 196 num samples: 2 num padding tokens: 35 - rank: 2 max len: 98 min len: 63 avg len: 80.5 num_loss_counted_tokens: 104
total tokens: 184 num samples: 2 num padding tokens: 37 - rank: 6 max len: 92 min len: 55 avg len: 73.5 num_loss_counted_tokens: 90
total tokens: 148 num samples: 2 num padding tokens: 13 - rank: 2 max len: 74 min len: 61 avg len: 67.5 num_loss_counted_tokens: 69
total tokens: 202 num samples: 2 num padding tokens: 50 - rank: 5 max len: 101 min len: 51 avg len: 76.0 num_loss_counted_tokens: 96
total tokens: 126 num samples: 2 num padding tokens: 1 - rank: 2 max len: 63 min len: 62 avg len: 62.5 num_loss_counted_tokens: 53
total tokens: 128 num samples: 2 num padding tokens: 14 - rank: 6 max len: 64 min len: 50 avg len: 57.0 num_loss_counted_tokens: 58
total tokens: 128 num samples: 2 num padding tokens: 6 - rank: 6 max len: 64 min len: 58 avg len: 61.0 num_loss_counted_tokens: 55
total tokens: 176 num samples: 2 num padding tokens: 17 - rank: 3 max len: 88 min len: 71 avg len: 79.5 num_loss_counted_tokens: 98
total tokens: 102 num samples: 2 num padding tokens: 7 - rank: 6 max len: 51 min len: 44 avg len: 47.5 num_loss_counted_tokens: 51
total tokens: 128 num samples: 2 num padding tokens: 9 - rank: 6 max len: 64 min len: 55 avg len: 59.5 num_loss_counted_tokens: 68
total tokens: 128 num samples: 2 num padding tokens: 2 - rank: 0 max len: 64 min len: 62 avg len: 63.0 num_loss_counted_tokens: 80
total tokens: 134 num samples: 2 num padding tokens: 22 - rank: 6 max len: 67 min len: 45 avg len: 56.0 num_loss_counted_tokens: 57
total tokens: 166 num samples: 2 num padding tokens: 23 - rank: 6 max len: 83 min len: 60 avg len: 71.5 num_loss_counted_tokens: 90
total tokens: 142 num samples: 2 num padding tokens: 3 - rank: 6 max len: 71 min len: 68 avg len: 69.5 num_loss_counted_tokens: 70
total tokens: 126 num samples: 2 num padding tokens: 1 - rank: 1 max len: 63 min len: 62 avg len: 62.5 num_loss_counted_tokens: 60
total tokens: 146 num samples: 2 num padding tokens: 29 - rank: 2 max len: 73 min len: 44 avg len: 58.5 num_loss_counted_tokens: 61
total tokens: 152 num samples: 2 num padding tokens: 22 - rank: 3 max len: 76 min len: 54 avg len: 65.0 num_loss_counted_tokens: 86
total tokens: 244 num samples: 2 num padding tokens: 36 - rank: 6 max len: 122 min len: 86 avg len: 104.0 num_loss_counted_tokens: 139
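Each batch-statistics line is consistent with every sample in a two-sample micro-batch being padded to the longest one: `total tokens` equals `max len` × `num samples`, and `num padding tokens` equals `total tokens` minus the summed real lengths (here `max len` + `min len`). That reading is inferred from the numbers themselves. The checker below is a sketch that verifies it on three lines copied from above; the regular expression is written for this log's line format and is not part of any library.

```python
import re

# Batch-statistics lines copied verbatim from the log above.
LINES = [
    "total tokens: 122 num samples: 2 num padding tokens: 15 - rank: 1 max len: 61 min len: 46 avg len: 53.5 num_loss_counted_tokens: 55",
    "total tokens: 200 num samples: 2 num padding tokens: 54 - rank: 7 max len: 100 min len: 46 avg len: 73.0 num_loss_counted_tokens: 89",
    "total tokens: 282 num samples: 2 num padding tokens: 48 - rank: 3 max len: 141 min len: 93 avg len: 117.0 num_loss_counted_tokens: 204",
]

PATTERN = re.compile(
    r"total tokens: (\d+) num samples: (\d+) num padding tokens: (\d+) - rank: (\d+) "
    r"max len: (\d+) min len: (\d+) avg len: ([\d.]+)"
)

def check(line: str) -> bool:
    """Return True if the line's counts match pad-to-longest batching."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    total, n, pad, _rank, max_len, min_len = map(int, m.groups()[:6])
    avg_len = float(m.group(7))
    real_tokens = round(avg_len * n)  # sum of unpadded sample lengths
    return (
        total == max_len * n  # each sample padded up to the longest one
        and pad == total - real_tokens  # padding = padded size minus real tokens
        and (n != 2 or real_tokens == max_len + min_len)  # two samples: lengths are max and min
    )

for line in LINES:
    print(check(line))
```

For example, the first line gives 61 × 2 = 122 padded tokens and 122 − (61 + 46) = 15 padding tokens.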
Per-token loss scaled by world size: 0.0025800205767154694
Per-token loss scaled by world size: 0.0006805358571000397
Per-token loss scaled by world size: 0.0009809250477701426
Per-token loss scaled by world size: 0.0011542694410309196
Per-token loss scaled by world size: 0.0011356660397723317
Per-token loss scaled by world size: 8.372703450731933e-05
Per-token loss scaled by world size: 0.0002341267536394298
Epoch: 5, Step: 61, Rank: 1, loss = 0.07038137316703796
Epoch: 5, Step: 61, Rank: 5, loss = 0.08281882852315903
Epoch: 5, Step: 61, Rank: 3, loss = 0.1851164698600769
Epoch: 5, Step: 61, Rank: 2, loss = 0.048828449100255966
Epoch: 5, Step: 61, Rank: 0, loss = 0.08148403465747833
Epoch: 5, Step: 61, Rank: 7, loss = 0.006007414776831865
Epoch: 5, Step: 61, Rank: 4, loss = 0.0167985949665308
Per-token loss scaled by world size: 0.0015900362050160766
Epoch: 5, Step: 61, Rank: 6, loss = 0.11408510059118271
[2024-07-27 20:06:08,407] [INFO] [logging.py:96:log_dist] [Rank 0] step=61, skipped=0, lr=[1.3711972489182208e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:08,486] [INFO] [timer.py:258:stop] epoch=0/micro_step=61/global_step=61, RunningAvgSamplesPerSec=31.720545490554542, CurrSamplesPerSec=29.62867677087431, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 5: 1/12 [00:00<00:10, 1.06it/s]
{
"epoch": 5,
"step": 61,
"rank": 0,
"loss": 0.08148403465747833,
"overall_throughput": 29.523344117535782,
"lr": 1.3711972489182208e-05,
"cuda_mem_allocated": 22.004770278930664,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 574,
"batch_size": 16,
"total_loss": 0.07569002360105515,
"gradnorm": 1.5541268587112427,
"weight_norm": 393.4730224609375,
"timestamp": "2024-07-27T20:06:08.490189"
}
Per-token loss scaled by world size: 0.0006772524793632329
Per-token loss scaled by world size: 0.0011761346831917763
Per-token loss scaled by world size: 0.0012851222418248653
Per-token loss scaled by world size: 0.0015470795333385468
Per-token loss scaled by world size: 0.001160036656074226
Per-token loss scaled by world size: 0.0007557208882644773
Per-token loss scaled by world size: 0.0015825566370040178
Epoch: 5, Step: 62, Rank: 1, loss = 0.0887981727719307
Epoch: 5, Step: 62, Rank: 4, loss = 0.09702672809362411
Epoch: 5, Step: 62, Rank: 2, loss = 0.11680450290441513
Epoch: 5, Step: 62, Rank: 0, loss = 0.05113256350159645
Epoch: 5, Step: 62, Rank: 7, loss = 0.05705692619085312
Epoch: 5, Step: 62, Rank: 5, loss = 0.11948302388191223
Epoch: 5, Step: 62, Rank: 6, loss = 0.08758276700973511
Per-token loss scaled by world size: 0.000659986340906471
Epoch: 5, Step: 62, Rank: 3, loss = 0.049828968942165375
[2024-07-27 20:06:08,981] [INFO] [logging.py:96:log_dist] [Rank 0] step=62, skipped=0, lr=[1.3402931744416432e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:09,058] [INFO] [timer.py:258:stop] epoch=0/micro_step=62/global_step=62, RunningAvgSamplesPerSec=31.69959766825067, CurrSamplesPerSec=30.510810812039587, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 5: 2/12 [00:01<00:07, 1.38it/s]
{
"epoch": 5,
"step": 62,
"rank": 0,
"loss": 0.05113256350159645,
"overall_throughput": 30.460830095500928,
"lr": 1.3402931744416432e-05,
"cuda_mem_allocated": 22.001431465148926,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 604,
"batch_size": 16,
"total_loss": 0.08346420526504517,
"gradnorm": 1.3599183559417725,
"weight_norm": 393.4732971191406,
"timestamp": "2024-07-27T20:06:09.061604"
}
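Where a `Per-token loss scaled by world size` value is immediately followed by a single rank's loss line, the printed rank loss equals that per-token value multiplied by `num_loss_counted_tokens / world_size`, taking the token count from the step's JSON record and a world size of 8 (from `--gpus 8`). This is an empirical relation read off the logged numbers, not a description of the training code; the sketch below checks it for the adjacent pairs at steps 59 through 62.

```python
import math

WORLD_SIZE = 8  # from --gpus 8 in the launch command

# step -> (per-token value, rank loss printed on the next line, num_loss_counted_tokens)
PAIRS = {
    59: (0.0010259401751682162, 0.07540660351514816, 588),
    60: (0.0006847438053227961, 0.0502430759370327, 587),
    61: (0.0015900362050160766, 0.11408510059118271, 574),
    62: (0.000659986340906471, 0.049828968942165375, 604),
}

def reconstructed_rank_loss(per_token: float, counted_tokens: int,
                            world_size: int = WORLD_SIZE) -> float:
    """Rank loss implied by the observed relation per_token * counted_tokens / world_size."""
    return per_token * counted_tokens / world_size

for step, (per_token, rank_loss, tokens) in PAIRS.items():
    estimate = reconstructed_rank_loss(per_token, tokens)
    print(step, f"{estimate:.10f}", f"{rank_loss:.10f}",
          math.isclose(estimate, rank_loss, rel_tol=1e-6))
```

For step 62, for instance, 604 / 8 = 75.5, and 0.000659986340906471 × 75.5 ≈ 0.0498289687, matching Rank 3's printed loss.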
Per-token loss scaled by world size: 0.0010878611356019974
Per-token loss scaled by world size: 0.0003478115249890834
Per-token loss scaled by world size: 0.0022740615531802177
Per-token loss scaled by world size: 0.0003918901493307203
Per-token loss scaled by world size: 0.002286511706188321
Per-token loss scaled by world size: 0.0006958367303013802
Per-token loss scaled by world size: 0.0007736408151686192
Epoch: 5, Step: 63, Rank: 2, loss = 0.03065088950097561
Epoch: 5, Step: 63, Rank: 5, loss = 0.2004016786813736
Epoch: 5, Step: 63, Rank: 0, loss = 0.2014988511800766
Epoch: 5, Step: 63, Rank: 1, loss = 0.06132061034440994
Epoch: 5, Step: 63, Rank: 4, loss = 0.03453531861305237
Epoch: 5, Step: 63, Rank: 3, loss = 0.09586776047945023
Epoch: 5, Step: 63, Rank: 6, loss = 0.06817709654569626
Per-token loss scaled by world size: 0.0005193906254135072
Epoch: 5, Step: 63, Rank: 7, loss = 0.04577130079269409
[2024-07-27 20:06:09,513] [INFO] [logging.py:96:log_dist] [Rank 0] step=63, skipped=0, lr=[1.3090169943749475e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:09,591] [INFO] [timer.py:258:stop] epoch=0/micro_step=63/global_step=63, RunningAvgSamplesPerSec=31.71584398677769, CurrSamplesPerSec=32.72206448467118, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 5: 3/12 [00:02<00:05, 1.57it/s]
{
"epoch": 5,
"step": 63,
"rank": 0,
"loss": 0.2014988511800766,
"overall_throughput": 32.65121769063287,
"lr": 1.3090169943749475e-05,
"cuda_mem_allocated": 22.00572395324707,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 705,
"batch_size": 16,
"total_loss": 0.09227793663740158,
"gradnorm": 2.188631534576416,
"weight_norm": 393.4734802246094,
"timestamp": "2024-07-27T20:06:09.633806"
}
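The logged `lr` values fit a linear-warmup, cosine-decay schedule. The parameters in the sketch below (peak learning rate 2e-5, 25 warmup steps, 120 total steps, i.e. 10 epochs × 12 optimizer steps per epoch) are assumptions inferred by fitting the logged values; they are not read from a config or stated in this part of the log. With them, the formula reproduces the `lr` printed at steps 60, 63 and 66; at step 63 the decay progress is exactly (63 − 25) / 95 = 0.4.

```python
import math

# Assumed schedule parameters, inferred from the logged lr values (not from a config file).
PEAK_LR = 2e-5
WARMUP_STEPS = 25
TOTAL_STEPS = 120  # 10 epochs x 12 optimizer steps per epoch

def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# lr values printed by the log_dist lines for the same steps.
LOGGED = {
    60: 1.4016954246529697e-05,
    63: 1.3090169943749475e-05,
    66: 1.213299630743747e-05,
}

for step, logged in LOGGED.items():
    print(step, cosine_lr(step), logged)
```

If the fit is right, the learning rate would reach half its peak at step 25 + 95 / 2 = 72.5, roughly at the end of epoch 5.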
Per-token loss scaled by world size: 0.0009381945710629225
Per-token loss scaled by world size: 0.00045316756586544216
Per-token loss scaled by world size: 0.0005594724207185209
Per-token loss scaled by world size: 0.0003057016583625227
Per-token loss scaled by world size: 0.0005305999657139182
Per-token loss scaled by world size: 0.00392846018075943
Per-token loss scaled by world size: 0.0012796723749488592
Epoch: 5, Step: 64, Rank: 0, loss = 0.0768146812915802
Epoch: 5, Step: 64, Rank: 6, loss = 0.03710309416055679
Epoch: 5, Step: 64, Rank: 4, loss = 0.043442871421575546
Epoch: 5, Step: 64, Rank: 5, loss = 0.045806802809238434
Epoch: 5, Step: 64, Rank: 3, loss = 0.32164266705513
Epoch: 5, Step: 64, Rank: 1, loss = 0.025029323995113373
Epoch: 5, Step: 64, Rank: 2, loss = 0.10477317124605179
Per-token loss scaled by world size: 0.00109212682582438
Epoch: 5, Step: 64, Rank: 7, loss = 0.08941788226366043
[2024-07-27 20:06:10,064] [INFO] [logging.py:96:log_dist] [Rank 0] step=64, skipped=0, lr=[1.2774029087618448e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:10,141] [INFO] [timer.py:258:stop] epoch=0/micro_step=64/global_step=64, RunningAvgSamplesPerSec=31.71981083700255, CurrSamplesPerSec=31.963679576668167, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 5: 4/12 [00:02<00:04, 1.66it/s]
{
"epoch": 5,
"step": 64,
"rank": 0,
"loss": 0.0768146812915802,
"overall_throughput": 31.90590190730254,
"lr": 1.2774029087618448e-05,
"cuda_mem_allocated": 21.99880838394165,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 655,
"batch_size": 16,
"total_loss": 0.09300381690263748,
"gradnorm": 1.6255645751953125,
"weight_norm": 393.4736633300781,
"timestamp": "2024-07-27T20:06:10.186141"
}
Per-token loss scaled by world size: 0.0019039374310523272
Per-token loss scaled by world size: 0.0005504547152668238
Per-token loss scaled by world size: 0.001485039945691824
Per-token loss scaled by world size: 0.0013535844627767801
Per-token loss scaled by world size: 0.000591020449064672
Per-token loss scaled by world size: 0.0014079039683565497
Epoch: 5, Step: 65, Rank: 3, loss = 0.042040977627038956
Epoch: 5, Step: 65, Rank: 0, loss = 0.11341992765665054
Epoch: 5, Step: 65, Rank: 7, loss = 0.10338001698255539
Epoch: 5, Step: 65, Rank: 4, loss = 0.04513918608427048
Epoch: 5, Step: 65, Rank: 1, loss = 0.14541321992874146
Per-token loss scaled by world size: 0.0026307932566851377
Epoch: 5, Step: 65, Rank: 5, loss = 0.20092684030532837
Epoch: 5, Step: 65, Rank: 2, loss = 0.10752866417169571
Per-token loss scaled by world size: 0.002622765488922596
Epoch: 5, Step: 65, Rank: 6, loss = 0.2003137171268463
[2024-07-27 20:06:10,619] [INFO] [logging.py:96:log_dist] [Rank 0] step=65, skipped=0, lr=[1.2454854871407993e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:10,698] [INFO] [timer.py:258:stop] epoch=0/micro_step=65/global_step=65, RunningAvgSamplesPerSec=31.71541920662633, CurrSamplesPerSec=31.44549285353818, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 5: 5/12 [00:03<00:04, 1.71it/s]
{
"epoch": 5,
"step": 65,
"rank": 0,
"loss": 0.11341992765665054,
"overall_throughput": 31.39161267024543,
"lr": 1.2454854871407993e-05,
"cuda_mem_allocated": 22.00572395324707,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 611,
"batch_size": 16,
"total_loss": 0.11977030336856842,
"gradnorm": 1.5310957431793213,
"weight_norm": 393.4738464355469,
"timestamp": "2024-07-27T20:06:10.740662"
}
Per-token loss scaled by world size: 0.0006399019039236009
Per-token loss scaled by world size: 0.0005316854221746325
Per-token loss scaled by world size: 0.0012345308205112815
Per-token loss scaled by world size: 0.00044449279084801674
Per-token loss scaled by world size: 0.0006190972053445876
Per-token loss scaled by world size: 0.0016892498824745417
Epoch: 5, Step: 66, Rank: 0, loss = 0.03701859712600708
Epoch: 5, Step: 66, Rank: 4, loss = 0.030947810038924217
Epoch: 5, Step: 66, Rank: 7, loss = 0.0859542116522789
Epoch: 5, Step: 66, Rank: 5, loss = 0.1176140233874321
Epoch: 5, Step: 66, Rank: 2, loss = 0.04310464486479759
Epoch: 5, Step: 66, Rank: 1, loss = 0.04455316811800003
Per-token loss scaled by world size: 0.0005175694241188467
Epoch: 5, Step: 66, Rank: 3, loss = 0.036035772413015366
Per-token loss scaled by world size: 0.0009789945324882865
Epoch: 5, Step: 66, Rank: 6, loss = 0.06816249340772629
[2024-07-27 20:06:11,175] [INFO] [logging.py:96:log_dist] [Rank 0] step=66, skipped=0, lr=[1.213299630743747e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:11,253] [INFO] [timer.py:258:stop] epoch=0/micro_step=66/global_step=66, RunningAvgSamplesPerSec=31.711479482763888, CurrSamplesPerSec=31.465234804674058, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 1056
{
"epoch": 5,
"step": 66,
"rank": 0,
"loss": 0.03701859712600708,
"overall_throughput": 31.4168749534778,
"lr": 1.213299630743747e-05,
"cuda_mem_allocated": 22.009064197540283,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 557,
"batch_size": 16,
"total_loss": 0.05792384222149849,
"gradnorm": 1.2862759828567505,
"weight_norm": 393.4739685058594,
"timestamp": "2024-07-27T20:06:11.256198"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1056
[20:06:29] INFO saving took 17.98075246810913 seconds utils.py:611
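The checkpoint label `samples_seen: 1056` is consistent with 66 optimizer steps at the effective batch size of 16 (`--effective-batch-size 16`, also reported as `batch_size` in each JSON record): 66 × 16 = 1056. The sketch also tests a hypothesis that is not confirmed anywhere in the log: that `--save-samples 185` is rounded down to a whole number of batches (11 × 16 = 176 samples), since 1056 = 6 × 176. Treat `aligned_save_interval` as an illustrative guess, not the training library's behavior.

```python
EFFECTIVE_BATCH_SIZE = 16  # from --effective-batch-size 16
SAVE_SAMPLES = 185         # from --save-samples 185

def samples_seen(global_step: int, batch_size: int = EFFECTIVE_BATCH_SIZE) -> int:
    """Samples consumed after `global_step` optimizer steps."""
    return global_step * batch_size

def aligned_save_interval(save_samples: int = SAVE_SAMPLES,
                          batch_size: int = EFFECTIVE_BATCH_SIZE) -> int:
    """Hypothetical: save interval rounded down to a whole number of batches."""
    return (save_samples // batch_size) * batch_size

seen = samples_seen(66)          # global_step=66 from the timer line above
interval = aligned_save_interval()
print(seen, interval, seen % interval == 0)
```

A later save at 1232 samples (step 77) would be consistent with this hypothesis; a save at any other point would rule it out.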
Per-token loss scaled by world size: 0.0013491392601281404
Per-token loss scaled by world size: 0.0005269849789328873
Per-token loss scaled by world size: 0.004546341486275196
Per-token loss scaled by world size: 0.000828504154924303369s/it]
Per-token loss scaled by world size: 0.0013991020387038589
Per-token loss scaled by world size: 0.0006301040411926806
Per-token loss scaled by world size: 0.0007652370841242373
Epoch: 5, Step: 67, Rank: 1, loss = 0.06306988000869751
Epoch: 5, Step: 67, Rank: 2, loss = 0.34609025716781616
Epoch: 5, Step: 67, Rank: 4, loss = 0.10650664567947388
Epoch: 5, Step: 67, Rank: 0, loss = 0.10270322859287262
Epoch: 5, Step: 67, Rank: 3, loss = 0.04011673107743263
Epoch: 5, Step: 67, Rank: 7, loss = 0.047966670244932175
Epoch: 5, Step: 67, Rank: 5, loss = 0.05825367197394371
Per-token loss scaled by world size: 0.0012170199770480394
Epoch: 5, Step: 67, Rank: 6, loss = 0.09264564514160156
[2024-07-27 20:06:29,721] [INFO] [logging.py:96:log_dist] [Rank 0] step=67, skipped=0, lr=[1.1808805343321102e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:29,799] [INFO] [timer.py:258:stop] epoch=0/micro_step=67/global_step=67, RunningAvgSamplesPerSec=31.70099434878959, CurrSamplesPerSec=31.044068891151483, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 5: 7/12 [00:22<00:23, 4.69s/it]
{
"epoch": 5,
"step": 67,
"rank": 0,
"loss": 0.10270322859287262,
"overall_throughput": 30.986774873645142,
"lr": 1.1808805343321102e-05,
"cuda_mem_allocated": 22.004770278930664,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 609,
"batch_size": 16,
"total_loss": 0.10716909170150757,
"gradnorm": 1.8347676992416382,
"weight_norm": 393.4740905761719,
"timestamp": "2024-07-27T20:06:29.841377"
}
Per-token loss scaled by world size: 0.0020153727382421494
Per-token loss scaled by world size: 0.0002941747079603374
Per-token loss scaled by world size: 0.0007493247976526618
Per-token loss scaled by world size: 0.0010664670262485743
Per-token loss scaled by world size: 0.0009130059042945504
Per-token loss scaled by world size: 0.0006243651150725782
Per-token loss scaled by world size: 0.0005120610003359616
Epoch: 5, Step: 68, Rank: 0, loss = 0.168031707406044
Epoch: 5, Step: 68, Rank: 5, loss = 0.06247495487332344
Epoch: 5, Step: 68, Rank: 3, loss = 0.08891668915748596
Epoch: 5, Step: 68, Rank: 7, loss = 0.07612186670303345
Epoch: 5, Step: 68, Rank: 2, loss = 0.024526815861463547
Epoch: 5, Step: 68, Rank: 4, loss = 0.042693085968494415
Epoch: 5, Step: 68, Rank: 6, loss = 0.052056439220905304
Per-token loss scaled by world size: 0.0004039716732222587
Epoch: 5, Step: 68, Rank: 1, loss = 0.03368113934993744
[2024-07-27 20:06:30,266] [INFO] [logging.py:96:log_dist] [Rank 0] step=68, skipped=0, lr=[1.148263647711842e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:30,343] [INFO] [timer.py:258:stop] epoch=0/micro_step=68/global_step=68, RunningAvgSamplesPerSec=31.70601140764558, CurrSamplesPerSec=32.035561937422905, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 5: 8/12 [00:22<00:13, 3.37s/it]
{
"epoch": 5,
"step": 68,
"rank": 0,
"loss": 0.168031707406044,
"overall_throughput": 31.980130143367383,
"lr": 1.148263647711842e-05,
"cuda_mem_allocated": 21.998568058013916,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 667,
"batch_size": 16,
"total_loss": 0.06856284290552139,
"gradnorm": 1.0227607488632202,
"weight_norm": 393.47418212890625,
"timestamp": "2024-07-27T20:06:30.390363"
}
Per-token loss scaled by world size: 0.0027562566101551056
Per-token loss scaled by world size: 0.0018840961856767535
Per-token loss scaled by world size: 0.0018555921269580722
Per-token loss scaled by world size: 0.0010745518375188112
Per-token loss scaled by world size: 0.0009415823733434081
Per-token loss scaled by world size: 0.0031567809637635946
Per-token loss scaled by world size: 0.0009981651091948152
Epoch: 5, Step: 69, Rank: 3, loss = 0.12411483377218246
Epoch: 5, Step: 69, Rank: 6, loss = 0.07078610360622406
Epoch: 5, Step: 69, Rank: 5, loss = 0.12223713099956512
Epoch: 5, Step: 69, Rank: 1, loss = 0.20795294642448425
Epoch: 5, Step: 69, Rank: 0, loss = 0.0620267391204834
Epoch: 5, Step: 69, Rank: 7, loss = 0.18156839907169342
Epoch: 5, Step: 69, Rank: 4, loss = 0.06575412303209305
Per-token loss scaled by world size: 0.00018431547505315393
Epoch: 5, Step: 69, Rank: 2, loss = 0.012141781859099865
[2024-07-27 20:06:30,800] [INFO] [logging.py:96:log_dist] [Rank 0] step=69, skipped=0, lr=[1.1154846369695864e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:30,877] [INFO] [timer.py:258:stop] epoch=0/micro_step=69/global_step=69, RunningAvgSamplesPerSec=31.723357888730654, CurrSamplesPerSec=32.911763984671325, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 5: 9/12 [00:23<00:07, 2.48s/it]
{
"epoch": 5,
"step": 69,
"rank": 0,
"loss": 0.0620267391204834,
"overall_throughput": 32.829137857318266,
"lr": 1.1154846369695864e-05,
"cuda_mem_allocated": 21.999523639678955,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 527,
"batch_size": 16,
"total_loss": 0.10582275688648224,
"gradnorm": 2.0553536415100098,
"weight_norm": 393.4742431640625,
"timestamp": "2024-07-27T20:06:30.921842"
}
Per-token loss scaled by world size: 0.0010255835950374603
Per-token loss scaled by world size: 0.0015445285243913531
Per-token loss scaled by world size: 0.0007211874471977353
Per-token loss scaled by world size: 0.0005934142973273993
Per-token loss scaled by world size: 0.0017620512517169118
Per-token loss scaled by world size: 0.000368919427273795
Per-token loss scaled by world size: 0.0008395504555664957
Epoch: 5, Step: 70, Rank: 7, loss = 0.11294364929199219
Epoch: 5, Step: 70, Rank: 1, loss = 0.04339342191815376
Epoch: 5, Step: 70, Rank: 6, loss = 0.052736829966306686
Epoch: 5, Step: 70, Rank: 0, loss = 0.026977233588695526
Epoch: 5, Step: 70, Rank: 5, loss = 0.12884999811649323
Epoch: 5, Step: 70, Rank: 2, loss = 0.07499580085277557
Epoch: 5, Step: 70, Rank: 4, loss = 0.061392128467559814
Per-token loss scaled by world size: 0.0018547051586210728
Epoch: 5, Step: 70, Rank: 3, loss = 0.13562531769275665
[2024-07-27 20:06:31,354] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=0, lr=[1.0825793454723325e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:31,432] [INFO] [timer.py:258:stop] epoch=0/micro_step=70/global_step=70, RunningAvgSamplesPerSec=31.71917367216476, CurrSamplesPerSec=31.441323528309383, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
███████▎ | 10/12 [00:23<00:03, 1.89s/it]
{
"epoch": 5,
"step": 70,
"rank": 0,
"loss": 0.026977233588695526,
"overall_throughput": 31.36866352554035,
"lr": 1.0825793454723325e-05,
"cuda_mem_allocated": 21.999762058258057,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 585,
"batch_size": 16,
"total_loss": 0.0796142965555191,
"gradnorm": 1.7012439966201782,
"weight_norm": 393.4743347167969,
"timestamp": "2024-07-27T20:06:31.476165"
}
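The unlabeled "Per-token loss scaled by world size" values and the per-rank "loss =" lines are linked through the step-level `num_loss_counted_tokens` in the JSON record. Matching the values by magnitude, the numbers in this log are consistent with loss = per-token value × num_loss_counted_tokens / world_size, with world_size = 8 from `--gpus 8`. This relation is inferred from the arithmetic, not read from the trainer source. A quick Python check (the helper name is hypothetical):

```python
WORLD_SIZE = 8  # from --gpus 8 in the training command


def rank_loss(per_token_scaled, num_loss_counted_tokens, world_size=WORLD_SIZE):
    """Reconstruct a rank's printed 'loss =' value.

    Inferred relation (not from the trainer source):
        loss = per_token_scaled * num_loss_counted_tokens / world_size
    where num_loss_counted_tokens is the step total from the JSON record.
    """
    return per_token_scaled * num_loss_counted_tokens / world_size


# Step 70, rank 0: per-token value 0.000368919427273795, step total of 585 tokens.
print(rank_loss(0.000368919427273795, 585))  # about 0.0269772; logged 0.026977233588695526
```

The small residual differences are at the level of float32 rounding in the trainer's printed values.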
Per-token loss scaled by world size: 0.0009147366508841515
Per-token loss scaled by world size: 0.0017351489514112473
Per-token loss scaled by world size: 0.0008338880725204945
Per-token loss scaled by world size: 0.00024312795721925795
Per-token loss scaled by world size: 0.0006241968367248774
Per-token loss scaled by world size: 0.00024290102010127157
Per-token loss scaled by world size: 0.0020128381438553333
Epoch: 5, Step: 71, Rank: 5, loss = 0.05847639963030815
Epoch: 5, Step: 71, Rank: 1, loss = 0.12167732417583466
Epoch: 5, Step: 71, Rank: 3, loss = 0.01704934798181057
Epoch: 5, Step: 71, Rank: 2, loss = 0.04377180337905884
Epoch: 5, Step: 71, Rank: 6, loss = 0.06414590775966644
Epoch: 5, Step: 71, Rank: 7, loss = 0.01703343354165554
Epoch: 5, Step: 71, Rank: 4, loss = 0.14115028083324432
Per-token loss scaled by world size: 0.0004984893603250384
Epoch: 5, Step: 71, Rank: 0, loss = 0.034956566989421844
[2024-07-27 20:06:31,890] [INFO] [logging.py:96:log_dist] [Rank 0] step=71, skipped=0, lr=[1.0495837546732224e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:31,968] [INFO] [timer.py:258:stop] epoch=0/micro_step=71/global_step=71, RunningAvgSamplesPerSec=31.731589241737456, CurrSamplesPerSec=32.599273292528906, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
████████▏| 11/12 [00:24<00:01, 1.47s/it]
{
"epoch": 5,
"step": 71,
"rank": 0,
"loss": 0.034956566989421844,
"overall_throughput": 32.517245673379065,
"lr": 1.0495837546732224e-05,
"cuda_mem_allocated": 21.998329639434814,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 561,
"batch_size": 16,
"total_loss": 0.062282636761665344,
"gradnorm": 0.9715697765350342,
"weight_norm": 393.47442626953125,
"timestamp": "2024-07-27T20:06:32.016067"
}
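The JSON `total_loss` field appears to be the average of the eight per-rank "loss =" values printed for the same step. For step 71, the mean of the eight printed losses reproduces the logged 0.062282636761665344 to about seven significant figures. That suggests, but does not prove, that the trainer averages the rank losses across processes before logging. A short Python check using the step-71 values copied from the lines above:

```python
from statistics import fmean

# Printed per-rank losses for global step 71, keyed by rank (copied from the log).
STEP_71_RANK_LOSSES = {
    0: 0.034956566989421844,
    1: 0.12167732417583466,
    2: 0.04377180337905884,
    3: 0.01704934798181057,
    4: 0.14115028083324432,
    5: 0.05847639963030815,
    6: 0.06414590775966644,
    7: 0.01703343354165554,
}


def mean_rank_loss(rank_losses):
    """Average of the per-rank losses for one optimizer step."""
    return fmean(rank_losses.values())


print(mean_rank_loss(STEP_71_RANK_LOSSES))  # logged total_loss: 0.062282636761665344
```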
Per-token loss scaled by world size: 0.0011429809965193272
Per-token loss scaled by world size: 0.0009149574325419962
Per-token loss scaled by world size: 0.0004773043910972774
Per-token loss scaled by world size: 0.0027895078528672457
Per-token loss scaled by world size: 0.004009348340332508
Per-token loss scaled by world size: 0.0015114195412024856
Per-token loss scaled by world size: 0.0012063757749274373
Epoch: 5, Step: 72, Rank: 6, loss = 0.34730979800224304
Epoch: 5, Step: 72, Rank: 2, loss = 0.07925818860530853
Epoch: 5, Step: 72, Rank: 0, loss = 0.04134649410843849
Epoch: 5, Step: 72, Rank: 5, loss = 0.09901072829961777
Epoch: 5, Step: 72, Rank: 3, loss = 0.10450230538845062
Epoch: 5, Step: 72, Rank: 7, loss = 0.130926713347435
Epoch: 5, Step: 72, Rank: 1, loss = 0.2416411191225052
Per-token loss scaled by world size: 0.0014137992402538657
Epoch: 5, Step: 72, Rank: 4, loss = 0.12247036397457123
[2024-07-27 20:06:32,427] [INFO] [logging.py:96:log_dist] [Rank 0] step=72, skipped=0, lr=[1.0165339447663586e-05], mom=[(0.9, 0.95)]
[2024-07-27 20:06:32,504] [INFO] [timer.py:258:stop] epoch=0/micro_step=72/global_step=72, RunningAvgSamplesPerSec=31.74744391309924, CurrSamplesPerSec=32.88104464616879, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
█████████| 12/12 [00:24<00:00, 1.19s/it]
{
"epoch": 5,
"step": 72,
"rank": 0,
"loss": 0.04134649410843849,
"overall_throughput": 32.79027112653174,
"lr": 1.0165339447663586e-05,
"cuda_mem_allocated": 22.01025676727295,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 693,
"batch_size": 16,
"total_loss": 0.14580821990966797,
"gradnorm": 1.6911654472351074,
"weight_norm": 393.4745178222656,
"timestamp": "2024-07-27T20:06:32.547687"
}
Epoch 5: 100%|██████████| 12/12 [00:25<00:00, 2.09s/it]
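The learning rates logged in this stretch fall on a cosine-decay curve. They are reproduced to within floating-point precision by linear warmup followed by cosine decay to zero, with a peak of 2e-5, 25 warmup steps, and 120 total steps (12 steps per epoch, as the epoch bar shows, times `--num-epochs 10`). The peak and warmup length are fitted to this log rather than read from the run's configuration, so treat them as assumptions. One visible consequence: steps 72 and 73 straddle the midpoint of the decay, so their learning rates sum to the peak value.

```python
import math


def cosine_with_warmup_lr(step, peak_lr=2e-5, warmup_steps=25, total_steps=120):
    """Learning rate logged at global step `step`: linear warmup, then cosine decay to 0.

    peak_lr, warmup_steps and total_steps are assumptions fitted to this log
    (total_steps = 12 steps per epoch x 10 epochs); they were not read from a config.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (69, 72, 73, 81):
    print(step, cosine_with_warmup_lr(step))
```

The `mom=[(0.9, 0.95)]` field in the DeepSpeed lines stays constant across these steps; only the learning rate follows the schedule.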
total tokens: 214 num samples: 2 num padding tokens: 23 - rank: 1 max len: 107 min len: 84 avg len: 95.5 num_loss_counted_tokens: 132
total tokens: 282 num samples: 2 num padding tokens: 83 - rank: 6 max len: 141 min len: 58 avg len: 99.5 num_loss_counted_tokens: 145
total tokens: 144 num samples: 2 num padding tokens: 27 - rank: 7 max len: 72 min len: 45 avg len: 58.5 num_loss_counted_tokens: 73
total tokens: 118 num samples: 2 num padding tokens: 9 - rank: 1 max len: 59 min len: 50 avg len: 54.5 num_loss_counted_tokens: 51
total tokens: 172 num samples: 2 num padding tokens: 19 - rank: 7 max len: 86 min len: 67 avg len: 76.5 num_loss_counted_tokens: 75
total tokens: 148 num samples: 2 num padding tokens: 17 - rank: 0 max len: 74 min len: 57 avg len: 65.5 num_loss_counted_tokens: 73
total tokens: 138 num samples: 2 num padding tokens: 1 - rank: 7 max len: 69 min len: 68 avg len: 68.5 num_loss_counted_tokens: 57
total tokens: 106 num samples: 2 num padding tokens: 5 - rank: 1 max len: 53 min len: 48 avg len: 50.5 num_loss_counted_tokens: 46
total tokens: 160 num samples: 2 num padding tokens: 18 - rank: 0 max len: 80 min len: 62 avg len: 71.0 num_loss_counted_tokens: 81
total tokens: 174 num samples: 2 num padding tokens: 17 - rank: 7 max len: 87 min len: 70 avg len: 78.5 num_loss_counted_tokens: 77
total tokens: 164 num samples: 2 num padding tokens: 21 - rank: 7 max len: 82 min len: 61 avg len: 71.5 num_loss_counted_tokens: 92
total tokens: 188 num samples: 2 num padding tokens: 19 - rank: 0 max len: 94 min len: 75 avg len: 84.5 num_loss_counted_tokens: 99
total tokens: 138 num samples: 2 num padding tokens: 5 - rank: 2 max len: 69 min len: 64 avg len: 66.5 num_loss_counted_tokens: 70
total tokens: 186 num samples: 2 num padding tokens: 14 - rank: 3 max len: 93 min len: 79 avg len: 86.0 num_loss_counted_tokens: 128
total tokens: 162 num samples: 2 num padding tokens: 18 - rank: 0 max len: 81 min len: 63 avg len: 72.0 num_loss_counted_tokens: 82
total tokens: 126 num samples: 2 num padding tokens: 8 - rank: 6 max len: 63 min len: 55 avg len: 59.0 num_loss_counted_tokens: 54
total tokens: 128 num samples: 2 num padding tokens: 11 - rank: 3 max len: 64 min len: 53 avg len: 58.5 num_loss_counted_tokens: 67
total tokens: 214 num samples: 2 num padding tokens: 31 - rank: 1 max len: 107 min len: 76 avg len: 91.5 num_loss_counted_tokens: 117
total tokens: 200 num samples: 2 num padding tokens: 10 - rank: 7 max len: 100 min len: 90 avg len: 95.0 num_loss_counted_tokens: 151
total tokens: 244 num samples: 2 num padding tokens: 70 - rank: 6 max len: 122 min len: 52 avg len: 87.0 num_loss_counted_tokens: 113
total tokens: 140 num samples: 2 num padding tokens: 18 - rank: 0 max len: 70 min len: 52 avg len: 61.0 num_loss_counted_tokens: 75
total tokens: 124 num samples: 2 num padding tokens: 10 - rank: 6 max len: 62 min len: 52 avg len: 57.0 num_loss_counted_tokens: 62
total tokens: 176 num samples: 2 num padding tokens: 8 - rank: 2 max len: 88 min len: 80 avg len: 84.0 num_loss_counted_tokens: 99
total tokens: 120 num samples: 2 num padding tokens: 0 - rank: 7 max len: 60 min len: 60 avg len: 60.0 num_loss_counted_tokens: 65
total tokens: 152 num samples: 2 num padding tokens: 14 - rank: 7 max len: 76 min len: 62 avg len: 69.0 num_loss_counted_tokens: 83
total tokens: 208 num samples: 2 num padding tokens: 46 - rank: 7 max len: 104 min len: 58 avg len: 81.0 num_loss_counted_tokens: 107
total tokens: 132 num samples: 2 num padding tokens: 8 - rank: 7 max len: 66 min len: 58 avg len: 62.0 num_loss_counted_tokens: 55
total tokens: 148 num samples: 2 num padding tokens: 10 - rank: 1 max len: 74 min len: 64 avg len: 69.0 num_loss_counted_tokens: 73
total tokens: 152 num samples: 2 num padding tokens: 5 - rank: 0 max len: 76 min len: 71 avg len: 73.5 num_loss_counted_tokens: 91
total tokens: 154 num samples: 2 num padding tokens: 13 - rank: 6 max len: 77 min len: 64 avg len: 70.5 num_loss_counted_tokens: 78
total tokens: 168 num samples: 2 num padding tokens: 36 - rank: 2 max len: 84 min len: 48 avg len: 66.0 num_loss_counted_tokens: 72
total tokens: 130 num samples: 2 num padding tokens: 17 - rank: 0 max len: 65 min len: 48 avg len: 56.5 num_loss_counted_tokens: 62
total tokens: 186 num samples: 2 num padding tokens: 42 - rank: 0 max len: 93 min len: 51 avg len: 72.0 num_loss_counted_tokens: 96
total tokens: 140 num samples: 2 num padding tokens: 17 - rank: 0 max len: 70 min len: 53 avg len: 61.5 num_loss_counted_tokens: 61
total tokens: 166 num samples: 2 num padding tokens: 19 - rank: 0 max len: 83 min len: 64 avg len: 73.5 num_loss_counted_tokens: 76
total tokens: 104 num samples: 2 num padding tokens: 2 - rank: 0 max len: 52 min len: 50 avg len: 51.0 num_loss_counted_tokens: 61
total tokens: 122 num samples: 2 num padding tokens: 2 - rank: 7 max len: 61 min len: 59 avg len: 60.0 num_loss_counted_tokens: 62
total tokens: 188 num samples: 2 num padding tokens: 39 - rank: 2 max len: 94 min len: 55 avg len: 74.5 num_loss_counted_tokens: 95
total tokens: 102 num samples: 2 num padding tokens: 7 - rank: 2 max len: 51 min len: 44 avg len: 47.5 num_loss_counted_tokens: 53
total tokens: 146 num samples: 2 num padding tokens: 10 - rank: 6 max len: 73 min len: 63 avg len: 68.0 num_loss_counted_tokens: 72
total tokens: 118 num samples: 2 num padding tokens: 4 - rank: 7 max len: 59 min len: 55 avg len: 57.0 num_loss_counted_tokens: 71
total tokens: 216 num samples: 2 num padding tokens: 21 - rank: 0 max len: 108 min len: 87 avg len: 97.5 num_loss_counted_tokens: 124
total tokens: 104 num samples: 2 num padding tokens: 8 - rank: 6 max len: 52 min len: 44 avg len: 48.0 num_loss_counted_tokens: 52
total tokens: 134 num samples: 2 num padding tokens: 8 - rank: 2 max len: 67 min len: 59 avg len: 63.0 num_loss_counted_tokens: 71
total tokens: 168 num samples: 2 num padding tokens: 25 - rank: 2 max len: 84 min len: 59 avg len: 71.5 num_loss_counted_tokens: 89
total tokens: 146 num samples: 2 num padding tokens: 19 - rank: 2 max len: 73 min len: 54 avg len: 63.5 num_loss_counted_tokens: 80
total tokens: 154 num samples: 2 num padding tokens: 27 - rank: 2 max len: 77 min len: 50 avg len: 63.5 num_loss_counted_tokens: 83
total tokens: 164 num samples: 2 num padding tokens: 36 - rank: 6 max len: 82 min len: 46 avg len: 64.0 num_loss_counted_tokens: 75
total tokens: 122 num samples: 2 num padding tokens: 10 - rank: 2 max len: 61 min len: 51 avg len: 56.0 num_loss_counted_tokens: 56
total tokens: 156 num samples: 2 num padding tokens: 20 - rank: 4 max len: 78 min len: 58 avg len: 68.0 num_loss_counted_tokens: 78
total tokens: 172 num samples: 2 num padding tokens: 32 - rank: 2 max len: 86 min len: 54 avg len: 70.0 num_loss_counted_tokens: 60
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 4 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 54
total tokens: 124 num samples: 2 num padding tokens: 18 - rank: 4 max len: 62 min len: 44 avg len: 53.0 num_loss_counted_tokens: 60
total tokens: 162 num samples: 2 num padding tokens: 19 - rank: 6 max len: 81 min len: 62 avg len: 71.5 num_loss_counted_tokens: 82
total tokens: 104 num samples: 2 num padding tokens: 6 - rank: 4 max len: 52 min len: 46 avg len: 49.0 num_loss_counted_tokens: 56
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 3 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 66
total tokens: 226 num samples: 2 num padding tokens: 48 - rank: 4 max len: 113 min len: 65 avg len: 89.0 num_loss_counted_tokens: 95
total tokens: 132 num samples: 2 num padding tokens: 12 - rank: 4 max len: 66 min len: 54 avg len: 60.0 num_loss_counted_tokens: 69
total tokens: 228 num samples: 2 num padding tokens: 17 - rank: 4 max len: 114 min len: 97 avg len: 105.5 num_loss_counted_tokens: 158
total tokens: 142 num samples: 2 num padding tokens: 5 - rank: 6 max len: 71 min len: 66 avg len: 68.5 num_loss_counted_tokens: 67
total tokens: 98 num samples: 2 num padding tokens: 3 - rank: 4 max len: 49 min len: 46 avg len: 47.5 num_loss_counted_tokens: 47
total tokens: 128 num samples: 2 num padding tokens: 4 - rank: 3 max len: 64 min len: 60 avg len: 62.0 num_loss_counted_tokens: 70
total tokens: 196 num samples: 2 num padding tokens: 38 - rank: 3 max len: 98 min len: 60 avg len: 79.0 num_loss_counted_tokens: 112
total tokens: 142 num samples: 2 num padding tokens: 5 - rank: 3 max len: 71 min len: 66 avg len: 68.5 num_loss_counted_tokens: 75
total tokens: 120 num samples: 2 num padding tokens: 3 - rank: 3 max len: 60 min len: 57 avg len: 58.5 num_loss_counted_tokens: 59
total tokens: 110 num samples: 2 num padding tokens: 10 - rank: 4 max len: 55 min len: 45 avg len: 50.0 num_loss_counted_tokens: 59
total tokens: 116 num samples: 2 num padding tokens: 9 - rank: 3 max len: 58 min len: 49 avg len: 53.5 num_loss_counted_tokens: 57
total tokens: 126 num samples: 2 num padding tokens: 17 - rank: 5 max len: 63 min len: 46 avg len: 54.5 num_loss_counted_tokens: 52
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 5 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 63
total tokens: 180 num samples: 2 num padding tokens: 32 - rank: 6 max len: 90 min len: 58 avg len: 74.0 num_loss_counted_tokens: 99
total tokens: 110 num samples: 2 num padding tokens: 4 - rank: 3 max len: 55 min len: 51 avg len: 53.0 num_loss_counted_tokens: 45
total tokens: 166 num samples: 2 num padding tokens: 38 - rank: 1 max len: 83 min len: 45 avg len: 64.0 num_loss_counted_tokens: 56
total tokens: 144 num samples: 2 num padding tokens: 4 - rank: 3 max len: 72 min len: 68 avg len: 70.0 num_loss_counted_tokens: 60
total tokens: 138 num samples: 2 num padding tokens: 0 - rank: 3 max len: 69 min len: 69 avg len: 69.0 num_loss_counted_tokens: 75
total tokens: 186 num samples: 2 num padding tokens: 12 - rank: 4 max len: 93 min len: 81 avg len: 87.0 num_loss_counted_tokens: 131
total tokens: 184 num samples: 2 num padding tokens: 31 - rank: 1 max len: 92 min len: 61 avg len: 76.5 num_loss_counted_tokens: 87
total tokens: 142 num samples: 2 num padding tokens: 4 - rank: 4 max len: 71 min len: 67 avg len: 69.0 num_loss_counted_tokens: 59
total tokens: 126 num samples: 2 num padding tokens: 20 - rank: 5 max len: 63 min len: 43 avg len: 53.0 num_loss_counted_tokens: 42
total tokens: 172 num samples: 2 num padding tokens: 16 - rank: 5 max len: 86 min len: 70 avg len: 78.0 num_loss_counted_tokens: 85
total tokens: 116 num samples: 2 num padding tokens: 3 - rank: 1 max len: 58 min len: 55 avg len: 56.5 num_loss_counted_tokens: 63
total tokens: 158 num samples: 2 num padding tokens: 24 - rank: 1 max len: 79 min len: 55 avg len: 67.0 num_loss_counted_tokens: 75
total tokens: 148 num samples: 2 num padding tokens: 13 - rank: 1 max len: 74 min len: 61 avg len: 67.5 num_loss_counted_tokens: 69
total tokens: 202 num samples: 2 num padding tokens: 11 - rank: 1 max len: 101 min len: 90 avg len: 95.5 num_loss_counted_tokens: 138
total tokens: 140 num samples: 2 num padding tokens: 10 - rank: 5 max len: 70 min len: 60 avg len: 65.0 num_loss_counted_tokens: 69
total tokens: 174 num samples: 2 num padding tokens: 27 - rank: 1 max len: 87 min len: 60 avg len: 73.5 num_loss_counted_tokens: 76
total tokens: 186 num samples: 2 num padding tokens: 31 - rank: 5 max len: 93 min len: 62 avg len: 77.5 num_loss_counted_tokens: 81
total tokens: 166 num samples: 2 num padding tokens: 33 - rank: 5 max len: 83 min len: 50 avg len: 66.5 num_loss_counted_tokens: 82
total tokens: 142 num samples: 2 num padding tokens: 16 - rank: 5 max len: 71 min len: 55 avg len: 63.0 num_loss_counted_tokens: 61
total tokens: 146 num samples: 2 num padding tokens: 28 - rank: 5 max len: 73 min len: 45 avg len: 59.0 num_loss_counted_tokens: 72
total tokens: 120 num samples: 2 num padding tokens: 11 - rank: 5 max len: 60 min len: 49 avg len: 54.5 num_loss_counted_tokens: 79
total tokens: 152 num samples: 2 num padding tokens: 8 - rank: 5 max len: 76 min len: 68 avg len: 72.0 num_loss_counted_tokens: 69
total tokens: 126 num samples: 2 num padding tokens: 6 - rank: 6 max len: 63 min len: 57 avg len: 60.0 num_loss_counted_tokens: 66
total tokens: 122 num samples: 2 num padding tokens: 3 - rank: 3 max len: 61 min len: 58 avg len: 59.5 num_loss_counted_tokens: 69
total tokens: 132 num samples: 2 num padding tokens: 3 - rank: 2 max len: 66 min len: 63 avg len: 64.5 num_loss_counted_tokens: 70
total tokens: 140 num samples: 2 num padding tokens: 2 - rank: 4 max len: 70 min len: 68 avg len: 69.0 num_loss_counted_tokens: 55
total tokens: 134 num samples: 2 num padding tokens: 3 - rank: 5 max len: 67 min len: 64 avg len: 65.5 num_loss_counted_tokens: 61
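Each "total tokens" line above describes one rank's micro-batch after padding. The printed fields are consistent with padding every sample to the longest one in its micro-batch: total tokens = num samples × max len, and num padding tokens = total tokens minus the sum of the real lengths. With two samples per batch, the real lengths are exactly max len and min len, so every line can be re-derived from its own fields. The Python sketch below performs that check; the regular expression mirrors the line format shown here and is not taken from the trainer.

```python
import re

# Mirrors the 'total tokens: ...' lines in this log (not taken from the trainer source).
STATS = re.compile(
    r"total tokens:\s*(\d+)\s+num samples:\s*(\d+)\s+num padding tokens:\s*(\d+)"
    r"\s*-\s*rank:\s*(\d+)\s+max len:\s*(\d+)\s+min len:\s*(\d+)\s+avg len:\s*([\d.]+)"
)


def padded_stats(lengths):
    """Stats for one micro-batch whose samples are all padded to the longest one."""
    max_len = max(lengths)
    total = max_len * len(lengths)
    return {"total": total, "padding": total - sum(lengths), "avg": sum(lengths) / len(lengths)}


def check_line(line):
    """Re-derive a 2-sample stats line from its own max len and min len fields."""
    match = STATS.search(line)
    if match is None:
        raise ValueError("not a padding-stats line")
    total, n, pad, _rank, max_len, min_len = map(int, match.groups()[:6])
    avg = float(match.group(7))
    if n != 2:
        raise ValueError("lengths are only recoverable from max/min for 2-sample batches")
    return padded_stats([max_len, min_len]) == {"total": total, "padding": pad, "avg": avg}


line = ("total tokens: 214 num samples: 2 num padding tokens: 23 - rank: 1 "
        "max len: 107 min len: 84 avg len: 95.5 num_loss_counted_tokens: 132")
print(check_line(line))  # True
```

Summing the padding and total-token fields over an epoch's lines gives a quick measure of how much compute this padding scheme spends on pad tokens.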
Per-token loss scaled by world size: 0.0006204941309988499
Per-token loss scaled by world size: 0.0005415144260041416
Per-token loss scaled by world size: 0.0004509067512117326
Per-token loss scaled by world size: 7.763502799207345e-05
Per-token loss scaled by world size: 0.0008618941647000611
Per-token loss scaled by world size: 0.0005943336291238666
Per-token loss scaled by world size: 0.0004708097840193659
Epoch: 6, Step: 73, Rank: 6, loss = 0.04921012371778488
Epoch: 6, Step: 73, Rank: 5, loss = 0.00705508328974247
Epoch: 6, Step: 73, Rank: 3, loss = 0.05638740584254265
Epoch: 6, Step: 73, Rank: 2, loss = 0.0409761518239975
Epoch: 6, Step: 73, Rank: 1, loss = 0.04278483986854553
Epoch: 6, Step: 73, Rank: 7, loss = 0.07832463085651398
Epoch: 6, Step: 73, Rank: 0, loss = 0.054010067135095596
Per-token loss scaled by world size: 0.00044023498776368797
Epoch: 6, Step: 73, Rank: 4, loss = 0.040006354451179504
[2024-07-27 20:06:33,460] [INFO] [logging.py:96:log_dist] [Rank 0] step=73, skipped=0, lr=[9.834660552336415e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:33,537] [INFO] [timer.py:258:stop] epoch=0/micro_step=73/global_step=73, RunningAvgSamplesPerSec=31.690802156326086, CurrSamplesPerSec=28.172368613829672, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 8%|▊ | 1/12 [00:00<00:10, 1.06it/s]
{
"epoch": 6,
"step": 73,
"rank": 0,
"loss": 0.054010067135095596,
"overall_throughput": 28.063288365518915,
"lr": 9.834660552336415e-06,
"cuda_mem_allocated": 22.000954627990723,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 727,
"batch_size": 16,
"total_loss": 0.04609433189034462,
"gradnorm": 0.7181567549705505,
"weight_norm": 393.4746398925781,
"timestamp": "2024-07-27T20:06:33.581427"
}
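The batch arithmetic behind these steps: the step-73 record reports a batch size of 16 across 8 ranks (`--gpus 8`), and the padding-stats lines show 2 samples per rank micro-batch, so a single micro-batch per rank fills one optimizer step. That agrees with micro_step equal to global_step in the DeepSpeed timer lines, i.e. no gradient accumulation in this run. The helper below is hypothetical and only makes the calculation explicit; the 2-samples figure is what this log shows, and a token-budget sampler could produce other micro-batch sizes.

```python
def grad_accum_steps(effective_batch_size, world_size, samples_per_microbatch):
    """Micro-batches each rank runs before one optimizer step (hypothetical helper)."""
    # Samples consumed when every rank runs one micro-batch.
    per_round = world_size * samples_per_microbatch
    if effective_batch_size % per_round:
        raise ValueError("effective batch size must be a multiple of world_size * samples_per_microbatch")
    return effective_batch_size // per_round


# --effective-batch-size 16 and --gpus 8 from the command; 2 samples per micro-batch as observed.
print(grad_accum_steps(16, 8, 2))  # 1: one micro-step per optimizer step
```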
Per-token loss scaled by world size: 0.00021588351228274405
Per-token loss scaled by world size: 0.0018644272349774837
Per-token loss scaled by world size: 0.0009222680237144232
Per-token loss scaled by world size: 0.0011992761865258217
Per-token loss scaled by world size: 0.00015600323968101293
Per-token loss scaled by world size: 0.0007281338912434876
Per-token loss scaled by world size: 0.0013443040661513805
Epoch: 6, Step: 74, Rank: 4, loss = 0.1323743313550949
Epoch: 6, Step: 74, Rank: 7, loss = 0.0516975075006485
Epoch: 6, Step: 74, Rank: 3, loss = 0.011076229624450207
Epoch: 6, Step: 74, Rank: 5, loss = 0.015327729284763336
Epoch: 6, Step: 74, Rank: 2, loss = 0.08514861017465591
Epoch: 6, Step: 74, Rank: 6, loss = 0.09544558823108673
Epoch: 6, Step: 74, Rank: 0, loss = 0.0654810294508934
Per-token loss scaled by world size: 0.0015726651763543487
Epoch: 6, Step: 74, Rank: 1, loss = 0.1116592288017273
[2024-07-27 20:06:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] step=74, skipped=0, lr=[9.504162453267776e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:34,080] [INFO] [timer.py:258:stop] epoch=0/micro_step=74/global_step=74, RunningAvgSamplesPerSec=31.69924367021435, CurrSamplesPerSec=32.31030745624361, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 17%|█▋ | 2/12 [00:01<00:07, 1.41it/s]
{
"epoch": 6,
"step": 74,
"rank": 0,
"loss": 0.0654810294508934,
"overall_throughput": 32.25508431115172,
"lr": 9.504162453267776e-06,
"cuda_mem_allocated": 22.002385139465332,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 568,
"batch_size": 16,
"total_loss": 0.0710262879729271,
"gradnorm": 1.143301010131836,
"weight_norm": 393.4747314453125,
"timestamp": "2024-07-27T20:06:34.123115"
}
Per-token loss scaled by world size: 0.0001349089725408703
Per-token loss scaled by world size: 0.0011630249209702015
Per-token loss scaled by world size: 0.0005098663968965411
Per-token loss scaled by world size: 0.001282830722630024
Per-token loss scaled by world size: 0.0009069825755432248
Per-token loss scaled by world size: 0.00048159470316022635
Per-token loss scaled by world size: 0.0003646048135124147
Epoch: 6, Step: 75, Rank: 5, loss = 0.035117048770189285
Epoch: 6, Step: 75, Rank: 3, loss = 0.08010333776473999
Epoch: 6, Step: 75, Rank: 4, loss = 0.08835496753454208
Epoch: 6, Step: 75, Rank: 6, loss = 0.009291855618357658
Epoch: 6, Step: 75, Rank: 2, loss = 0.06246842443943024
Epoch: 6, Step: 75, Rank: 7, loss = 0.025112155824899673
Epoch: 6, Step: 75, Rank: 0, loss = 0.033169835805892944
Per-token loss scaled by world size: 0.0011549023911356926
Epoch: 6, Step: 75, Rank: 1, loss = 0.07954390347003937
[2024-07-27 20:06:34,562] [INFO] [logging.py:96:log_dist] [Rank 0] step=75, skipped=0, lr=[9.174206545276678e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:34,640] [INFO] [timer.py:258:stop] epoch=0/micro_step=75/global_step=75, RunningAvgSamplesPerSec=31.692720007171083, CurrSamplesPerSec=31.229969737456262, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 6: 25%|██▌ | 3/12 [00:02<00:05, 1.56it/s]
{
"epoch": 6,
"step": 75,
"rank": 0,
"loss": 0.033169835805892944,
"overall_throughput": 31.178925272918942,
"lr": 9.174206545276678e-06,
"cuda_mem_allocated": 22.00572395324707,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 551,
"batch_size": 16,
"total_loss": 0.0516451895236969,
"gradnorm": 1.016838788986206,
"weight_norm": 393.4748229980469,
"timestamp": "2024-07-27T20:06:34.682642"
}
Per-token loss scaled by world size: 0.0003609564446378499
Per-token loss scaled by world size: 0.00041217487887479365
Per-token loss scaled by world size: 0.0004959891666658223
Per-token loss scaled by world size: 0.00047398614697158337
Per-token loss scaled by world size: 0.0007203637505881488
Per-token loss scaled by world size: 0.0001487391273258254
Per-token loss scaled by world size: 0.0008504824945703149
Epoch: 6, Step: 76, Rank: 0, loss = 0.042779065668582916
Epoch: 6, Step: 76, Rank: 3, loss = 0.04088130593299866
Epoch: 6, Step: 76, Rank: 6, loss = 0.031132493168115616
Epoch: 6, Step: 76, Rank: 7, loss = 0.0621313713490963
Epoch: 6, Step: 76, Rank: 2, loss = 0.012828749604523182
Epoch: 6, Step: 76, Rank: 5, loss = 0.035550083965063095
Epoch: 6, Step: 76, Rank: 1, loss = 0.07335411757230759
Per-token loss scaled by world size: 0.0007280391291715205
Epoch: 6, Step: 76, Rank: 4, loss = 0.06279337406158447
[2024-07-27 20:06:35,112] [INFO] [logging.py:96:log_dist] [Rank 0] step=76, skipped=0, lr=[8.84515363030414e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:35,189] [INFO] [timer.py:258:stop] epoch=0/micro_step=76/global_step=76, RunningAvgSamplesPerSec=31.692239680284427, CurrSamplesPerSec=31.657215099110317, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 33%|███▎ | 4/12 [00:02<00:04, 1.66it/s]
{
"epoch": 6,
"step": 76,
"rank": 0,
"loss": 0.042779065668582916,
"overall_throughput": 31.57946786919451,
"lr": 8.84515363030414e-06,
"cuda_mem_allocated": 22.002624034881592,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 690,
"batch_size": 16,
"total_loss": 0.04518131911754608,
"gradnorm": 1.2256078720092773,
"weight_norm": 393.47491455078125,
"timestamp": "2024-07-27T20:06:35.231428"
}
Per-token loss scaled by world size: 0.0005857766373082995
Per-token loss scaled by world size: 0.001119819818995893
Per-token loss scaled by world size: 0.0010905693052336574
Per-token loss scaled by world size: 0.00018508221546653658
Per-token loss scaled by world size: 0.0016458512982353568
Per-token loss scaled by world size: 0.00018191162962466478
Per-token loss scaled by world size: 0.00047674551024101675
Epoch: 6, Step: 77, Rank: 7, loss = 0.085386261343956
Epoch: 6, Step: 77, Rank: 3, loss = 0.12549616396427155
Epoch: 6, Step: 77, Rank: 6, loss = 0.01387076172977686
Epoch: 6, Step: 77, Rank: 0, loss = 0.04466547071933746
Epoch: 6, Step: 77, Rank: 4, loss = 0.014112519100308418
Epoch: 6, Step: 77, Rank: 1, loss = 0.08315590769052505
Epoch: 6, Step: 77, Rank: 5, loss = 0.03635184466838837
Per-token loss scaled by world size: 0.0008063243585638702
Epoch: 6, Step: 77, Rank: 2, loss = 0.06148223206400871
[2024-07-27 20:06:35,656] [INFO] [logging.py:96:log_dist] [Rank 0] step=77, skipped=0, lr=[8.51736352288158e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:35,734] [INFO] [timer.py:258:stop] epoch=0/micro_step=77/global_step=77, RunningAvgSamplesPerSec=31.702193912096924, CurrSamplesPerSec=32.45657221649108, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 1232
{
"epoch": 6,
"step": 77,
"rank": 0,
"loss": 0.04466547071933746,
"overall_throughput": 32.40263184704888,
"lr": 8.51736352288158e-06,
"cuda_mem_allocated": 22.000000476837158,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 610,
"batch_size": 16,
"total_loss": 0.05806514620780945,
"gradnorm": 1.030696988105774,
"weight_norm": 393.47503662109375,
"timestamp": "2024-07-27T20:06:35.737358"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1232
[20:06:53] INFO saving took 18.036810636520386 seconds utils.py:611
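The checkpoint at samples_seen 1232 lines up with global step 77 × batch size 16 = 1232. The command requested `--save-samples 185`, which is not a multiple of 16; this save point is consistent with the interval being rounded down to a whole number of optimizer steps (185 // 16 × 16 = 176, and 1232 = 7 × 176, i.e. a save every 11 steps). That rounding rule is an assumption inferred from the single save visible here, not something the log states. A Python sketch under that assumption, with 120 total steps as before:

```python
def effective_save_interval(save_samples, batch_size):
    """Assumed rule: round the requested interval down to a whole number of steps."""
    return (save_samples // batch_size) * batch_size


def saves_at_step(global_step, save_samples=185, batch_size=16):
    """True if a checkpoint would be written after `global_step` under the assumed rule."""
    samples_seen = global_step * batch_size
    interval = effective_save_interval(save_samples, batch_size)
    return interval > 0 and samples_seen % interval == 0


# --save-samples 185 and --effective-batch-size 16; 120 total steps assumed (12 per epoch x 10).
print(effective_save_interval(185, 16))                # 176
print([s for s in range(1, 121) if saves_at_step(s)])  # [11, 22, 33, 44, 55, 66, 77, 88, 99, 110]
```

The 18-second save is also why the next epoch-bar update jumps to several seconds per iteration before settling back down.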
Epoch 6: 42%|████▏ | 5/12 [00:21<00:49, 7.09s/it]
Per-token loss scaled by world size: 0.0021851949859410524
Per-token loss scaled by world size: 0.0004372596740722656
Per-token loss scaled by world size: 0.0008908362942747772
Per-token loss scaled by world size: 0.00043337256647646427
Per-token loss scaled by world size: 0.0002932958595920354
Per-token loss scaled by world size: 0.0002709754917304963
Per-token loss scaled by world size: 0.0006071476964280009
Epoch: 6, Step: 78, Rank: 1, loss = 0.07071013003587723
Epoch: 6, Step: 78, Rank: 0, loss = 0.034707486629486084
Epoch: 6, Step: 78, Rank: 4, loss = 0.03439894691109657
Epoch: 6, Step: 78, Rank: 3, loss = 0.02328035794198513
Epoch: 6, Step: 78, Rank: 5, loss = 0.021508680656552315
Epoch: 6, Step: 78, Rank: 6, loss = 0.048192348331213
Epoch: 6, Step: 78, Rank: 7, loss = 0.17344985902309418
Per-token loss scaled by world size: 0.0011103027500212193
Epoch: 6, Step: 78, Rank: 2, loss = 0.08813028037548065
[2024-07-27 20:06:54,252] [INFO] [logging.py:96:log_dist] [Rank 0] step=78, skipped=0, lr=[8.191194656678905e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:54,330] [INFO] [timer.py:258:stop] epoch=0/micro_step=78/global_step=78, RunningAvgSamplesPerSec=31.696677826343805, CurrSamplesPerSec=31.288371681003333, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 50%|█████ | 6/12 [00:21<00:29, 4.87s/it]
{
"epoch": 6,
"step": 78,
"rank": 0,
"loss": 0.034707486629486084,
"overall_throughput": 31.230609214279298,
"lr": 8.191194656678905e-06,
"cuda_mem_allocated": 22.001431465148926,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 635,
"batch_size": 16,
"total_loss": 0.061797261238098145,
"gradnorm": 1.2869224548339844,
"weight_norm": 393.47509765625,
"timestamp": "2024-07-27T20:06:54.372620"
}
Per-token loss scaled by world size: 0.0011720252223312855
Per-token loss scaled by world size: 0.0006744764395989478
Per-token loss scaled by world size: 0.0009440272697247565
Per-token loss scaled by world size: 0.0029248518403619528
Per-token loss scaled by world size: 0.0009407943580299616
Per-token loss scaled by world size: 0.0013611947651952505
Per-token loss scaled by world size: 0.0014305550139397383
Epoch: 6, Step: 79, Rank: 7, loss = 0.04670749232172966
Epoch: 6, Step: 79, Rank: 0, loss = 0.08116274327039719
Epoch: 6, Step: 79, Rank: 4, loss = 0.06537389010190964
Epoch: 6, Step: 79, Rank: 1, loss = 0.20254598557949066
Epoch: 6, Step: 79, Rank: 2, loss = 0.06515000760555267
Epoch: 6, Step: 79, Rank: 5, loss = 0.0942627340555191
Epoch: 6, Step: 79, Rank: 6, loss = 0.09906593710184097
Per-token loss scaled by world size: 0.000943321269005537
Epoch: 6, Step: 79, Rank: 3, loss = 0.06532499939203262
[2024-07-27 20:06:54,805] [INFO] [logging.py:96:log_dist] [Rank 0] step=79, skipped=0, lr=[7.867003692562533e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:54,883] [INFO] [timer.py:258:stop] epoch=0/micro_step=79/global_step=79, RunningAvgSamplesPerSec=31.69457460454822, CurrSamplesPerSec=31.53554234673331, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 58%|█████▊ | 7/12 [00:22<00:17, 3.46s/it]
{
"epoch": 6,
"step": 79,
"rank": 0,
"loss": 0.08116274327039719,
"overall_throughput": 31.485415170212256,
"lr": 7.867003692562533e-06,
"cuda_mem_allocated": 21.996094703674316,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 554,
"batch_size": 16,
"total_loss": 0.08994922786951065,
"gradnorm": 1.41256582736969,
"weight_norm": 393.4751892089844,
"timestamp": "2024-07-27T20:06:54.931617"
}
Per-token loss scaled by world size: 0.0004768831713590771
Per-token loss scaled by world size: 0.002107172505930066
Per-token loss scaled by world size: 0.0008781441720202565
Per-token loss scaled by world size: 0.0014709294773638248
Per-token loss scaled by world size: 0.00031639524968340993
Per-token loss scaled by world size: 0.0003654623869806528
Per-token loss scaled by world size: 0.000409139902330935
Epoch: 6, Step: 80, Rank: 6, loss = 0.155930757522583
Epoch: 6, Step: 80, Rank: 3, loss = 0.10884878039360046
Epoch: 6, Step: 80, Rank: 5, loss = 0.023413248360157013
Epoch: 6, Step: 80, Rank: 1, loss = 0.06498266756534576
Epoch: 6, Step: 80, Rank: 4, loss = 0.02704421617090702
Epoch: 6, Step: 80, Rank: 0, loss = 0.035289354622364044
Epoch: 6, Step: 80, Rank: 2, loss = 0.030276352539658546
Per-token loss scaled by world size: 0.002671802882105112
Epoch: 6, Step: 80, Rank: 7, loss = 0.19771341979503632
[2024-07-27 20:06:55,362] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=0, lr=[7.545145128592009e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:55,440] [INFO] [timer.py:258:stop] epoch=0/micro_step=80/global_step=80, RunningAvgSamplesPerSec=31.6941207935923, CurrSamplesPerSec=31.65921633267696, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 6: 67%|██████▋ | 8/12 [00:22<00:10, 2.53s/it]
{
"epoch": 6,
"step": 80,
"rank": 0,
"loss": 0.035289354622364044,
"overall_throughput": 31.609841938694426,
"lr": 7.545145128592009e-06,
"cuda_mem_allocated": 22.009064197540283,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 592,
"batch_size": 16,
"total_loss": 0.08043734729290009,
"gradnorm": 1.2677600383758545,
"weight_norm": 393.4752502441406,
"timestamp": "2024-07-27T20:06:55.481943"
}
Per-token loss scaled by world size: 0.0005759032792411745
Per-token loss scaled by world size: 0.0009630320128053427
Per-token loss scaled by world size: 0.0008893606718629599
Per-token loss scaled by world size: 0.0010249739279970527
Per-token loss scaled by world size: 0.0008383162785321474
Per-token loss scaled by world size: 0.0007667160243727267
Per-token loss scaled by world size: 7.463712972821668e-05
Epoch: 6, Step: 81, Rank: 3, loss = 0.0855894684791565
Epoch: 6, Step: 81, Rank: 5, loss = 0.07904192805290222
Epoch: 6, Step: 81, Rank: 7, loss = 0.05118340253829956
Epoch: 6, Step: 81, Rank: 6, loss = 0.006633374840021133
Epoch: 6, Step: 81, Rank: 4, loss = 0.0681418851017952
Epoch: 6, Step: 81, Rank: 2, loss = 0.09109456092119217
Epoch: 6, Step: 81, Rank: 0, loss = 0.07450535893440247
Per-token loss scaled by world size: 0.0009560537873767316
Epoch: 6, Step: 81, Rank: 1, loss = 0.08496928215026855
[2024-07-27 20:06:55,907] [INFO] [logging.py:96:log_dist] [Rank 0] step=81, skipped=0, lr=[7.225970912381557e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:55,984] [INFO] [timer.py:258:stop] epoch=0/micro_step=81/global_step=81, RunningAvgSamplesPerSec=31.696829857403074, CurrSamplesPerSec=31.90957327177327, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 6: 75%|███████▌ | 9/12 [00:23<00:05, 1.91s/it]
{
"epoch": 6,
"step": 81,
"rank": 0,
"loss": 0.07450535893440247,
"overall_throughput": 31.833316019435205,
"lr": 7.225970912381557e-06,
"cuda_mem_allocated": 22.00548553466797,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 711,
"batch_size": 16,
"total_loss": 0.06764490157365799,
"gradnorm": 1.3872599601745605,
"weight_norm": 393.4753112792969,
"timestamp": "2024-07-27T20:06:56.027764"
}
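Where a "Per-token loss scaled by world size" line is printed directly before a rank's loss line, the two numbers appear to be related by the step's `num_loss_counted_tokens` divided by the world size (8 GPUs here): for step 81, 0.0009560537873767316 × 711 / 8 ≈ 0.0849693, matching rank 1's logged loss. This pairing is inferred from the numbers only, not from the trainer's source, and the helper below is hypothetical:

```python
import math

WORLD_SIZE = 8  # --gpus 8 on the command line


def rank_loss_from_scaled(per_token_scaled: float, step_loss_tokens: int,
                          world_size: int = WORLD_SIZE) -> float:
    """Hypothetical reconstruction: scaled per-token value * step token count / world size."""
    return per_token_scaled * step_loss_tokens / world_size


# (per-token value printed just before a rank's loss line,
#  num_loss_counted_tokens of that step, logged loss of that rank)
OBSERVED = [
    (0.002671802882105112, 592, 0.19771341979503632),   # step 80, rank 7
    (0.0009560537873767316, 711, 0.08496928215026855),  # step 81, rank 1
    (0.0018002043943852186, 679, 0.15279234945774078),  # step 84, rank 5
]

for scaled, tokens, logged in OBSERVED:
    estimate = rank_loss_from_scaled(scaled, tokens)
    # Agreement is within about 1e-7 relative, plausibly float32 rounding.
    print(f"{estimate:.8f} {logged:.8f} close={math.isclose(estimate, logged, rel_tol=1e-6)}")
```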
Per-token loss scaled by world size: 0.001566625782288611
Per-token loss scaled by world size: 0.0001653370854910463
Per-token loss scaled by world size: 0.00041765952482819557
Per-token loss scaled by world size: 0.0008047792944125831
Per-token loss scaled by world size: 0.0015484013129025698
Per-token loss scaled by world size: 7.262427970999852e-05
Per-token loss scaled by world size: 0.00017705872596707195
Epoch: 6, Step: 82, Rank: 6, loss = 0.02996707148849964
Epoch: 6, Step: 82, Rank: 5, loss = 0.01186293549835682
Epoch: 6, Step: 82, Rank: 2, loss = 0.05774291232228279
Epoch: 6, Step: 82, Rank: 1, loss = 0.11240539699792862
Epoch: 6, Step: 82, Rank: 0, loss = 0.1110977977514267
Epoch: 6, Step: 82, Rank: 7, loss = 0.005210792180150747
Epoch: 6, Step: 82, Rank: 4, loss = 0.012703963555395603
Per-token loss scaled by world size: 5.9409892855910584e-05
Epoch: 6, Step: 82, Rank: 3, loss = 0.004262659698724747
[2024-07-27 20:06:56,445] [INFO] [logging.py:96:log_dist] [Rank 0] step=82, skipped=0, lr=[6.909830056250527e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:56,522] [INFO] [timer.py:258:stop] epoch=0/micro_step=82/global_step=82, RunningAvgSamplesPerSec=31.70569941858613, CurrSamplesPerSec=32.42243510088761, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 83%|████████▎ | 10/12 [00:23<00:02, 1.49s/it]
{
"epoch": 6,
"step": 82,
"rank": 0,
"loss": 0.1110977977514267,
"overall_throughput": 32.336618768855516,
"lr": 6.909830056250527e-06,
"cuda_mem_allocated": 21.99880838394165,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 574,
"batch_size": 16,
"total_loss": 0.0431566946208477,
"gradnorm": 1.0986570119857788,
"weight_norm": 393.4753723144531,
"timestamp": "2024-07-27T20:06:56.567461"
}
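The logged learning rates are consistent with a linear-warmup, half-cosine-decay schedule. With a peak of 2e-5, 25 warmup steps and 120 total steps (10 epochs × 12 steps per epoch, from `--num-epochs 10` and the 12/12 progress bar), step 82 sits at exactly 60% of the decay phase, and 2e-5 × 0.5 × (1 + cos(0.6π)) reproduces the logged 6.909830056250527e-06. The peak and warmup values are inferred from these numbers and are not printed in this excerpt. A minimal sketch under those assumptions:

```python
import math

PEAK_LR = 2e-5      # assumption: inferred from the logged values
WARMUP_STEPS = 25   # assumption: inferred from the logged values
TOTAL_STEPS = 120   # 10 epochs x 12 optimizer steps per epoch


def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# lr values from the "log_dist" lines above
LOGGED = {
    80: 7.545145128592009e-06,
    81: 7.225970912381557e-06,
    82: 6.909830056250527e-06,
}

for step, lr in LOGGED.items():
    print(step, f"{cosine_lr(step):.15e}", f"{lr:.15e}")
```

If the assumptions hold, the same function should also track the later steps in this log (for example step 84 at about 6.288e-06).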
Per-token loss scaled by world size: 0.0015930738300085068
Per-token loss scaled by world size: 0.0009168770629912615
Per-token loss scaled by world size: 0.0008305592346005142
Per-token loss scaled by world size: 0.0003735376812983304
Per-token loss scaled by world size: 0.0023468886502087116
Per-token loss scaled by world size: 0.0006343711283989251
Per-token loss scaled by world size: 0.000816680898424238
Epoch: 6, Step: 83, Rank: 1, loss = 0.0600554458796978
Epoch: 6, Step: 83, Rank: 7, loss = 0.05349259823560715
Epoch: 6, Step: 83, Rank: 5, loss = 0.05440162867307663
Epoch: 6, Step: 83, Rank: 0, loss = 0.02446671761572361
Epoch: 6, Step: 83, Rank: 2, loss = 0.15372121334075928
Epoch: 6, Step: 83, Rank: 3, loss = 0.10434633493423462
Epoch: 6, Step: 83, Rank: 6, loss = 0.04155131056904793
Per-token loss scaled by world size: 0.00215042638592422
Epoch: 6, Step: 83, Rank: 4, loss = 0.1408529281616211
[2024-07-27 20:06:56,979] [INFO] [logging.py:96:log_dist] [Rank 0] step=83, skipped=0, lr=[6.59706825558357e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:57,057] [INFO] [timer.py:258:stop] epoch=0/micro_step=83/global_step=83, RunningAvgSamplesPerSec=31.71805120615546, CurrSamplesPerSec=32.73837880082133, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 92%|█████████▏| 11/12 [00:24<00:01, 1.20s/it]
{
"epoch": 6,
"step": 83,
"rank": 0,
"loss": 0.02446671761572361,
"overall_throughput": 32.651551303359504,
"lr": 6.59706825558357e-06,
"cuda_mem_allocated": 22.003100872039795,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 524,
"batch_size": 16,
"total_loss": 0.0791110172867775,
"gradnorm": 1.3195643424987793,
"weight_norm": 393.4754333496094,
"timestamp": "2024-07-27T20:06:57.100239"
}
Per-token loss scaled by world size: 0.001030595856718719
Per-token loss scaled by world size: 0.00038883870001882315
Per-token loss scaled by world size: 0.00021640512568410486
Per-token loss scaled by world size: 0.0008497635717503726
Per-token loss scaled by world size: 0.0006636562757194042
Per-token loss scaled by world size: 0.0012220889329910278
Epoch: 6, Step: 84, Rank: 4, loss = 0.03300268575549126
Epoch: 6, Step: 84, Rank: 6, loss = 0.018367385491728783
Epoch: 6, Step: 84, Rank: 7, loss = 0.10372480005025864
Epoch: 6, Step: 84, Rank: 1, loss = 0.072123683989048
Epoch: 6, Step: 84, Rank: 2, loss = 0.05632782727479935
Epoch: 6, Step: 84, Rank: 3, loss = 0.08747182786464691
Per-token loss scaled by world size: 9.578206663718447e-05
Epoch: 6, Step: 84, Rank: 0, loss = 0.008129502646625042
Per-token loss scaled by world size: 0.0018002043943852186
Epoch: 6, Step: 84, Rank: 5, loss = 0.15279234945774078
[2024-07-27 20:06:57,504] [INFO] [logging.py:96:log_dist] [Rank 0] step=84, skipped=0, lr=[6.2880275108177915e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:57,582] [INFO] [timer.py:258:stop] epoch=0/micro_step=84/global_step=84, RunningAvgSamplesPerSec=31.736034255118497, CurrSamplesPerSec=33.26364124820817, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 6: 100%|██████████| 12/12 [00:24<00:00, 1.01it/s]
{
"epoch": 6,
"step": 84,
"rank": 0,
"loss": 0.008129502646625042,
"overall_throughput": 33.17494381766968,
"lr": 6.2880275108177915e-06,
"cuda_mem_allocated": 22.000000476837158,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 679,
"batch_size": 16,
"total_loss": 0.06649251282215118,
"gradnorm": 1.081682801246643,
"weight_norm": 393.4754638671875,
"timestamp": "2024-07-27T20:06:57.629230"
}
Epoch 6: 100%|██████████| 12/12 [00:25<00:00, 2.09s/it]
total tokens: 160 num samples: 2 num padding tokens: 1 - rank: 1 max len: 80 min len: 79 avg len: 79.5 num_loss_counted_tokens: 86
total tokens: 122 num samples: 2 num padding tokens: 15 - rank: 1 max len: 61 min len: 46 avg len: 53.5 num_loss_counted_tokens: 57
total tokens: 162 num samples: 2 num padding tokens: 29 - rank: 5 max len: 81 min len: 52 avg len: 66.5 num_loss_counted_tokens: 74
total tokens: 116 num samples: 2 num padding tokens: 13 - rank: 5 max len: 58 min len: 45 avg len: 51.5 num_loss_counted_tokens: 60
total tokens: 138 num samples: 2 num padding tokens: 16 - rank: 2 max len: 69 min len: 53 avg len: 61.0 num_loss_counted_tokens: 51
total tokens: 118 num samples: 2 num padding tokens: 4 - rank: 2 max len: 59 min len: 55 avg len: 57.0 num_loss_counted_tokens: 62
total tokens: 144 num samples: 2 num padding tokens: 10 - rank: 0 max len: 72 min len: 62 avg len: 67.0 num_loss_counted_tokens: 80
total tokens: 124 num samples: 2 num padding tokens: 12 - rank: 2 max len: 62 min len: 50 avg len: 56.0 num_loss_counted_tokens: 56
total tokens: 150 num samples: 2 num padding tokens: 12 - rank: 5 max len: 75 min len: 63 avg len: 69.0 num_loss_counted_tokens: 72
total tokens: 214 num samples: 2 num padding tokens: 46 - rank: 2 max len: 107 min len: 61 avg len: 84.0 num_loss_counted_tokens: 107
total tokens: 140 num samples: 2 num padding tokens: 15 - rank: 2 max len: 70 min len: 55 avg len: 62.5 num_loss_counted_tokens: 62
total tokens: 180 num samples: 2 num padding tokens: 38 - rank: 7 max len: 90 min len: 52 avg len: 71.0 num_loss_counted_tokens: 95
total tokens: 180 num samples: 2 num padding tokens: 7 - rank: 6 max len: 90 min len: 83 avg len: 86.5 num_loss_counted_tokens: 135
total tokens: 152 num samples: 2 num padding tokens: 7 - rank: 5 max len: 76 min len: 69 avg len: 72.5 num_loss_counted_tokens: 101
total tokens: 102 num samples: 2 num padding tokens: 8 - rank: 1 max len: 51 min len: 43 avg len: 47.0 num_loss_counted_tokens: 44
total tokens: 106 num samples: 2 num padding tokens: 8 - rank: 1 max len: 53 min len: 45 avg len: 49.0 num_loss_counted_tokens: 46
total tokens: 194 num samples: 2 num padding tokens: 53 - rank: 0 max len: 97 min len: 44 avg len: 70.5 num_loss_counted_tokens: 90
total tokens: 122 num samples: 2 num padding tokens: 10 - rank: 6 max len: 61 min len: 51 avg len: 56.0 num_loss_counted_tokens: 56
total tokens: 140 num samples: 2 num padding tokens: 8 - rank: 2 max len: 70 min len: 62 avg len: 66.0 num_loss_counted_tokens: 72
total tokens: 208 num samples: 2 num padding tokens: 47 - rank: 6 max len: 104 min len: 57 avg len: 80.5 num_loss_counted_tokens: 112
total tokens: 114 num samples: 2 num padding tokens: 12 - rank: 1 max len: 57 min len: 45 avg len: 51.0 num_loss_counted_tokens: 47
total tokens: 140 num samples: 2 num padding tokens: 4 - rank: 0 max len: 70 min len: 66 avg len: 68.0 num_loss_counted_tokens: 79
total tokens: 132 num samples: 2 num padding tokens: 11 - rank: 2 max len: 66 min len: 55 avg len: 60.5 num_loss_counted_tokens: 59
total tokens: 282 num samples: 2 num padding tokens: 77 - rank: 7 max len: 141 min len: 64 avg len: 102.5 num_loss_counted_tokens: 152
total tokens: 128 num samples: 2 num padding tokens: 1 - rank: 5 max len: 64 min len: 63 avg len: 63.5 num_loss_counted_tokens: 68
total tokens: 172 num samples: 2 num padding tokens: 35 - rank: 0 max len: 86 min len: 51 avg len: 68.5 num_loss_counted_tokens: 71
total tokens: 186 num samples: 2 num padding tokens: 33 - rank: 2 max len: 93 min len: 60 avg len: 76.5 num_loss_counted_tokens: 105
total tokens: 120 num samples: 2 num padding tokens: 8 - rank: 2 max len: 60 min len: 52 avg len: 56.0 num_loss_counted_tokens: 63
total tokens: 134 num samples: 2 num padding tokens: 4 - rank: 3 max len: 67 min len: 63 avg len: 65.0 num_loss_counted_tokens: 61
total tokens: 146 num samples: 2 num padding tokens: 28 - rank: 5 max len: 73 min len: 45 avg len: 59.0 num_loss_counted_tokens: 72
total tokens: 102 num samples: 2 num padding tokens: 1 - rank: 7 max len: 51 min len: 50 avg len: 50.5 num_loss_counted_tokens: 62
total tokens: 136 num samples: 2 num padding tokens: 2 - rank: 6 max len: 68 min len: 66 avg len: 67.0 num_loss_counted_tokens: 58
total tokens: 168 num samples: 2 num padding tokens: 14 - rank: 6 max len: 84 min len: 70 avg len: 77.0 num_loss_counted_tokens: 86
total tokens: 132 num samples: 2 num padding tokens: 12 - rank: 1 max len: 66 min len: 54 avg len: 60.0 num_loss_counted_tokens: 57
total tokens: 226 num samples: 2 num padding tokens: 27 - rank: 6 max len: 113 min len: 86 avg len: 99.5 num_loss_counted_tokens: 114
total tokens: 174 num samples: 2 num padding tokens: 28 - rank: 4 max len: 87 min len: 59 avg len: 73.0 num_loss_counted_tokens: 90
total tokens: 184 num samples: 2 num padding tokens: 14 - rank: 3 max len: 92 min len: 78 avg len: 85.0 num_loss_counted_tokens: 102
total tokens: 200 num samples: 2 num padding tokens: 48 - rank: 3 max len: 100 min len: 52 avg len: 76.0 num_loss_counted_tokens: 85
total tokens: 176 num samples: 2 num padding tokens: 44 - rank: 0 max len: 88 min len: 44 avg len: 66.0 num_loss_counted_tokens: 74
total tokens: 110 num samples: 2 num padding tokens: 0 - rank: 0 max len: 55 min len: 55 avg len: 55.0 num_loss_counted_tokens: 54
total tokens: 186 num samples: 2 num padding tokens: 33 - rank: 0 max len: 93 min len: 60 avg len: 76.5 num_loss_counted_tokens: 104
total tokens: 146 num samples: 2 num padding tokens: 24 - rank: 7 max len: 73 min len: 49 avg len: 61.0 num_loss_counted_tokens: 64
total tokens: 214 num samples: 2 num padding tokens: 25 - rank: 0 max len: 107 min len: 82 avg len: 94.5 num_loss_counted_tokens: 135
total tokens: 142 num samples: 2 num padding tokens: 3 - rank: 0 max len: 71 min len: 68 avg len: 69.5 num_loss_counted_tokens: 59
total tokens: 152 num samples: 2 num padding tokens: 16 - rank: 0 max len: 76 min len: 60 avg len: 68.0 num_loss_counted_tokens: 77
total tokens: 108 num samples: 2 num padding tokens: 6 - rank: 3 max len: 54 min len: 48 avg len: 51.0 num_loss_counted_tokens: 55
total tokens: 166 num samples: 2 num padding tokens: 12 - rank: 6 max len: 83 min len: 71 avg len: 77.0 num_loss_counted_tokens: 79
total tokens: 196 num samples: 2 num padding tokens: 28 - rank: 6 max len: 98 min len: 70 avg len: 84.0 num_loss_counted_tokens: 105
total tokens: 186 num samples: 2 num padding tokens: 20 - rank: 0 max len: 93 min len: 73 avg len: 83.0 num_loss_counted_tokens: 135
total tokens: 228 num samples: 2 num padding tokens: 52 - rank: 4 max len: 114 min len: 62 avg len: 88.0 num_loss_counted_tokens: 120
total tokens: 244 num samples: 2 num padding tokens: 64 - rank: 3 max len: 122 min len: 58 avg len: 90.0 num_loss_counted_tokens: 127
total tokens: 162 num samples: 2 num padding tokens: 2 - rank: 6 max len: 81 min len: 79 avg len: 80.0 num_loss_counted_tokens: 86
total tokens: 120 num samples: 2 num padding tokens: 7 - rank: 6 max len: 60 min len: 53 avg len: 56.5 num_loss_counted_tokens: 71
total tokens: 142 num samples: 2 num padding tokens: 12 - rank: 3 max len: 71 min len: 59 avg len: 65.0 num_loss_counted_tokens: 59
total tokens: 132 num samples: 2 num padding tokens: 4 - rank: 3 max len: 66 min len: 62 avg len: 64.0 num_loss_counted_tokens: 71
total tokens: 164 num samples: 2 num padding tokens: 19 - rank: 7 max len: 82 min len: 63 avg len: 72.5 num_loss_counted_tokens: 84
total tokens: 122 num samples: 2 num padding tokens: 0 - rank: 6 max len: 61 min len: 61 avg len: 61.0 num_loss_counted_tokens: 61
total tokens: 118 num samples: 2 num padding tokens: 11 - rank: 1 max len: 59 min len: 48 avg len: 53.5 num_loss_counted_tokens: 55
total tokens: 142 num samples: 2 num padding tokens: 1 - rank: 1 max len: 71 min len: 70 avg len: 70.5 num_loss_counted_tokens: 72
total tokens: 128 num samples: 2 num padding tokens: 15 - rank: 2 max len: 64 min len: 49 avg len: 56.5 num_loss_counted_tokens: 52
total tokens: 172 num samples: 2 num padding tokens: 22 - rank: 3 max len: 86 min len: 64 avg len: 75.0 num_loss_counted_tokens: 70
total tokens: 142 num samples: 2 num padding tokens: 8 - rank: 1 max len: 71 min len: 63 avg len: 67.0 num_loss_counted_tokens: 64
total tokens: 120 num samples: 2 num padding tokens: 5 - rank: 3 max len: 60 min len: 55 avg len: 57.5 num_loss_counted_tokens: 65
total tokens: 134 num samples: 2 num padding tokens: 17 - rank: 4 max len: 67 min len: 50 avg len: 58.5 num_loss_counted_tokens: 66
total tokens: 126 num samples: 2 num padding tokens: 5 - rank: 4 max len: 63 min len: 58 avg len: 60.5 num_loss_counted_tokens: 61
total tokens: 132 num samples: 2 num padding tokens: 14 - rank: 4 max len: 66 min len: 52 avg len: 59.0 num_loss_counted_tokens: 66
total tokens: 216 num samples: 2 num padding tokens: 7 - rank: 0 max len: 108 min len: 101 avg len: 104.5 num_loss_counted_tokens: 147
total tokens: 120 num samples: 2 num padding tokens: 14 - rank: 7 max len: 60 min len: 46 avg len: 53.0 num_loss_counted_tokens: 57
total tokens: 152 num samples: 2 num padding tokens: 12 - rank: 7 max len: 76 min len: 64 avg len: 70.0 num_loss_counted_tokens: 81
total tokens: 130 num samples: 2 num padding tokens: 0 - rank: 7 max len: 65 min len: 65 avg len: 65.0 num_loss_counted_tokens: 59
total tokens: 154 num samples: 2 num padding tokens: 10 - rank: 7 max len: 77 min len: 67 avg len: 72.0 num_loss_counted_tokens: 80
total tokens: 180 num samples: 2 num padding tokens: 35 - rank: 7 max len: 90 min len: 55 avg len: 72.5 num_loss_counted_tokens: 97
total tokens: 148 num samples: 2 num padding tokens: 14 - rank: 4 max len: 74 min len: 60 avg len: 67.0 num_loss_counted_tokens: 68
total tokens: 126 num samples: 2 num padding tokens: 1 - rank: 1 max len: 63 min len: 62 avg len: 62.5 num_loss_counted_tokens: 61
total tokens: 132 num samples: 2 num padding tokens: 7 - rank: 4 max len: 66 min len: 59 avg len: 62.5 num_loss_counted_tokens: 67
total tokens: 152 num samples: 2 num padding tokens: 21 - rank: 5 max len: 76 min len: 55 avg len: 65.5 num_loss_counted_tokens: 63
total tokens: 144 num samples: 2 num padding tokens: 11 - rank: 3 max len: 72 min len: 61 avg len: 66.5 num_loss_counted_tokens: 66
total tokens: 138 num samples: 2 num padding tokens: 15 - rank: 5 max len: 69 min len: 54 avg len: 61.5 num_loss_counted_tokens: 62
total tokens: 188 num samples: 2 num padding tokens: 50 - rank: 1 max len: 94 min len: 44 avg len: 69.0 num_loss_counted_tokens: 79
total tokens: 148 num samples: 2 num padding tokens: 15 - rank: 5 max len: 74 min len: 59 avg len: 66.5 num_loss_counted_tokens: 71
total tokens: 116 num samples: 2 num padding tokens: 6 - rank: 3 max len: 58 min len: 52 avg len: 55.0 num_loss_counted_tokens: 60
total tokens: 128 num samples: 2 num padding tokens: 5 - rank: 5 max len: 64 min len: 59 avg len: 61.5 num_loss_counted_tokens: 65
total tokens: 180 num samples: 2 num padding tokens: 35 - rank: 7 max len: 90 min len: 55 avg len: 72.5 num_loss_counted_tokens: 117
total tokens: 168 num samples: 2 num padding tokens: 24 - rank: 4 max len: 84 min len: 60 avg len: 72.0 num_loss_counted_tokens: 91
total tokens: 174 num samples: 2 num padding tokens: 10 - rank: 4 max len: 87 min len: 77 avg len: 82.0 num_loss_counted_tokens: 93
total tokens: 116 num samples: 2 num padding tokens: 1 - rank: 4 max len: 58 min len: 57 avg len: 57.5 num_loss_counted_tokens: 68
total tokens: 128 num samples: 2 num padding tokens: 16 - rank: 4 max len: 64 min len: 48 avg len: 56.0 num_loss_counted_tokens: 64
total tokens: 160 num samples: 2 num padding tokens: 22 - rank: 2 max len: 80 min len: 58 avg len: 69.0 num_loss_counted_tokens: 79
total tokens: 174 num samples: 2 num padding tokens: 19 - rank: 5 max len: 87 min len: 68 avg len: 77.5 num_loss_counted_tokens: 76
total tokens: 166 num samples: 2 num padding tokens: 23 - rank: 4 max len: 83 min len: 60 avg len: 71.5 num_loss_counted_tokens: 86
total tokens: 188 num samples: 2 num padding tokens: 13 - rank: 7 max len: 94 min len: 81 avg len: 87.5 num_loss_counted_tokens: 115
total tokens: 128 num samples: 2 num padding tokens: 15 - rank: 5 max len: 64 min len: 49 avg len: 56.5 num_loss_counted_tokens: 57
total tokens: 134 num samples: 2 num padding tokens: 17 - rank: 2 max len: 67 min len: 50 avg len: 58.5 num_loss_counted_tokens: 55
total tokens: 162 num samples: 2 num padding tokens: 19 - rank: 6 max len: 81 min len: 62 avg len: 71.5 num_loss_counted_tokens: 77
total tokens: 116 num samples: 2 num padding tokens: 12 - rank: 1 max len: 58 min len: 46 avg len: 52.0 num_loss_counted_tokens: 58
total tokens: 160 num samples: 2 num padding tokens: 12 - rank: 3 max len: 80 min len: 68 avg len: 74.0 num_loss_counted_tokens: 80
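The micro-batch summaries printed between epochs follow a consistent pattern: `total tokens` equals `num samples` × `max len`, and `num padding tokens` equals `total tokens` minus the summed sample lengths (`num samples` × `avg len`). In other words, each two-sample micro-batch appears to be padded to its longest sample. This is inferred from the printed fields, not from the trainer's code. A minimal parser and check, with hypothetical helper names:

```python
import re

# Field layout copied from the "total tokens: ..." lines above.
BATCH_RE = re.compile(
    r"total tokens: (\d+) num samples: (\d+) num padding tokens: (\d+) - rank: (\d+) "
    r"max len: (\d+) min len: (\d+) avg len: ([\d.]+) num_loss_counted_tokens: (\d+)"
)


def parse_batches(text: str) -> list:
    """Parse every micro-batch summary, including ones that ran together on one line."""
    batches = []
    for m in BATCH_RE.finditer(text):
        total, n, pad, rank, max_len, min_len, avg, counted = m.groups()
        batches.append({
            "total": int(total), "num_samples": int(n), "padding": int(pad),
            "rank": int(rank), "max_len": int(max_len), "min_len": int(min_len),
            "avg_len": float(avg), "loss_tokens": int(counted),
        })
    return batches


def padded_to_longest(b: dict) -> bool:
    """Hypothesis: every sample in a micro-batch is padded to the longest one."""
    return (b["total"] == b["num_samples"] * b["max_len"]
            and b["padding"] == round(b["total"] - b["num_samples"] * b["avg_len"]))


SAMPLE = """\
total tokens: 160 num samples: 2 num padding tokens: 1 - rank: 1 max len: 80 min len: 79 avg len: 79.5 num_loss_counted_tokens: 86
total tokens: 122 num samples: 2 num padding tokens: 15 - rank: 1 max len: 61 min len: 46 avg len: 53.5 num_loss_counted_tokens: 57 total tokens: 162 num samples: 2 num padding tokens: 29 - rank: 5 max len: 81 min len: 52 avg len: 66.5 num_loss_counted_tokens: 74
total tokens: 282 num samples: 2 num padding tokens: 77 - rank: 7 max len: 141 min len: 64 avg len: 102.5 num_loss_counted_tokens: 152
"""

batches = parse_batches(SAMPLE)
print(len(batches), all(padded_to_longest(b) for b in batches))
```

Using `finditer` over the whole text, rather than one match per line, is what lets the parser recover records from the lines that were fused in the raw output.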
Per-token loss scaled by world size: 0.00021730510343331844
Per-token loss scaled by world size: 0.00023930655152071267
Per-token loss scaled by world size: 0.00019531356520019472
Per-token loss scaled by world size: 0.0005758063634857535
Per-token loss scaled by world size: 0.00014575273962691426
Per-token loss scaled by world size: 0.0007938417256809771
Per-token loss scaled by world size: 0.00033632898703217506
Epoch: 7, Step: 85, Rank: 3, loss = 0.0187968909740448
Epoch: 7, Step: 85, Rank: 5, loss = 0.016894623637199402
Epoch: 7, Step: 85, Rank: 6, loss = 0.020700016990303993
Epoch: 7, Step: 85, Rank: 0, loss = 0.04980725049972534
Epoch: 7, Step: 85, Rank: 4, loss = 0.01260761171579361
Epoch: 7, Step: 85, Rank: 2, loss = 0.06866730749607086
Epoch: 7, Step: 85, Rank: 1, loss = 0.0290924571454525
Per-token loss scaled by world size: 0.00042542771552689373
Epoch: 7, Step: 85, Rank: 7, loss = 0.03679949790239334
[2024-07-27 20:06:58,520] [INFO] [logging.py:96:log_dist] [Rank 0] step=85, skipped=0, lr=[5.983045753470308e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:58,596] [INFO] [timer.py:258:stop] epoch=0/micro_step=85/global_step=85, RunningAvgSamplesPerSec=31.69756640961793, CurrSamplesPerSec=28.831859851847014, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 7: 8%|▊ | 1/12 [00:00<00:10, 1.08it/s]
{
"epoch": 7,
"step": 85,
"rank": 0,
"loss": 0.04980725049972534,
"overall_throughput": 28.716406669884535,
"lr": 5.983045753470308e-06,
"cuda_mem_allocated": 22.00047731399536,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 692,
"batch_size": 16,
"total_loss": 0.0316707044839859,
"gradnorm": 0.7268746495246887,
"weight_norm": 393.4754943847656,
"timestamp": "2024-07-27T20:06:58.639043"
}
Per-token loss scaled by world size: 0.000342040992109105
Per-token loss scaled by world size: 0.00027503896853886545
Per-token loss scaled by world size: 0.00036574419937096536
Per-token loss scaled by world size: 0.0006328842719085515
Per-token loss scaled by world size: 0.0005108661716803908
Per-token loss scaled by world size: 0.0006690495647490025
Per-token loss scaled by world size: 0.0002407751599093899
Epoch: 7, Step: 86, Rank: 0, loss = 0.032139770686626434
Epoch: 7, Step: 86, Rank: 5, loss = 0.030056850984692574
Epoch: 7, Step: 86, Rank: 4, loss = 0.058792732656002045
Epoch: 7, Step: 86, Rank: 2, loss = 0.02416904829442501
Epoch: 7, Step: 86, Rank: 6, loss = 0.05561470612883568
Epoch: 7, Step: 86, Rank: 7, loss = 0.044892363250255585
Epoch: 7, Step: 86, Rank: 3, loss = 0.021158117800951004
Per-token loss scaled by world size: 0.0008176557603292167
Epoch: 7, Step: 86, Rank: 1, loss = 0.07185149937868118
[2024-07-27 20:06:59,066] [INFO] [logging.py:96:log_dist] [Rank 0] step=86, skipped=0, lr=[5.6824564766150724e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:59,144] [INFO] [timer.py:258:stop] epoch=0/micro_step=86/global_step=86, RunningAvgSamplesPerSec=31.703864056372694, CurrSamplesPerSec=32.23543844733132, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 7: 17%|█▋ | 2/12 [00:01<00:07, 1.42it/s]
{
"epoch": 7,
"step": 86,
"rank": 0,
"loss": 0.032139770686626434,
"overall_throughput": 32.183202299620085,
"lr": 5.6824564766150724e-06,
"cuda_mem_allocated": 22.006441116333008,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 703,
"batch_size": 16,
"total_loss": 0.042334385216236115,
"gradnorm": 0.6796127557754517,
"weight_norm": 393.4755554199219,
"timestamp": "2024-07-27T20:06:59.187763"
}
Per-token loss scaled by world size: 0.00016505751409567893
Per-token loss scaled by world size: 0.0008360829087905586
Per-token loss scaled by world size: 0.0005081766867078841
Per-token loss scaled by world size: 0.0005767009570263326
Per-token loss scaled by world size: 0.0008457532385364175
Per-token loss scaled by world size: 0.003279536496847868
Per-token loss scaled by world size: 0.0016091869911178946
Epoch: 7, Step: 87, Rank: 7, loss = 0.05246420204639435
Epoch: 7, Step: 87, Rank: 0, loss = 0.010357359424233437
Epoch: 7, Step: 87, Rank: 3, loss = 0.03188808634877205
Epoch: 7, Step: 87, Rank: 6, loss = 0.2057909220457077
Epoch: 7, Step: 87, Rank: 5, loss = 0.05307101458311081
Epoch: 7, Step: 87, Rank: 2, loss = 0.03618798404932022
Epoch: 7, Step: 87, Rank: 4, loss = 0.10097648203372955
Per-token loss scaled by world size: 0.0013951770961284637
Epoch: 7, Step: 87, Rank: 1, loss = 0.08754736185073853
[2024-07-27 20:06:59,619] [INFO] [logging.py:96:log_dist] [Rank 0] step=87, skipped=0, lr=[5.386588370213124e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:06:59,697] [INFO] [timer.py:258:stop] epoch=0/micro_step=87/global_step=87, RunningAvgSamplesPerSec=31.702355408140964, CurrSamplesPerSec=31.576139496344755, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 7: 25%|██▌ | 3/12 [00:02<00:05, 1.57it/s]
{
"epoch": 7,
"step": 87,
"rank": 0,
"loss": 0.010357359424233437,
"overall_throughput": 31.529719530621602,
"lr": 5.386588370213124e-06,
"cuda_mem_allocated": 22.000000476837158,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 502,
"batch_size": 16,
"total_loss": 0.07228542864322662,
"gradnorm": 1.0233722925186157,
"weight_norm": 393.4755859375,
"timestamp": "2024-07-27T20:06:59.740017"
}
Per-token loss scaled by world size: 0.0004632726195268333
Per-token loss scaled by world size: 0.0006792055210098624
Per-token loss scaled by world size: 0.0006460993899963796
Per-token loss scaled by world size: 7.74235013523139e-05
Per-token loss scaled by world size: 0.00012206515384605154
Per-token loss scaled by world size: 0.0019949208945035934
Epoch: 7, Step: 88, Rank: 2, loss = 0.05039575323462486
Epoch: 7, Step: 88, Rank: 0, loss = 0.009521082043647766
Epoch: 7, Step: 88, Rank: 7, loss = 0.03613526374101639
Epoch: 7, Step: 88, Rank: 3, loss = 0.0529780313372612
Epoch: 7, Step: 88, Rank: 4, loss = 0.15560382604599
Per-token loss scaled by world size: 0.0007654842338524759
Epoch: 7, Step: 88, Rank: 6, loss = 0.006039033178240061
Epoch: 7, Step: 88, Rank: 5, loss = 0.05970776826143265
Per-token loss scaled by world size: 0.0008809524588286877
Epoch: 7, Step: 88, Rank: 1, loss = 0.06871429085731506
[2024-07-27 20:07:00,188] [INFO] [logging.py:96:log_dist] [Rank 0] step=88, skipped=0, lr=[5.095764961694923e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:00,266] [INFO] [timer.py:258:stop] epoch=0/micro_step=88/global_step=88, RunningAvgSamplesPerSec=31.68774037603646, CurrSamplesPerSec=30.492857616709514, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 1408
{
"epoch": 7,
"step": 88,
"rank": 0,
"loss": 0.009521082043647766,
"overall_throughput": 30.41481690296598,
"lr": 5.095764961694923e-06,
"cuda_mem_allocated": 22.0038161277771,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 624,
"batch_size": 16,
"total_loss": 0.05488688498735428,
"gradnorm": 1.1451473236083984,
"weight_norm": 393.47564697265625,
"timestamp": "2024-07-27T20:07:00.269082"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1408
[20:07:18] INFO saving took 17.875807285308838 seconds utils.py:611
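The `samples_seen: 1408` in the save message matches global step 88 times the effective batch size of 16 (`--effective-batch-size 16`, and `"batch_size": 16` in every metrics block), and the checkpoint lands under `<ckpt-output-dir>/hf_format/samples_<n>`. A small sketch of that bookkeeping, assuming the batch size stays constant; the helper names are hypothetical, and the log does not show why this particular step triggered a save:

```python
from pathlib import PurePosixPath

EFFECTIVE_BATCH_SIZE = 16  # --effective-batch-size 16
CKPT_OUTPUT_DIR = PurePosixPath("/var/instructlabbigdisk/instructlab/skillscheckpoints")  # --ckpt-output-dir


def samples_seen(global_step: int, batch_size: int = EFFECTIVE_BATCH_SIZE) -> int:
    """Samples consumed after `global_step` optimizer steps, assuming a constant batch size."""
    return global_step * batch_size


def hf_checkpoint_dir(n_samples: int) -> PurePosixPath:
    """Directory layout as it appears in the "Model saved in ..." line."""
    return CKPT_OUTPUT_DIR / "hf_format" / f"samples_{n_samples}"


n = samples_seen(88)
print(n, hf_checkpoint_dir(n))
```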
Per-token loss scaled by world size: 0.00015760907263029367
Per-token loss scaled by world size: 0.00012432184303179383
Per-token loss scaled by world size: 0.0010254974476993084
Per-token loss scaled by world size: 0.0010104298125952482
Per-token loss scaled by world size: 0.00047610432375222445
Per-token loss scaled by world size: 0.00011171086953254417
Per-token loss scaled by world size: 0.000618505000602454
Epoch: 7, Step: 89, Rank: 2, loss = 0.010038988664746284
Epoch: 7, Step: 89, Rank: 5, loss = 0.08159220963716507
Epoch: 7, Step: 89, Rank: 7, loss = 0.012726932764053345
Epoch: 7, Step: 89, Rank: 3, loss = 0.038445424288511276
Epoch: 7, Step: 89, Rank: 4, loss = 0.08280891925096512
Epoch: 7, Step: 89, Rank: 0, loss = 0.009020652621984482
Epoch: 7, Step: 89, Rank: 6, loss = 0.049944277852773666
Per-token loss scaled by world size: 0.0007019841577857733
Epoch: 7, Step: 89, Rank: 1, loss = 0.05668522045016289
[2024-07-27 20:07:18,636] [INFO] [logging.py:96:log_dist] [Rank 0] step=89, skipped=0, lr=[4.8103042621878515e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:18,713] [INFO] [timer.py:258:stop] epoch=0/micro_step=89/global_step=89, RunningAvgSamplesPerSec=31.681179911181776, CurrSamplesPerSec=31.12696454313878, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 7: 42%|████▏ | 5/12 [00:21<00:35, 5.11s/it]
{
"epoch": 7,
"step": 89,
"rank": 0,
"loss": 0.009020652621984482,
"overall_throughput": 31.06243279759643,
"lr": 4.8103042621878515e-06,
"cuda_mem_allocated": 22.00548553466797,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 646,
"batch_size": 16,
"total_loss": 0.04265782982110977,
"gradnorm": 0.9962098598480225,
"weight_norm": 393.4757080078125,
"timestamp": "2024-07-27T20:07:18.755800"
}
Per-token loss scaled by world size: 0.0009453526581637561
Per-token loss scaled by world size: 0.0016993889585137367
Per-token loss scaled by world size: 0.0008407846908085048
Per-token loss scaled by world size: 6.15180833847262e-05
Per-token loss scaled by world size: 0.0012258148053660989
Per-token loss scaled by world size: 0.0002534937229938805
Per-token loss scaled by world size: 7.776251732138917e-05
Epoch: 7, Step: 90, Rank: 0, loss = 0.057784680277109146
Epoch: 7, Step: 90, Rank: 1, loss = 0.10387515276670456
Epoch: 7, Step: 90, Rank: 5, loss = 0.05139296501874924
Epoch: 7, Step: 90, Rank: 3, loss = 0.0749279335141182
Epoch: 7, Step: 90, Rank: 2, loss = 0.0037602928932756186
Epoch: 7, Step: 90, Rank: 6, loss = 0.015494802966713905
Epoch: 7, Step: 90, Rank: 4, loss = 0.00475323386490345
Per-token loss scaled by world size: 0.0009510749368928373
Epoch: 7, Step: 90, Rank: 7, loss = 0.05813445523381233
[2024-07-27 20:07:19,178] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=0, lr=[4.530518418775734e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:19,255] [INFO] [timer.py:258:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=31.685890209714664, CurrSamplesPerSec=32.10111808111374, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 7: 50%|█████ | 6/12 [00:21<00:21, 3.56s/it]
{
"epoch": 7,
"step": 90,
"rank": 0,
"loss": 0.057784680277109146,
"overall_throughput": 32.01665600380905,
"lr": 4.530518418775734e-06,
"cuda_mem_allocated": 21.996421813964844,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 489,
"batch_size": 16,
"total_loss": 0.04626544192433357,
"gradnorm": 0.9134754538536072,
"weight_norm": 393.4757385253906,
"timestamp": "2024-07-27T20:07:19.305311"
}
Per-token loss scaled by world size: 0.00019140614313073456
Per-token loss scaled by world size: 0.0003604689263738692
Per-token loss scaled by world size: 0.00012782825797330588
Per-token loss scaled by world size: 0.00011688289669109508
Per-token loss scaled by world size: 0.0008099116967059672
Per-token loss scaled by world size: 0.0005937899113632739
Per-token loss scaled by world size: 0.0016323667950928211
Epoch: 7, Step: 91, Rank: 4, loss = 0.029107866808772087
Epoch: 7, Step: 91, Rank: 0, loss = 0.015456045977771282
Epoch: 7, Step: 91, Rank: 1, loss = 0.010322132147848606
Epoch: 7, Step: 91, Rank: 7, loss = 0.009438293986022472
Epoch: 7, Step: 91, Rank: 2, loss = 0.0654003694653511
Epoch: 7, Step: 91, Rank: 6, loss = 0.04794853553175926
Epoch: 7, Step: 91, Rank: 3, loss = 0.13181361556053162
Per-token loss scaled by world size: 8.413568866671994e-05
Epoch: 7, Step: 91, Rank: 5, loss = 0.006793956737965345
[2024-07-27 20:07:19,714] [INFO] [logging.py:96:log_dist] [Rank 0] step=91, skipped=0, lr=[4.256713373170565e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:19,792] [INFO] [timer.py:258:stop] epoch=0/micro_step=91/global_step=91, RunningAvgSamplesPerSec=31.69971548381683, CurrSamplesPerSec=32.965470896955004, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 7: 58%|█████▊ | 7/12 [00:22<00:12, 2.57s/it]
{
"epoch": 7,
"step": 91,
"rank": 0,
"loss": 0.015456045977771282,
"overall_throughput": 32.88040023537493,
"lr": 4.256713373170565e-06,
"cuda_mem_allocated": 22.004292964935303,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 646,
"batch_size": 16,
"total_loss": 0.039535101503133774,
"gradnorm": 1.6763972043991089,
"weight_norm": 393.47576904296875,
"timestamp": "2024-07-27T20:07:19.833342"
}
Per-token loss scaled by world size: 0.00016448293172288686
Per-token loss scaled by world size: 0.00010030974954133853
Per-token loss scaled by world size: 0.0006337311351671815
Per-token loss scaled by world size: 0.0002874261699616909
Per-token loss scaled by world size: 0.0004495856410358101
Per-token loss scaled by world size: 0.0012448193738237023
Per-token loss scaled by world size: 8.349026757059619e-05
Epoch: 7, Step: 92, Rank: 0, loss = 0.013878247700631618
Epoch: 7, Step: 92, Rank: 4, loss = 0.024251583963632584
Epoch: 7, Step: 92, Rank: 7, loss = 0.008463635109364986
Epoch: 7, Step: 92, Rank: 5, loss = 0.10503163933753967
Epoch: 7, Step: 92, Rank: 2, loss = 0.05347106233239174
Epoch: 7, Step: 92, Rank: 6, loss = 0.007044491358101368
Epoch: 7, Step: 92, Rank: 3, loss = 0.03793378919363022
Per-token loss scaled by world size: 0.0010255238739773631
Epoch: 7, Step: 92, Rank: 1, loss = 0.08652857691049576
[2024-07-27 20:07:20,249] [INFO] [logging.py:96:log_dist] [Rank 0] step=92, skipped=0, lr=[3.989188527169749e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:20,327] [INFO] [timer.py:258:stop] epoch=0/micro_step=92/global_step=92, RunningAvgSamplesPerSec=31.708216014036797, CurrSamplesPerSec=32.48346829214222, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 7: 67%|██████▋ | 8/12 [00:22<00:07, 1.92s/it]
{
"epoch": 7,
"step": 92,
"rank": 0,
"loss": 0.013878247700631618,
"overall_throughput": 32.40163058618926,
"lr": 3.989188527169749e-06,
"cuda_mem_allocated": 22.009064197540283,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 675,
"batch_size": 16,
"total_loss": 0.04207538068294525,
"gradnorm": 0.6942251920700073,
"weight_norm": 393.4757995605469,
"timestamp": "2024-07-27T20:07:20.370030"
}
Per-token loss scaled by world size: 0.001151230651885271
Per-token loss scaled by world size: 0.0008526312303729355
Per-token loss scaled by world size: 0.00011098023969680071
Per-token loss scaled by world size: 0.0004092410672456026
Per-token loss scaled by world size: 0.0007324064499698579
Per-token loss scaled by world size: 0.000303772249026224
Per-token loss scaled by world size: 0.0005547546315938234
Epoch: 7, Step: 93, Rank: 6, loss = 0.0076160188764333725
Epoch: 7, Step: 93, Rank: 0, loss = 0.05851181969046593
Epoch: 7, Step: 93, Rank: 1, loss = 0.02808416821062565
Epoch: 7, Step: 93, Rank: 3, loss = 0.05026139318943024
Epoch: 7, Step: 93, Rank: 5, loss = 0.020846370607614517
Epoch: 7, Step: 93, Rank: 7, loss = 0.07900319993495941
Epoch: 7, Step: 93, Rank: 2, loss = 0.038070037961006165
Per-token loss scaled by world size: 0.002183598466217518
Epoch: 7, Step: 93, Rank: 4, loss = 0.14984944462776184
[2024-07-27 20:07:20,790] [INFO] [logging.py:96:log_dist] [Rank 0] step=93, skipped=0, lr=[3.72823641526463e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:20,868] [INFO] [timer.py:258:stop] epoch=0/micro_step=93/global_step=93, RunningAvgSamplesPerSec=31.713426964551296, CurrSamplesPerSec=32.18953148593345, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 7: 75%|███████▌ | 9/12 [00:23<00:04, 1.49s/it]
{
"epoch": 7,
"step": 93,
"rank": 0,
"loss": 0.05851181969046593,
"overall_throughput": 32.11137875033854,
"lr": 3.72823641526463e-06,
"cuda_mem_allocated": 22.001431465148926,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 549,
"batch_size": 16,
"total_loss": 0.05403030663728714,
"gradnorm": 2.058638572692871,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:20.910467"
}
Per-token loss scaled by world size: 0.0005728427204303443
Per-token loss scaled by world size: 0.00010026186646427959
Per-token loss scaled by world size: 0.0007008212269283831
Per-token loss scaled by world size: 0.001267179031856358
Per-token loss scaled by world size: 0.0009045311016961932
Per-token loss scaled by world size: 0.000113489935756661
Per-token loss scaled by world size: 0.00015748964506201446
Epoch: 7, Step: 94, Rank: 3, loss = 0.06035822629928589
Epoch: 7, Step: 94, Rank: 1, loss = 0.008635053411126137
Epoch: 7, Step: 94, Rank: 0, loss = 0.10913579910993576
Epoch: 7, Step: 94, Rank: 2, loss = 0.04933607950806618
Epoch: 7, Step: 94, Rank: 5, loss = 0.013563795946538448
Epoch: 7, Step: 94, Rank: 7, loss = 0.07790274173021317
Epoch: 7, Step: 94, Rank: 4, loss = 0.009774320758879185
Per-token loss scaled by world size: 3.7123980291653425e-05
Epoch: 7, Step: 94, Rank: 6, loss = 0.003197302808985114
[2024-07-27 20:07:21,349] [INFO] [logging.py:96:log_dist] [Rank 0] step=94, skipped=0, lr=[3.4741423847583134e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:21,427] [INFO] [timer.py:258:stop] epoch=0/micro_step=94/global_step=94, RunningAvgSamplesPerSec=31.70583386752702, CurrSamplesPerSec=31.029757814905818, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 7: 83%|████████▎ | 10/12 [00:23<00:02, 1.20s/it]
{
"epoch": 7,
"step": 94,
"rank": 0,
"loss": 0.10913579910993576,
"overall_throughput": 30.958899761772514,
"lr": 3.4741423847583134e-06,
"cuda_mem_allocated": 22.00882577896118,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 689,
"batch_size": 16,
"total_loss": 0.0414879135787487,
"gradnorm": 0.7960036993026733,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:21.470074"
}
Per-token loss scaled by world size: 0.00030438421526923776
Per-token loss scaled by world size: 0.00036696376628242433
Per-token loss scaled by world size: 0.00027681011124514043
Per-token loss scaled by world size: 7.804056804161519e-05
Per-token loss scaled by world size: 0.001165422610938549
Per-token loss scaled by world size: 0.00014700897736474872
Per-token loss scaled by world size: 0.0005056550144217908
Epoch: 7, Step: 95, Rank: 7, loss = 0.02559572272002697
Epoch: 7, Step: 95, Rank: 1, loss = 0.005443329457193613
Epoch: 7, Step: 95, Rank: 6, loss = 0.01930750533938408
Epoch: 7, Step: 95, Rank: 0, loss = 0.021230798214673996
Epoch: 7, Step: 95, Rank: 3, loss = 0.08128822594881058
Epoch: 7, Step: 95, Rank: 5, loss = 0.03526943549513817
Epoch: 7, Step: 95, Rank: 4, loss = 0.010253876447677612
Per-token loss scaled by world size: 0.001073820167221129
Epoch: 7, Step: 95, Rank: 2, loss = 0.07489895820617676
[2024-07-27 20:07:21,879] [INFO] [logging.py:96:log_dist] [Rank 0] step=95, skipped=0, lr=[3.2271842837425917e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:21,957] [INFO] [timer.py:258:stop] epoch=0/micro_step=95/global_step=95, RunningAvgSamplesPerSec=31.718594052770808, CurrSamplesPerSec=32.93815904428149, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 7: 92%|█████████▏| 11/12 [00:24<00:00, 1.00it/s]
{
"epoch": 7,
"step": 95,
"rank": 0,
"loss": 0.021230798214673996,
"overall_throughput": 32.85461233006348,
"lr": 3.2271842837425917e-06,
"cuda_mem_allocated": 22.00023889541626,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 558,
"batch_size": 16,
"total_loss": 0.034160979092121124,
"gradnorm": 0.7562242150306702,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:21.999178"
}
Per-token loss scaled by world size: 0.0006009905482642353
Per-token loss scaled by world size: 0.0001355827844236046
Per-token loss scaled by world size: 0.0003012406814377755
Per-token loss scaled by world size: 0.0010038167238235474
Per-token loss scaled by world size: 0.0006891617667861283
Per-token loss scaled by world size: 0.0006996800657361746
Per-token loss scaled by world size: 0.0006351979682222009
Epoch: 7, Step: 96, Rank: 5, loss = 0.0112872663885355
Epoch: 7, Step: 96, Rank: 1, loss = 0.08356773853302002
Epoch: 7, Step: 96, Rank: 0, loss = 0.05003246292471886
Epoch: 7, Step: 96, Rank: 2, loss = 0.025078287348151207
Epoch: 7, Step: 96, Rank: 3, loss = 0.057372719049453735
Epoch: 7, Step: 96, Rank: 7, loss = 0.05288023129105568
Epoch: 7, Step: 96, Rank: 6, loss = 0.0582483634352684
Per-token loss scaled by world size: 0.0005330585991032422
Epoch: 7, Step: 96, Rank: 4, loss = 0.04437712952494621
[2024-07-27 20:07:22,425] [INFO] [logging.py:96:log_dist] [Rank 0] step=96, skipped=0, lr=[2.9876321572751143e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:22,503] [INFO] [timer.py:258:stop] epoch=0/micro_step=96/global_step=96, RunningAvgSamplesPerSec=31.71950152323151, CurrSamplesPerSec=31.804123848141387, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 7: 100%|██████████| 12/12 [00:24<00:00, 1.16it/s]
{
"epoch": 7,
"step": 96,
"rank": 0,
"loss": 0.05003246292471886,
"overall_throughput": 31.730529437408332,
"lr": 2.9876321572751143e-06,
"cuda_mem_allocated": 22.00548553466797,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 666,
"batch_size": 16,
"total_loss": 0.047855526208877563,
"gradnorm": 1.0569850206375122,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:22.546236"
}
Epoch 7: 100%|██████████| 12/12 [00:24<00:00, 2.08s/it]
total tokens: 166 num samples: 2 num padding tokens: 3 - rank: 6 max len: 83 min len: 80 avg len: 81.5 num_loss_counted_tokens: 88
total tokens: 174 num samples: 2 num padding tokens: 37 - rank: 6 max len: 87 min len: 50 avg len: 68.5 num_loss_counted_tokens: 83
total tokens: 132 num samples: 2 num padding tokens: 17 - rank: 6 max len: 66 min len: 49 avg len: 57.5 num_loss_counted_tokens: 50
total tokens: 194 num samples: 2 num padding tokens: 18 - rank: 6 max len: 97 min len: 79 avg len: 88.0 num_loss_counted_tokens: 112
total tokens: 140 num samples: 2 num padding tokens: 4 - rank: 6 max len: 70 min len: 66 avg len: 68.0 num_loss_counted_tokens: 72
total tokens: 140 num samples: 2 num padding tokens: 6 - rank: 6 max len: 70 min len: 64 avg len: 67.0 num_loss_counted_tokens: 81
total tokens: 134 num samples: 2 num padding tokens: 16 - rank: 6 max len: 67 min len: 51 avg len: 59.0 num_loss_counted_tokens: 48
total tokens: 144 num samples: 2 num padding tokens: 18 - rank: 6 max len: 72 min len: 54 avg len: 63.0 num_loss_counted_tokens: 82
total tokens: 154 num samples: 2 num padding tokens: 19 - rank: 3 max len: 77 min len: 58 avg len: 67.5 num_loss_counted_tokens: 87
total tokens: 196 num samples: 2 num padding tokens: 34 - rank: 6 max len: 98 min len: 64 avg len: 81.0 num_loss_counted_tokens: 113
total tokens: 228 num samples: 2 num padding tokens: 70 - rank: 6 max len: 114 min len: 44 avg len: 79.0 num_loss_counted_tokens: 110
total tokens: 126 num samples: 2 num padding tokens: 1 - rank: 3 max len: 63 min len: 62 avg len: 62.5 num_loss_counted_tokens: 60
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 3 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 63
total tokens: 186 num samples: 2 num padding tokens: 48 - rank: 3 max len: 93 min len: 45 avg len: 69.0 num_loss_counted_tokens: 110
total tokens: 126 num samples: 2 num padding tokens: 8 - rank: 3 max len: 63 min len: 55 avg len: 59.0 num_loss_counted_tokens: 61
total tokens: 104 num samples: 2 num padding tokens: 1 - rank: 3 max len: 52 min len: 51 avg len: 51.5 num_loss_counted_tokens: 59
total tokens: 152 num samples: 2 num padding tokens: 2 - rank: 3 max len: 76 min len: 74 avg len: 75.0 num_loss_counted_tokens: 79
total tokens: 108 num samples: 2 num padding tokens: 1 - rank: 3 max len: 54 min len: 53 avg len: 53.5 num_loss_counted_tokens: 64
total tokens: 120 num samples: 2 num padding tokens: 5 - rank: 7 max len: 60 min len: 55 avg len: 57.5 num_loss_counted_tokens: 61
total tokens: 188 num samples: 2 num padding tokens: 35 - rank: 3 max len: 94 min len: 59 avg len: 76.5 num_loss_counted_tokens: 93
total tokens: 162 num samples: 2 num padding tokens: 37 - rank: 7 max len: 81 min len: 44 avg len: 62.5 num_loss_counted_tokens: 66
total tokens: 172 num samples: 2 num padding tokens: 11 - rank: 7 max len: 86 min len: 75 avg len: 80.5 num_loss_counted_tokens: 72
total tokens: 166 num samples: 2 num padding tokens: 37 - rank: 7 max len: 83 min len: 46 avg len: 64.5 num_loss_counted_tokens: 80
total tokens: 142 num samples: 2 num padding tokens: 6 - rank: 3 max len: 71 min len: 65 avg len: 68.0 num_loss_counted_tokens: 78
total tokens: 158 num samples: 2 num padding tokens: 22 - rank: 7 max len: 79 min len: 57 avg len: 68.0 num_loss_counted_tokens: 65
total tokens: 128 num samples: 2 num padding tokens: 19 - rank: 7 max len: 64 min len: 45 avg len: 54.5 num_loss_counted_tokens: 58
total tokens: 126 num samples: 2 num padding tokens: 0 - rank: 7 max len: 63 min len: 63 avg len: 63.0 num_loss_counted_tokens: 64
total tokens: 134 num samples: 2 num padding tokens: 24 - rank: 7 max len: 67 min len: 43 avg len: 55.0 num_loss_counted_tokens: 57
total tokens: 120 num samples: 2 num padding tokens: 6 - rank: 7 max len: 60 min len: 54 avg len: 57.0 num_loss_counted_tokens: 72
total tokens: 202 num samples: 2 num padding tokens: 21 - rank: 7 max len: 101 min len: 80 avg len: 90.5 num_loss_counted_tokens: 128
total tokens: 138 num samples: 2 num padding tokens: 17 - rank: 6 max len: 69 min len: 52 avg len: 60.5 num_loss_counted_tokens: 60
total tokens: 196 num samples: 2 num padding tokens: 53 - rank: 5 max len: 98 min len: 45 avg len: 71.5 num_loss_counted_tokens: 99
total tokens: 110 num samples: 2 num padding tokens: 5 - rank: 5 max len: 55 min len: 50 avg len: 52.5 num_loss_counted_tokens: 57
total tokens: 176 num samples: 2 num padding tokens: 17 - rank: 5 max len: 88 min len: 71 avg len: 79.5 num_loss_counted_tokens: 86
total tokens: 120 num samples: 2 num padding tokens: 8 - rank: 5 max len: 60 min len: 52 avg len: 56.0 num_loss_counted_tokens: 63
total tokens: 172 num samples: 2 num padding tokens: 18 - rank: 5 max len: 86 min len: 68 avg len: 77.0 num_loss_counted_tokens: 74
total tokens: 282 num samples: 2 num padding tokens: 81 - rank: 5 max len: 141 min len: 60 avg len: 100.5 num_loss_counted_tokens: 152
total tokens: 162 num samples: 2 num padding tokens: 11 - rank: 5 max len: 81 min len: 70 avg len: 75.5 num_loss_counted_tokens: 86
total tokens: 166 num samples: 2 num padding tokens: 24 - rank: 5 max len: 83 min len: 59 avg len: 71.0 num_loss_counted_tokens: 80
total tokens: 216 num samples: 2 num padding tokens: 47 - rank: 5 max len: 108 min len: 61 avg len: 84.5 num_loss_counted_tokens: 103
total tokens: 226 num samples: 2 num padding tokens: 40 - rank: 7 max len: 113 min len: 73 avg len: 93.0 num_loss_counted_tokens: 109
total tokens: 122 num samples: 2 num padding tokens: 15 - rank: 5 max len: 61 min len: 46 avg len: 53.5 num_loss_counted_tokens: 55
total tokens: 180 num samples: 2 num padding tokens: 22 - rank: 4 max len: 90 min len: 68 avg len: 79.0 num_loss_counted_tokens: 111
total tokens: 152 num samples: 2 num padding tokens: 16 - rank: 5 max len: 76 min len: 60 avg len: 68.0 num_loss_counted_tokens: 71
total tokens: 152 num samples: 2 num padding tokens: 24 - rank: 4 max len: 76 min len: 52 avg len: 64.0 num_loss_counted_tokens: 59
total tokens: 140 num samples: 2 num padding tokens: 19 - rank: 4 max len: 70 min len: 51 avg len: 60.5 num_loss_counted_tokens: 55
total tokens: 102 num samples: 2 num padding tokens: 6 - rank: 3 max len: 51 min len: 45 avg len: 48.0 num_loss_counted_tokens: 50
total tokens: 146 num samples: 2 num padding tokens: 1 - rank: 4 max len: 73 min len: 72 avg len: 72.5 num_loss_counted_tokens: 83
total tokens: 118 num samples: 2 num padding tokens: 11 - rank: 6 max len: 59 min len: 48 avg len: 53.5 num_loss_counted_tokens: 53
total tokens: 114 num samples: 2 num padding tokens: 8 - rank: 3 max len: 57 min len: 49 avg len: 53.0 num_loss_counted_tokens: 58
total tokens: 148 num samples: 2 num padding tokens: 14 - rank: 4 max len: 74 min len: 60 avg len: 67.0 num_loss_counted_tokens: 74
total tokens: 122 num samples: 2 num padding tokens: 6 - rank: 4 max len: 61 min len: 55 avg len: 58.0 num_loss_counted_tokens: 53
total tokens: 162 num samples: 2 num padding tokens: 15 - rank: 4 max len: 81 min len: 66 avg len: 73.5 num_loss_counted_tokens: 87
total tokens: 138 num samples: 2 num padding tokens: 9 - rank: 7 max len: 69 min len: 60 avg len: 64.5 num_loss_counted_tokens: 75
total tokens: 168 num samples: 2 num padding tokens: 34 - rank: 4 max len: 84 min len: 50 avg len: 67.0 num_loss_counted_tokens: 88
total tokens: 160 num samples: 2 num padding tokens: 2 - rank: 4 max len: 80 min len: 78 avg len: 79.0 num_loss_counted_tokens: 91
total tokens: 128 num samples: 2 num padding tokens: 20 - rank: 4 max len: 64 min len: 44 avg len: 54.0 num_loss_counted_tokens: 47
total tokens: 186 num samples: 2 num padding tokens: 3 - rank: 4 max len: 93 min len: 90 avg len: 91.5 num_loss_counted_tokens: 114
total tokens: 126 num samples: 2 num padding tokens: 3 - rank: 5 max len: 63 min len: 60 avg len: 61.5 num_loss_counted_tokens: 65
total tokens: 136 num samples: 2 num padding tokens: 15 - rank: 4 max len: 68 min len: 53 avg len: 60.5 num_loss_counted_tokens: 52
total tokens: 104 num samples: 2 num padding tokens: 4 - rank: 0 max len: 52 min len: 48 avg len: 50.0 num_loss_counted_tokens: 54
total tokens: 124 num samples: 2 num padding tokens: 7 - rank: 0 max len: 62 min len: 55 avg len: 58.5 num_loss_counted_tokens: 51
total tokens: 154 num samples: 2 num padding tokens: 13 - rank: 0 max len: 77 min len: 64 avg len: 70.5 num_loss_counted_tokens: 78
total tokens: 132 num samples: 2 num padding tokens: 6 - rank: 0 max len: 66 min len: 60 avg len: 63.0 num_loss_counted_tokens: 87
total tokens: 124 num samples: 2 num padding tokens: 4 - rank: 0 max len: 62 min len: 58 avg len: 60.0 num_loss_counted_tokens: 73
total tokens: 118 num samples: 2 num padding tokens: 2 - rank: 1 max len: 59 min len: 57 avg len: 58.0 num_loss_counted_tokens: 60
total tokens: 152 num samples: 2 num padding tokens: 27 - rank: 1 max len: 76 min len: 49 avg len: 62.5 num_loss_counted_tokens: 72
total tokens: 130 num samples: 2 num padding tokens: 7 - rank: 1 max len: 65 min len: 58 avg len: 61.5 num_loss_counted_tokens: 59
total tokens: 188 num samples: 2 num padding tokens: 34 - rank: 0 max len: 94 min len: 60 avg len: 77.0 num_loss_counted_tokens: 90
total tokens: 164 num samples: 2 num padding tokens: 12 - rank: 1 max len: 82 min len: 70 avg len: 76.0 num_loss_counted_tokens: 96
total tokens: 124 num samples: 2 num padding tokens: 4 - rank: 0 max len: 62 min len: 58 avg len: 60.0 num_loss_counted_tokens: 60
total tokens: 244 num samples: 2 num padding tokens: 67 - rank: 0 max len: 122 min len: 55 avg len: 88.5 num_loss_counted_tokens: 127
total tokens: 106 num samples: 2 num padding tokens: 7 - rank: 0 max len: 53 min len: 46 avg len: 49.5 num_loss_counted_tokens: 45
total tokens: 128 num samples: 2 num padding tokens: 0 - rank: 1 max len: 64 min len: 64 avg len: 64.0 num_loss_counted_tokens: 59
total tokens: 110 num samples: 2 num padding tokens: 5 - rank: 0 max len: 55 min len: 50 avg len: 52.5 num_loss_counted_tokens: 57
total tokens: 132 num samples: 2 num padding tokens: 5 - rank: 1 max len: 66 min len: 61 avg len: 63.5 num_loss_counted_tokens: 60
total tokens: 208 num samples: 2 num padding tokens: 33 - rank: 0 max len: 104 min len: 71 avg len: 87.5 num_loss_counted_tokens: 124
total tokens: 174 num samples: 2 num padding tokens: 20 - rank: 1 max len: 87 min len: 67 avg len: 77.0 num_loss_counted_tokens: 86
total tokens: 154 num samples: 2 num padding tokens: 10 - rank: 1 max len: 77 min len: 67 avg len: 72.0 num_loss_counted_tokens: 68
total tokens: 132 num samples: 2 num padding tokens: 18 - rank: 1 max len: 66 min len: 48 avg len: 57.0 num_loss_counted_tokens: 68
total tokens: 126 num samples: 2 num padding tokens: 2 - rank: 1 max len: 63 min len: 61 avg len: 62.0 num_loss_counted_tokens: 68
total tokens: 138 num samples: 2 num padding tokens: 8 - rank: 1 max len: 69 min len: 61 avg len: 65.0 num_loss_counted_tokens: 58
total tokens: 146 num samples: 2 num padding tokens: 2 - rank: 2 max len: 73 min len: 71 avg len: 72.0 num_loss_counted_tokens: 79
total tokens: 118 num samples: 2 num padding tokens: 1 - rank: 2 max len: 59 min len: 58 avg len: 58.5 num_loss_counted_tokens: 64
total tokens: 124 num samples: 2 num padding tokens: 10 - rank: 2 max len: 62 min len: 52 avg len: 57.0 num_loss_counted_tokens: 59
total tokens: 180 num samples: 2 num padding tokens: 31 - rank: 0 max len: 90 min len: 59 avg len: 74.5 num_loss_counted_tokens: 104
total tokens: 186 num samples: 2 num padding tokens: 1 - rank: 2 max len: 93 min len: 92 avg len: 92.5 num_loss_counted_tokens: 125
total tokens: 136 num samples: 2 num padding tokens: 5 - rank: 2 max len: 68 min len: 63 avg len: 65.5 num_loss_counted_tokens: 50
total tokens: 214 num samples: 2 num padding tokens: 31 - rank: 2 max len: 107 min len: 76 avg len: 91.5 num_loss_counted_tokens: 132
total tokens: 174 num samples: 2 num padding tokens: 5 - rank: 2 max len: 87 min len: 82 avg len: 84.5 num_loss_counted_tokens: 109
total tokens: 168 num samples: 2 num padding tokens: 29 - rank: 2 max len: 84 min len: 55 avg len: 69.5 num_loss_counted_tokens: 81
total tokens: 116 num samples: 2 num padding tokens: 3 - rank: 2 max len: 58 min len: 55 avg len: 56.5 num_loss_counted_tokens: 65
total tokens: 172 num samples: 2 num padding tokens: 27 - rank: 2 max len: 86 min len: 59 avg len: 72.5 num_loss_counted_tokens: 75
total tokens: 200 num samples: 2 num padding tokens: 30 - rank: 1 max len: 100 min len: 70 avg len: 85.0 num_loss_counted_tokens: 92
total tokens: 142 num samples: 2 num padding tokens: 9 - rank: 2 max len: 71 min len: 62 avg len: 66.5 num_loss_counted_tokens: 62
total tokens: 214 num samples: 2 num padding tokens: 49 - rank: 2 max len: 107 min len: 58 avg len: 82.5 num_loss_counted_tokens: 102
Per-token loss scaled by world size: 0.0008622364257462323
Per-token loss scaled by world size: 7.275798998307437e-05
Per-token loss scaled by world size: 0.00035221243160776794
Per-token loss scaled by world size: 0.0006777397356927395
Per-token loss scaled by world size: 0.0003756655496545136
Per-token loss scaled by world size: 0.0009425911703146994
Per-token loss scaled by world size: 0.0004384875064715743
Epoch: 8, Step: 97, Rank: 1, loss = 0.061649903655052185
Epoch: 8, Step: 97, Rank: 0, loss = 0.005202196538448334
Epoch: 8, Step: 97, Rank: 4, loss = 0.025183189660310745
Epoch: 8, Step: 97, Rank: 5, loss = 0.048458389937877655
Epoch: 8, Step: 97, Rank: 3, loss = 0.026860086247324944
Epoch: 8, Step: 97, Rank: 7, loss = 0.06739526987075806
Epoch: 8, Step: 97, Rank: 6, loss = 0.031351856887340546
Per-token loss scaled by world size: 0.0008824544493108988
Epoch: 8, Step: 97, Rank: 2, loss = 0.06309549510478973
[2024-07-27 20:07:23,455] [INFO] [logging.py:96:log_dist] [Rank 0] step=97, skipped=0, lr=[2.7557479520891104e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:23,531] [INFO] [timer.py:258:stop] epoch=0/micro_step=97/global_step=97, RunningAvgSamplesPerSec=31.709145954932833, CurrSamplesPerSec=30.76501430086227, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 8%|▊ | 1/12 [00:00<00:10, 1.07it/s]
{
"epoch": 8,
"step": 97,
"rank": 0,
"loss": 0.005202196538448334,
"overall_throughput": 30.654808948925602,
"lr": 2.7557479520891104e-06,
"cuda_mem_allocated": 22.001669883728027,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 572,
"batch_size": 16,
"total_loss": 0.041149549186229706,
"gradnorm": 0.7585266828536987,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:23.574672"
}
Per-token loss scaled by world size: 0.00014853040920570493
Per-token loss scaled by world size: 0.0014420171501114964
Per-token loss scaled by world size: 0.00023432534362655133
Per-token loss scaled by world size: 0.0010252870852127671
Per-token loss scaled by world size: 0.00022051780251786113
Per-token loss scaled by world size: 0.0014104293659329414
Per-token loss scaled by world size: 0.0005214783013798296
Epoch: 8, Step: 98, Rank: 5, loss = 0.10472649335861206
Epoch: 8, Step: 98, Rank: 6, loss = 0.017017878592014313
Epoch: 8, Step: 98, Rank: 7, loss = 0.01078702136874199
Epoch: 8, Step: 98, Rank: 4, loss = 0.07446147501468658
Epoch: 8, Step: 98, Rank: 1, loss = 0.016015104949474335
Epoch: 8, Step: 98, Rank: 3, loss = 0.10243242979049683
Epoch: 8, Step: 98, Rank: 2, loss = 0.03787236288189888
Per-token loss scaled by world size: 0.0007809263770468533
Epoch: 8, Step: 98, Rank: 0, loss = 0.056714776903390884
[2024-07-27 20:07:23,998] [INFO] [logging.py:96:log_dist] [Rank 0] step=98, skipped=0, lr=[2.5317852301584642e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:24,076] [INFO] [timer.py:258:stop] epoch=0/micro_step=98/global_step=98, RunningAvgSamplesPerSec=31.716539168080143, CurrSamplesPerSec=32.43497139719714, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 17%|█▋ | 2/12 [00:01<00:07, 1.42it/s]
{
"epoch": 8,
"step": 98,
"rank": 0,
"loss": 0.056714776903390884,
"overall_throughput": 32.37466676829651,
"lr": 2.5317852301584642e-06,
"cuda_mem_allocated": 21.996094703674316,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 581,
"batch_size": 16,
"total_loss": 0.052503444254398346,
"gradnorm": 0.8310177326202393,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:24.122854"
}
Per-token loss scaled by world size: 1.7735277651809156e-05
Per-token loss scaled by world size: 0.0005863551050424576
Per-token loss scaled by world size: 0.0004776878922712058
Per-token loss scaled by world size: 0.00042502893484197557
Per-token loss scaled by world size: 0.000244389520958066
Per-token loss scaled by world size: 4.276382242096588e-05
Per-token loss scaled by world size: 0.00011345247185090557
Epoch: 8, Step: 99, Rank: 3, loss = 0.03155839815735817
Epoch: 8, Step: 99, Rank: 2, loss = 0.04353686794638634
Epoch: 8, Step: 99, Rank: 1, loss = 0.03546832501888275
Epoch: 8, Step: 99, Rank: 0, loss = 0.0013168443692848086
Epoch: 8, Step: 99, Rank: 4, loss = 0.01814592257142067
Epoch: 8, Step: 99, Rank: 6, loss = 0.003175213700160384
Epoch: 8, Step: 99, Rank: 7, loss = 0.00842384621500969
Per-token loss scaled by world size: 0.0002905700821429491
Epoch: 8, Step: 99, Rank: 5, loss = 0.021574828773736954
[2024-07-27 20:07:24,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=99, skipped=0, lr=[2.315988891431412e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:24,631] [INFO] [timer.py:258:stop] epoch=0/micro_step=99/global_step=99, RunningAvgSamplesPerSec=31.71474915120891, CurrSamplesPerSec=31.54384320597289, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 1584
{
"epoch": 8,
"step": 99,
"rank": 0,
"loss": 0.0013168443692848086,
"overall_throughput": 31.43285568885302,
"lr": 2.315988891431412e-06,
"cuda_mem_allocated": 21.998091220855713,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 594,
"batch_size": 16,
"total_loss": 0.02040003053843975,
"gradnorm": 0.5513115525245667,
"weight_norm": 393.475830078125,
"timestamp": "2024-07-27T20:07:24.635155"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1584
[20:07:42] INFO saving took 17.857797861099243 seconds utils.py:611
Epoch 8: 25%|██▌ | 3/12 [00:19<01:19, 8.79s/it]
Per-token loss scaled by world size: 0.00023032784520182759
Per-token loss scaled by world size: 0.0005766816902905703
Per-token loss scaled by world size: 0.00042750773718580604
Per-token loss scaled by world size: 0.0003948273661080748
Per-token loss scaled by world size: 0.00027044868329539895
Per-token loss scaled by world size: 0.00016059860354289412
Per-token loss scaled by world size: 1.1637920579232741e-05
Epoch: 8, Step: 100, Rank: 0, loss = 0.020211268216371536
Epoch: 8, Step: 100, Rank: 3, loss = 0.05060381814837456
Epoch: 8, Step: 100, Rank: 1, loss = 0.03464610129594803
Epoch: 8, Step: 100, Rank: 2, loss = 0.03751380369067192
Epoch: 8, Step: 100, Rank: 4, loss = 0.02373187243938446
Epoch: 8, Step: 100, Rank: 7, loss = 0.00102122756652534
Epoch: 8, Step: 100, Rank: 5, loss = 0.014092527329921722
Per-token loss scaled by world size: 3.5370929253986105e-05
Epoch: 8, Step: 100, Rank: 6, loss = 0.0031037991866469383
[2024-07-27 20:07:42,968] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=0, lr=[2.1085949060360654e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:43,046] [INFO] [timer.py:258:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=31.71639245876496, CurrSamplesPerSec=31.876606801027897, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 33%|███▎ | 4/12 [00:20<00:44, 5.54s/it]
{
"epoch": 8,
"step": 100,
"rank": 0,
"loss": 0.020211268216371536,
"overall_throughput": 31.806867290460243,
"lr": 2.1085949060360654e-06,
"cuda_mem_allocated": 21.999046802520752,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 702,
"batch_size": 16,
"total_loss": 0.023115552961826324,
"gradnorm": 0.8944979310035706,
"weight_norm": 393.4758605957031,
"timestamp": "2024-07-27T20:07:43.089296"
}
Per-token loss scaled by world size: 0.002073504263535142
Per-token loss scaled by world size: 0.0009401860297657549
Per-token loss scaled by world size: 0.0007578051881864667
Per-token loss scaled by world size: 0.00018321115931030363
Per-token loss scaled by world size: 0.00033954239916056395
Per-token loss scaled by world size: 0.00022701223497278988
Per-token loss scaled by world size: 0.00016321164730470628
Epoch: 8, Step: 101, Rank: 0, loss = 0.1342594027519226
Epoch: 8, Step: 101, Rank: 4, loss = 0.04906788468360901
Epoch: 8, Step: 101, Rank: 5, loss = 0.02198537066578865
Epoch: 8, Step: 101, Rank: 2, loss = 0.011862922459840775
Epoch: 8, Step: 101, Rank: 6, loss = 0.014699041843414307
Epoch: 8, Step: 101, Rank: 1, loss = 0.06087704375386238
Epoch: 8, Step: 101, Rank: 7, loss = 0.010567953810095787
Per-token loss scaled by world size: 0.00039684344665147364
Epoch: 8, Step: 101, Rank: 3, loss = 0.025695612654089928
[2024-07-27 20:07:43,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=101, skipped=0, lr=[1.9098300562505266e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:43,592] [INFO] [timer.py:258:stop] epoch=0/micro_step=101/global_step=101, RunningAvgSamplesPerSec=31.718083745763387, CurrSamplesPerSec=31.884709476489913, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 42%|████▏ | 5/12 [00:20<00:26, 3.74s/it]
{
"epoch": 8,
"step": 101,
"rank": 0,
"loss": 0.1342594027519226,
"overall_throughput": 31.800326016907388,
"lr": 1.9098300562505266e-06,
"cuda_mem_allocated": 21.998091220855713,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 518,
"batch_size": 16,
"total_loss": 0.041126906871795654,
"gradnorm": 0.7889791131019592,
"weight_norm": 393.4758605957031,
"timestamp": "2024-07-27T20:07:43.642554"
}
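Because all eight ranks print to the same console, several `Epoch: E, Step: S, Rank: R, loss = X` records can end up fused on one line in the raw stream. A minimal Python sketch (the pattern and helper name are illustrative, not part of `ilab`) that separates them with a regular expression and checks a relation visible in these numbers rather than documented anywhere: the step-101 `total_loss` equals the plain mean of the eight per-rank losses.

```python
import math
import re

# Step-101 rank lines as originally printed, including one line where three
# ranks' records were written back to back without a newline.
RAW_STEP_101 = """\
Epoch: 8, Step: 101, Rank: 0, loss = 0.1342594027519226
Epoch: 8, Step: 101, Rank: 4, loss = 0.04906788468360901
Epoch: 8, Step: 101, Rank: 5, loss = 0.02198537066578865Epoch: 8, Step: 101, Rank: 2, loss = 0.011862922459840775Epoch: 8, Step: 101, Rank: 6, loss = 0.014699041843414307
Epoch: 8, Step: 101, Rank: 1, loss = 0.06087704375386238
Epoch: 8, Step: 101, Rank: 7, loss = 0.010567953810095787
Epoch: 8, Step: 101, Rank: 3, loss = 0.025695612654089928
"""

# The loss is matched as digits, dots and an optional lowercase exponent only,
# so a following "Epoch:" on the same line is not swallowed.
RANK_RE = re.compile(
    r"Epoch: (\d+), Step: (\d+), Rank: (\d+), loss = ([0-9.]+(?:e-?\d+)?)"
)


def rank_losses(raw: str) -> dict[int, float]:
    """Map rank -> loss for every record found, even when records share a line."""
    return {int(m.group(3)): float(m.group(4)) for m in RANK_RE.finditer(raw)}


losses = rank_losses(RAW_STEP_101)
mean_loss = sum(losses.values()) / len(losses)
logged_total_loss = 0.041126906871795654  # "total_loss" in the step-101 JSON

print(len(losses), mean_loss, math.isclose(mean_loss, logged_total_loss, rel_tol=1e-6))
```

The tolerance allows for the logged values having been produced in float32 on the GPUs.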
Per-token loss scaled by world size: 0.00021082548482809216
Per-token loss scaled by world size: 0.0001290614891331643
Per-token loss scaled by world size: 0.0004096523334737867
Per-token loss scaled by world size: 0.00011912157060578465
Per-token loss scaled by world size: 0.00024137772561516613
Per-token loss scaled by world size: 0.00010579593799775466
Per-token loss scaled by world size: 0.00036702080979011953
Epoch: 8, Step: 102, Rank: 2, loss = 0.011470340192317963
Epoch: 8, Step: 102, Rank: 1, loss = 0.03640785068273544
Epoch: 8, Step: 102, Rank: 0, loss = 0.01873711496591568
Epoch: 8, Step: 102, Rank: 4, loss = 0.009402614086866379
Epoch: 8, Step: 102, Rank: 3, loss = 0.010586929507553577
Epoch: 8, Step: 102, Rank: 6, loss = 0.03261897340416908
Epoch: 8, Step: 102, Rank: 5, loss = 0.021452445536851883
Per-token loss scaled by world size: 0.0002314479643246159
Epoch: 8, Step: 102, Rank: 7, loss = 0.0205699373036623
[2024-07-27 20:07:44,058] [INFO] [logging.py:96:log_dist] [Rank 0] step=102, skipped=0, lr=[1.7199116885197996e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:44,136] [INFO] [timer.py:258:stop] epoch=0/micro_step=102/global_step=102, RunningAvgSamplesPerSec=31.727033550532532, CurrSamplesPerSec=32.638783565843816, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 8: 50%|█████ | 6/12 [00:21<00:15, 2.65s/it]
{
"epoch": 8,
"step": 102,
"rank": 0,
"loss": 0.01873711496591568,
"overall_throughput": 32.58409397351569,
"lr": 1.7199116885197996e-06,
"cuda_mem_allocated": 22.00572395324707,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 711,
"batch_size": 16,
"total_loss": 0.0201557744294405,
"gradnorm": 0.5014692544937134,
"weight_norm": 393.4758605957031,
"timestamp": "2024-07-27T20:07:44.178902"
}
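The `Per-token loss scaled by world size` values and the per-rank `loss` values in step 102 are tied together numerically: multiplying each of the eight per-token values by `num_loss_counted_tokens / world_size` (711 / 8 = 88.875 here) reproduces the eight rank losses to float32 precision. This is an empirical observation from the numbers above, not a description of the trainer's code. The sketch below (names are hypothetical) checks it without assuming which per-token line belongs to which rank, by comparing the two sorted lists.

```python
import math
import re

WORLD_SIZE = 8                 # --gpus 8 in the launch command
NUM_LOSS_COUNTED_TOKENS = 711  # from the step-102 JSON record

# Step-102 "Per-token loss" lines as originally printed (four records fused on the first line).
RAW_PER_TOKEN = """\
Per-token loss scaled by world size: 0.00021082548482809216Per-token loss scaled by world size: 0.0001290614891331643Per-token loss scaled by world size: 0.0004096523334737867Per-token loss scaled by world size: 0.00011912157060578465
Per-token loss scaled by world size: 0.00024137772561516613
Per-token loss scaled by world size: 0.00010579593799775466
Per-token loss scaled by world size: 0.00036702080979011953
Per-token loss scaled by world size: 0.0002314479643246159
"""

# Per-rank losses for step 102, keyed by rank.
RANK_LOSSES = {
    0: 0.01873711496591568, 1: 0.03640785068273544,
    2: 0.011470340192317963, 3: 0.010586929507553577,
    4: 0.009402614086866379, 5: 0.021452445536851883,
    6: 0.03261897340416908, 7: 0.0205699373036623,
}

PER_TOKEN_RE = re.compile(r"Per-token loss scaled by world size: ([0-9.]+(?:e-?\d+)?)")

per_token = [float(v) for v in PER_TOKEN_RE.findall(RAW_PER_TOKEN)]
scale = NUM_LOSS_COUNTED_TOKENS / WORLD_SIZE

# Compare sorted lists so no rank-to-line pairing has to be assumed.
predicted = sorted(v * scale for v in per_token)
observed = sorted(RANK_LOSSES.values())
matches = all(math.isclose(p, o, rel_tol=1e-5) for p, o in zip(predicted, observed))

print(len(per_token), scale, matches)
```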
Per-token loss scaled by world size: 0.0001110902740038
Per-token loss scaled by world size: 0.0010643235873430967
Per-token loss scaled by world size: 0.00038977997610345483
Per-token loss scaled by world size: 0.0005851351888850331
Per-token loss scaled by world size: 0.0006453694077208638
Per-token loss scaled by world size: 2.1985697458148934e-05
Per-token loss scaled by world size: 0.0008211812237277627
Epoch: 8, Step: 103, Rank: 6, loss = 0.0468108169734478
Epoch: 8, Step: 103, Rank: 1, loss = 0.08514588326215744
Epoch: 8, Step: 103, Rank: 5, loss = 0.03118239715695381
Epoch: 8, Step: 103, Rank: 2, loss = 0.051629554480314255
Epoch: 8, Step: 103, Rank: 0, loss = 0.008887222036719322
Epoch: 8, Step: 103, Rank: 7, loss = 0.0017588557675480843
Epoch: 8, Step: 103, Rank: 3, loss = 0.06569449603557587
Per-token loss scaled by world size: 0.0005115483654662967
Epoch: 8, Step: 103, Rank: 4, loss = 0.04092387109994888
[2024-07-27 20:07:44,598] [INFO] [logging.py:96:log_dist] [Rank 0] step=103, skipped=0, lr=[1.5390474757906449e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:44,676] [INFO] [timer.py:258:stop] epoch=0/micro_step=103/global_step=103, RunningAvgSamplesPerSec=31.73624582311991, CurrSamplesPerSec=32.68529726054485, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 8: 58%|█████▊ | 7/12 [00:22<00:09, 1.96s/it]
{
"epoch": 8,
"step": 103,
"rank": 0,
"loss": 0.008887222036719322,
"overall_throughput": 32.6317687389074,
"lr": 1.5390474757906449e-06,
"cuda_mem_allocated": 22.01240301132202,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 640,
"batch_size": 16,
"total_loss": 0.0415041409432888,
"gradnorm": 0.7137126326560974,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:44.718888"
}
Per-token loss scaled by world size: 0.000524764705915004
Per-token loss scaled by world size: 0.00015330749738495797
Per-token loss scaled by world size: 0.001214228686876595
Per-token loss scaled by world size: 0.00014493752678390592
Per-token loss scaled by world size: 0.0008454297785647213
Per-token loss scaled by world size: 0.0007223724969662726
Per-token loss scaled by world size: 0.0003260647936258465
Epoch: 8, Step: 104, Rank: 7, loss = 0.01151722576469183
Epoch: 8, Step: 104, Rank: 5, loss = 0.06351291388273239
Epoch: 8, Step: 104, Rank: 4, loss = 0.01088843122124672
Epoch: 8, Step: 104, Rank: 0, loss = 0.03942294791340828
Epoch: 8, Step: 104, Rank: 3, loss = 0.09121893346309662
Epoch: 8, Step: 104, Rank: 2, loss = 0.054268233478069305
Epoch: 8, Step: 104, Rank: 1, loss = 0.024495618417859077
Per-token loss scaled by world size: 0.0009017937700264156
Epoch: 8, Step: 104, Rank: 6, loss = 0.06774725764989853
[2024-07-27 20:07:45,141] [INFO] [logging.py:96:log_dist] [Rank 0] step=104, skipped=0, lr=[1.367435190424261e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:45,219] [INFO] [timer.py:258:stop] epoch=0/micro_step=104/global_step=104, RunningAvgSamplesPerSec=31.74047431733202, CurrSamplesPerSec=32.17343553961532, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 67%|██████▋ | 8/12 [00:22<00:06, 1.51s/it]
{
"epoch": 8,
"step": 104,
"rank": 0,
"loss": 0.03942294791340828,
"overall_throughput": 32.1218766079758,
"lr": 1.367435190424261e-06,
"cuda_mem_allocated": 22.004770278930664,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 601,
"batch_size": 16,
"total_loss": 0.04538394883275032,
"gradnorm": 0.7771502137184143,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:45.261450"
}
Per-token loss scaled by world size: 0.0007052936707623303
Per-token loss scaled by world size: 0.000300221232464537
Per-token loss scaled by world size: 0.0001537334028398618
Per-token loss scaled by world size: 0.0005797221674583852
Per-token loss scaled by world size: 2.4881815988919698e-05
Per-token loss scaled by world size: 8.731409616302699e-05
Per-token loss scaled by world size: 8.151983638526872e-05
Epoch: 8, Step: 105, Rank: 0, loss = 0.04989952594041824
Epoch: 8, Step: 105, Rank: 6, loss = 0.010876637883484364
Epoch: 8, Step: 105, Rank: 5, loss = 0.0017603884916752577
Epoch: 8, Step: 105, Rank: 1, loss = 0.0061774724163115025
Epoch: 8, Step: 105, Rank: 2, loss = 0.021240651607513428
Epoch: 8, Step: 105, Rank: 7, loss = 0.04101534187793732
Epoch: 8, Step: 105, Rank: 4, loss = 0.005767528433352709
Per-token loss scaled by world size: 0.0010399594902992249
Epoch: 8, Step: 105, Rank: 3, loss = 0.07357713580131531
[2024-07-27 20:07:45,680] [INFO] [logging.py:96:log_dist] [Rank 0] step=105, skipped=0, lr=[1.2052624879351105e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:45,759] [INFO] [timer.py:258:stop] epoch=0/micro_step=105/global_step=105, RunningAvgSamplesPerSec=31.74492270946544, CurrSamplesPerSec=32.205303527286674, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 75%|███████▌ | 9/12 [00:23<00:03, 1.21s/it]
{
"epoch": 8,
"step": 105,
"rank": 0,
"loss": 0.04989952594041824,
"overall_throughput": 32.11246971610297,
"lr": 1.2052624879351105e-06,
"cuda_mem_allocated": 21.996421813964844,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 566,
"batch_size": 16,
"total_loss": 0.026289334520697594,
"gradnorm": 0.5574781894683838,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:45.808243"
}
Per-token loss scaled by world size: 0.0008091902709566057
Per-token loss scaled by world size: 0.0007261958089657128
Per-token loss scaled by world size: 0.0015269122086465359
Per-token loss scaled by world size: 0.0014011193998157978
Per-token loss scaled by world size: 3.460650987108238e-05
Per-token loss scaled by world size: 0.00045884415158070624
Per-token loss scaled by world size: 5.5351883929688483e-05
Epoch: 8, Step: 106, Rank: 0, loss = 0.05917203798890114
Epoch: 8, Step: 106, Rank: 1, loss = 0.11165545880794525
Epoch: 8, Step: 106, Rank: 2, loss = 0.053103066980838776
Epoch: 8, Step: 106, Rank: 6, loss = 0.10245685279369354
Epoch: 8, Step: 106, Rank: 3, loss = 0.0025306011084467173
Epoch: 8, Step: 106, Rank: 5, loss = 0.004047606606036425
Epoch: 8, Step: 106, Rank: 4, loss = 0.033552978187799454
Per-token loss scaled by world size: 0.00016226798470597714
Epoch: 8, Step: 106, Rank: 7, loss = 0.011865845881402493
[2024-07-27 20:07:46,233] [INFO] [logging.py:96:log_dist] [Rank 0] step=106, skipped=0, lr=[1.0527067017923654e-06], mom=[(0.9, 0.95)]
[2024-07-27 20:07:46,311] [INFO] [timer.py:258:stop] epoch=0/micro_step=106/global_step=106, RunningAvgSamplesPerSec=31.74620798139189, CurrSamplesPerSec=31.879150748989836, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 83%|████████▎ | 10/12 [00:23<00:02, 1.00s/it]
{
"epoch": 8,
"step": 106,
"rank": 0,
"loss": 0.05917203798890114,
"overall_throughput": 31.805977880940024,
"lr": 1.0527067017923654e-06,
"cuda_mem_allocated": 21.998091220855713,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 585,
"batch_size": 16,
"total_loss": 0.04729805514216423,
"gradnorm": 0.8684948086738586,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:46.362067"
}
Per-token loss scaled by world size: 3.8418351323343813e-05
Per-token loss scaled by world size: 0.00024928394122980535
Per-token loss scaled by world size: 0.0008044589194469154
Per-token loss scaled by world size: 0.0006902430322952569
Per-token loss scaled by world size: 0.0003151585115119815
Per-token loss scaled by world size: 0.000329785660142079
Per-token loss scaled by world size: 2.906994086515624e-05
Epoch: 8, Step: 107, Rank: 4, loss = 0.05936089903116226
Epoch: 8, Step: 107, Rank: 5, loss = 0.06918346881866455
Epoch: 8, Step: 107, Rank: 0, loss = 0.0033039783593267202
Epoch: 8, Step: 107, Rank: 3, loss = 0.021438419818878174
Epoch: 8, Step: 107, Rank: 2, loss = 0.028361566364765167
Epoch: 8, Step: 107, Rank: 6, loss = 0.02710363268852234
Epoch: 8, Step: 107, Rank: 1, loss = 0.0025000148452818394
Per-token loss scaled by world size: 3.5674460377776995e-05
Epoch: 8, Step: 107, Rank: 7, loss = 0.0030680035706609488
[2024-07-27 20:07:46,768] [INFO] [logging.py:96:log_dist] [Rank 0] step=107, skipped=0, lr=[9.09934649508375e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:07:46,845] [INFO] [timer.py:258:stop] epoch=0/micro_step=107/global_step=107, RunningAvgSamplesPerSec=31.759325648755453, CurrSamplesPerSec=33.18541023815175, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
Epoch 8: 92%|█████████▏| 11/12 [00:24<00:00, 1.16it/s]
{
"epoch": 8,
"step": 107,
"rank": 0,
"loss": 0.0033039783593267202,
"overall_throughput": 33.10411670052279,
"lr": 9.09934649508375e-07,
"cuda_mem_allocated": 22.00811004638672,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 688,
"batch_size": 16,
"total_loss": 0.02678999863564968,
"gradnorm": 0.5617575645446777,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:46.887834"
}
Per-token loss scaled by world size: 0.0007775372941978276
Per-token loss scaled by world size: 0.0009101248579099774
Per-token loss scaled by world size: 8.433926268480718e-06
Per-token loss scaled by world size: 3.585006925277412e-05
Per-token loss scaled by world size: 3.69123590644449e-05
Per-token loss scaled by world size: 0.0004430596309248358
Per-token loss scaled by world size: 0.0002181310555897653
Epoch: 8, Step: 108, Rank: 6, loss = 0.07588165998458862
Epoch: 8, Step: 108, Rank: 3, loss = 0.0029889994766563177
Epoch: 8, Step: 108, Rank: 7, loss = 0.03694009780883789
Epoch: 8, Step: 108, Rank: 1, loss = 0.0030775677878409624
Epoch: 8, Step: 108, Rank: 5, loss = 0.06482717394828796
Epoch: 8, Step: 108, Rank: 2, loss = 0.0007031786371953785
Epoch: 8, Step: 108, Rank: 4, loss = 0.018186677247285843
Per-token loss scaled by world size: 0.0005422345129773021
Epoch: 8, Step: 108, Rank: 0, loss = 0.045208804309368134
[2024-07-27 20:07:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] step=108, skipped=0, lr=[7.771024502261526e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:07:47,383] [INFO] [timer.py:258:stop] epoch=0/micro_step=108/global_step=108, RunningAvgSamplesPerSec=31.765144134069732, CurrSamplesPerSec=32.38818214329323, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 8: 100%|██████████| 12/12 [00:24<00:00, 1.31it/s]
{
"epoch": 8,
"step": 108,
"rank": 0,
"loss": 0.045208804309368134,
"overall_throughput": 32.30654330502832,
"lr": 7.771024502261526e-07,
"cuda_mem_allocated": 21.99594497680664,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 667,
"batch_size": 16,
"total_loss": 0.03097676858305931,
"gradnorm": 0.6119153499603271,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:47.430973"
}
Epoch 8: 100%|██████████| 12/12 [00:24<00:00, 2.07s/it]
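The `lr` values in these records decay smoothly toward zero as training approaches its last step. They fit a linear-warmup, cosine-decay schedule whose constants are inferred from the numbers, not read from the log: a 2e-5 peak, 25 warmup steps, and 120 total steps (12 steps per epoch, per the progress bars, over the 10 epochs requested with `--num-epochs 10`). A minimal Python check under those assumptions; `cosine_lr` is a hypothetical helper, and its warmup branch is itself an assumption because no warmup steps appear in this excerpt.

```python
import math

# Inferred constants (assumptions, not printed anywhere in the log).
PEAK_LR = 2e-5
WARMUP_STEPS = 25
TOTAL_STEPS = 120  # 12 steps/epoch x 10 epochs


def cosine_lr(step: int) -> float:
    """Assumed linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, lr) pairs copied from the JSON records in this section.
LOGGED = [
    (100, 2.1085949060360654e-06),
    (101, 1.9098300562505266e-06),
    (105, 1.2052624879351105e-06),
    (108, 7.771024502261526e-07),
]

for step, lr in LOGGED:
    print(step, lr, cosine_lr(step), math.isclose(lr, cosine_lr(step), rel_tol=1e-9))
```

Two of these points already pin the constants down: steps 101 and 110 only agree with a decay fraction of (step − 25) / 95, which is where the 25 and 120 come from.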
total tokens: 126 num samples: 2 num padding tokens: 13 - rank: 5 max len: 63 min len: 50 avg len: 56.5 num_loss_counted_tokens: 64
total tokens: 140 num samples: 2 num padding tokens: 25 - rank: 5 max len: 70 min len: 45 avg len: 57.5 num_loss_counted_tokens: 67
total tokens: 188 num samples: 2 num padding tokens: 28 - rank: 5 max len: 94 min len: 66 avg len: 80.0 num_loss_counted_tokens: 82
total tokens: 166 num samples: 2 num padding tokens: 24 - rank: 5 max len: 83 min len: 59 avg len: 71.0 num_loss_counted_tokens: 64
total tokens: 122 num samples: 2 num padding tokens: 1 - rank: 5 max len: 61 min len: 60 avg len: 60.5 num_loss_counted_tokens: 66
total tokens: 144 num samples: 2 num padding tokens: 17 - rank: 5 max len: 72 min len: 55 avg len: 63.5 num_loss_counted_tokens: 68
total tokens: 164 num samples: 2 num padding tokens: 19 - rank: 5 max len: 82 min len: 63 avg len: 72.5 num_loss_counted_tokens: 78
total tokens: 128 num samples: 2 num padding tokens: 3 - rank: 5 max len: 64 min len: 61 avg len: 62.5 num_loss_counted_tokens: 66
total tokens: 282 num samples: 2 num padding tokens: 80 - rank: 5 max len: 141 min len: 61 avg len: 101.0 num_loss_counted_tokens: 146
total tokens: 124 num samples: 2 num padding tokens: 17 - rank: 1 max len: 62 min len: 45 avg len: 53.5 num_loss_counted_tokens: 53
total tokens: 136 num samples: 2 num padding tokens: 6 - rank: 4 max len: 68 min len: 62 avg len: 65.0 num_loss_counted_tokens: 57
total tokens: 136 num samples: 2 num padding tokens: 13 - rank: 5 max len: 68 min len: 55 avg len: 61.5 num_loss_counted_tokens: 48
total tokens: 160 num samples: 2 num padding tokens: 6 - rank: 7 max len: 80 min len: 74 avg len: 77.0 num_loss_counted_tokens: 94
total tokens: 200 num samples: 2 num padding tokens: 45 - rank: 4 max len: 100 min len: 55 avg len: 77.5 num_loss_counted_tokens: 99
total tokens: 96 num samples: 2 num padding tokens: 5 - rank: 5 max len: 48 min len: 43 avg len: 45.5 num_loss_counted_tokens: 39
total tokens: 140 num samples: 2 num padding tokens: 22 - rank: 3 max len: 70 min len: 48 avg len: 59.0 num_loss_counted_tokens: 73
total tokens: 216 num samples: 2 num padding tokens: 42 - rank: 2 max len: 108 min len: 66 avg len: 87.0 num_loss_counted_tokens: 105
total tokens: 138 num samples: 2 num padding tokens: 10 - rank: 7 max len: 69 min len: 59 avg len: 64.0 num_loss_counted_tokens: 68
total tokens: 104 num samples: 2 num padding tokens: 0 - rank: 1 max len: 52 min len: 52 avg len: 52.0 num_loss_counted_tokens: 50
total tokens: 176 num samples: 2 num padding tokens: 24 - rank: 1 max len: 88 min len: 64 avg len: 76.0 num_loss_counted_tokens: 94
total tokens: 142 num samples: 2 num padding tokens: 13 - rank: 3 max len: 71 min len: 58 avg len: 64.5 num_loss_counted_tokens: 69
total tokens: 128 num samples: 2 num padding tokens: 5 - rank: 3 max len: 64 min len: 59 avg len: 61.5 num_loss_counted_tokens: 71
total tokens: 186 num samples: 2 num padding tokens: 3 - rank: 7 max len: 93 min len: 90 avg len: 91.5 num_loss_counted_tokens: 173
total tokens: 168 num samples: 2 num padding tokens: 18 - rank: 1 max len: 84 min len: 66 avg len: 75.0 num_loss_counted_tokens: 96
total tokens: 136 num samples: 2 num padding tokens: 6 - rank: 6 max len: 68 min len: 62 avg len: 65.0 num_loss_counted_tokens: 57
total tokens: 128 num samples: 2 num padding tokens: 4 - rank: 6 max len: 64 min len: 60 avg len: 62.0 num_loss_counted_tokens: 64
total tokens: 186 num samples: 2 num padding tokens: 14 - rank: 6 max len: 93 min len: 79 avg len: 86.0 num_loss_counted_tokens: 90
total tokens: 168 num samples: 2 num padding tokens: 31 - rank: 6 max len: 84 min len: 53 avg len: 68.5 num_loss_counted_tokens: 73
total tokens: 138 num samples: 2 num padding tokens: 6 - rank: 1 max len: 69 min len: 63 avg len: 66.0 num_loss_counted_tokens: 78
total tokens: 114 num samples: 2 num padding tokens: 6 - rank: 1 max len: 57 min len: 51 avg len: 54.0 num_loss_counted_tokens: 56
total tokens: 172 num samples: 2 num padding tokens: 27 - rank: 6 max len: 86 min len: 59 avg len: 72.5 num_loss_counted_tokens: 80
total tokens: 146 num samples: 2 num padding tokens: 27 - rank: 3 max len: 73 min len: 46 avg len: 59.5 num_loss_counted_tokens: 65
total tokens: 116 num samples: 2 num padding tokens: 5 - rank: 2 max len: 58 min len: 53 avg len: 55.5 num_loss_counted_tokens: 64
total tokens: 134 num samples: 2 num padding tokens: 16 - rank: 1 max len: 67 min len: 51 avg len: 59.0 num_loss_counted_tokens: 56
total tokens: 120 num samples: 2 num padding tokens: 0 - rank: 4 max len: 60 min len: 60 avg len: 60.0 num_loss_counted_tokens: 73
total tokens: 128 num samples: 2 num padding tokens: 18 - rank: 4 max len: 64 min len: 46 avg len: 55.0 num_loss_counted_tokens: 59
total tokens: 124 num samples: 2 num padding tokens: 2 - rank: 6 max len: 62 min len: 60 avg len: 61.0 num_loss_counted_tokens: 73
total tokens: 128 num samples: 2 num padding tokens: 5 - rank: 2 max len: 64 min len: 59 avg len: 61.5 num_loss_counted_tokens: 52
total tokens: 152 num samples: 2 num padding tokens: 11 - rank: 5 max len: 76 min len: 65 avg len: 70.5 num_loss_counted_tokens: 79
total tokens: 172 num samples: 2 num padding tokens: 26 - rank: 4 max len: 86 min len: 60 avg len: 73.0 num_loss_counted_tokens: 84
total tokens: 142 num samples: 2 num padding tokens: 0 - rank: 4 max len: 71 min len: 71 avg len: 71.0 num_loss_counted_tokens: 74
total tokens: 146 num samples: 2 num padding tokens: 16 - rank: 6 max len: 73 min len: 57 avg len: 65.0 num_loss_counted_tokens: 72
total tokens: 154 num samples: 2 num padding tokens: 28 - rank: 4 max len: 77 min len: 49 avg len: 63.0 num_loss_counted_tokens: 80
total tokens: 186 num samples: 2 num padding tokens: 41 - rank: 4 max len: 93 min len: 52 avg len: 72.5 num_loss_counted_tokens: 99
total tokens: 226 num samples: 2 num padding tokens: 35 - rank: 2 max len: 113 min len: 78 avg len: 95.5 num_loss_counted_tokens: 114
total tokens: 166 num samples: 2 num padding tokens: 17 - rank: 2 max len: 83 min len: 66 avg len: 74.5 num_loss_counted_tokens: 86
total tokens: 150 num samples: 2 num padding tokens: 9 - rank: 2 max len: 75 min len: 66 avg len: 70.5 num_loss_counted_tokens: 82
total tokens: 110 num samples: 2 num padding tokens: 3 - rank: 3 max len: 55 min len: 52 avg len: 53.5 num_loss_counted_tokens: 63
total tokens: 116 num samples: 2 num padding tokens: 3 - rank: 6 max len: 58 min len: 55 avg len: 56.5 num_loss_counted_tokens: 52
total tokens: 116 num samples: 2 num padding tokens: 7 - rank: 4 max len: 58 min len: 51 avg len: 54.5 num_loss_counted_tokens: 59
total tokens: 194 num samples: 2 num padding tokens: 27 - rank: 4 max len: 97 min len: 70 avg len: 83.5 num_loss_counted_tokens: 113
total tokens: 120 num samples: 2 num padding tokens: 6 - rank: 6 max len: 60 min len: 54 avg len: 57.0 num_loss_counted_tokens: 60
total tokens: 124 num samples: 2 num padding tokens: 8 - rank: 2 max len: 62 min len: 54 avg len: 58.0 num_loss_counted_tokens: 64
total tokens: 152 num samples: 2 num padding tokens: 27 - rank: 7 max len: 76 min len: 49 avg len: 62.5 num_loss_counted_tokens: 64
total tokens: 174 num samples: 2 num padding tokens: 20 - rank: 1 max len: 87 min len: 67 avg len: 77.0 num_loss_counted_tokens: 82
total tokens: 208 num samples: 2 num padding tokens: 34 - rank: 2 max len: 104 min len: 70 avg len: 87.0 num_loss_counted_tokens: 107
total tokens: 180 num samples: 2 num padding tokens: 30 - rank: 1 max len: 90 min len: 60 avg len: 75.0 num_loss_counted_tokens: 97
total tokens: 172 num samples: 2 num padding tokens: 42 - rank: 1 max len: 86 min len: 44 avg len: 65.0 num_loss_counted_tokens: 68
total tokens: 152 num samples: 2 num padding tokens: 8 - rank: 4 max len: 76 min len: 68 avg len: 72.0 num_loss_counted_tokens: 69
total tokens: 162 num samples: 2 num padding tokens: 18 - rank: 2 max len: 81 min len: 63 avg len: 72.0 num_loss_counted_tokens: 79
total tokens: 126 num samples: 2 num padding tokens: 13 - rank: 3 max len: 63 min len: 50 avg len: 56.5 num_loss_counted_tokens: 57
total tokens: 214 num samples: 2 num padding tokens: 37 - rank: 2 max len: 107 min len: 70 avg len: 88.5 num_loss_counted_tokens: 99
total tokens: 124 num samples: 2 num padding tokens: 12 - rank: 3 max len: 62 min len: 50 avg len: 56.0 num_loss_counted_tokens: 54
total tokens: 114 num samples: 2 num padding tokens: 2 - rank: 7 max len: 57 min len: 55 avg len: 56.0 num_loss_counted_tokens: 64
total tokens: 146 num samples: 2 num padding tokens: 12 - rank: 7 max len: 73 min len: 61 avg len: 67.0 num_loss_counted_tokens: 86
total tokens: 184 num samples: 2 num padding tokens: 42 - rank: 1 max len: 92 min len: 50 avg len: 71.0 num_loss_counted_tokens: 86
total tokens: 164 num samples: 2 num padding tokens: 32 - rank: 6 max len: 82 min len: 50 avg len: 66.0 num_loss_counted_tokens: 94
total tokens: 122 num samples: 2 num padding tokens: 6 - rank: 2 max len: 61 min len: 55 avg len: 58.0 num_loss_counted_tokens: 63
total tokens: 136 num samples: 2 num padding tokens: 16 - rank: 1 max len: 68 min len: 52 avg len: 60.0 num_loss_counted_tokens: 53
total tokens: 160 num samples: 2 num padding tokens: 14 - rank: 7 max len: 80 min len: 66 avg len: 73.0 num_loss_counted_tokens: 81
total tokens: 196 num samples: 2 num padding tokens: 49 - rank: 7 max len: 98 min len: 49 avg len: 73.5 num_loss_counted_tokens: 96
total tokens: 174 num samples: 2 num padding tokens: 23 - rank: 7 max len: 87 min len: 64 avg len: 75.5 num_loss_counted_tokens: 77
total tokens: 180 num samples: 2 num padding tokens: 28 - rank: 7 max len: 90 min len: 62 avg len: 76.0 num_loss_counted_tokens: 92
total tokens: 132 num samples: 2 num padding tokens: 8 - rank: 3 max len: 66 min len: 58 avg len: 62.0 num_loss_counted_tokens: 65
total tokens: 228 num samples: 2 num padding tokens: 45 - rank: 3 max len: 114 min len: 69 avg len: 91.5 num_loss_counted_tokens: 117
total tokens: 188 num samples: 2 num padding tokens: 41 - rank: 3 max len: 94 min len: 53 avg len: 73.5 num_loss_counted_tokens: 85
total tokens: 110 num samples: 2 num padding tokens: 4 - rank: 7 max len: 55 min len: 51 avg len: 53.0 num_loss_counted_tokens: 60
total tokens: 130 num samples: 2 num padding tokens: 2 - rank: 3 max len: 65 min len: 63 avg len: 64.0 num_loss_counted_tokens: 57
total tokens: 142 num samples: 2 num padding tokens: 1 - rank: 6 max len: 71 min len: 70 avg len: 70.5 num_loss_counted_tokens: 66
total tokens: 124 num samples: 2 num padding tokens: 18 - rank: 0 max len: 62 min len: 44 avg len: 53.0 num_loss_counted_tokens: 50
total tokens: 154 num samples: 2 num padding tokens: 3 - rank: 2 max len: 77 min len: 74 avg len: 75.5 num_loss_counted_tokens: 81
total tokens: 134 num samples: 2 num padding tokens: 19 - rank: 0 max len: 67 min len: 48 avg len: 57.5 num_loss_counted_tokens: 64
total tokens: 142 num samples: 2 num padding tokens: 11 - rank: 4 max len: 71 min len: 60 avg len: 65.5 num_loss_counted_tokens: 75
total tokens: 144 num samples: 2 num padding tokens: 28 - rank: 7 max len: 72 min len: 44 avg len: 58.0 num_loss_counted_tokens: 72
total tokens: 158 num samples: 2 num padding tokens: 24 - rank: 0 max len: 79 min len: 55 avg len: 67.0 num_loss_counted_tokens: 75
total tokens: 162 num samples: 2 num padding tokens: 0 - rank: 0 max len: 81 min len: 81 avg len: 81.0 num_loss_counted_tokens: 96
total tokens: 120 num samples: 2 num padding tokens: 16 - rank: 0 max len: 60 min len: 44 avg len: 52.0 num_loss_counted_tokens: 55
total tokens: 134 num samples: 2 num padding tokens: 13 - rank: 0 max len: 67 min len: 54 avg len: 60.5 num_loss_counted_tokens: 58
total tokens: 244 num samples: 2 num padding tokens: 46 - rank: 0 max len: 122 min len: 76 avg len: 99.0 num_loss_counted_tokens: 149
total tokens: 174 num samples: 2 num padding tokens: 24 - rank: 6 max len: 87 min len: 63 avg len: 75.0 num_loss_counted_tokens: 83
total tokens: 214 num samples: 2 num padding tokens: 62 - rank: 0 max len: 107 min len: 45 avg len: 76.0 num_loss_counted_tokens: 99
total tokens: 202 num samples: 2 num padding tokens: 40 - rank: 0 max len: 101 min len: 61 avg len: 81.0 num_loss_counted_tokens: 104
total tokens: 116 num samples: 2 num padding tokens: 13 - rank: 3 max len: 58 min len: 45 avg len: 51.5 num_loss_counted_tokens: 51
total tokens: 116 num samples: 2 num padding tokens: 12 - rank: 0 max len: 58 min len: 46 avg len: 52.0 num_loss_counted_tokens: 54
total tokens: 166 num samples: 2 num padding tokens: 31 - rank: 0 max len: 83 min len: 52 avg len: 67.5 num_loss_counted_tokens: 81
total tokens: 140 num samples: 2 num padding tokens: 4 - rank: 0 max len: 70 min len: 66 avg len: 68.0 num_loss_counted_tokens: 66
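Each `total tokens` record above describes one rank's micro-batch of two samples padded to the longer one, so its fields obey simple identities: `total tokens = max len × num samples`, `num padding tokens = max len − min len` (with two samples), and `avg len = (max len + min len) / 2`. The sketch below (pattern and helper names are illustrative) parses a handful of these records, including two that the raw output printed on one line, checks those identities, and reports the share of tokens spent on padding for that subset.

```python
import re

# A few records copied from the listing above; the first line holds two
# records that the raw output printed back to back.
RAW = """\
total tokens: 126 num samples: 2 num padding tokens: 13 - rank: 5 max len: 63 min len: 50 avg len: 56.5 num_loss_counted_tokens: 64 total tokens: 140 num samples: 2 num padding tokens: 25 - rank: 5 max len: 70 min len: 45 avg len: 57.5 num_loss_counted_tokens: 67
total tokens: 282 num samples: 2 num padding tokens: 80 - rank: 5 max len: 141 min len: 61 avg len: 101.0 num_loss_counted_tokens: 146
total tokens: 162 num samples: 2 num padding tokens: 0 - rank: 0 max len: 81 min len: 81 avg len: 81.0 num_loss_counted_tokens: 96
total tokens: 214 num samples: 2 num padding tokens: 62 - rank: 0 max len: 107 min len: 45 avg len: 76.0 num_loss_counted_tokens: 99
"""

RECORD_RE = re.compile(
    r"total tokens: (\d+) num samples: (\d+) num padding tokens: (\d+) - rank: (\d+) "
    r"max len: (\d+) min len: (\d+) avg len: ([0-9.]+) num_loss_counted_tokens: (\d+)"
)
FIELDS = ("total", "samples", "padding", "rank", "max_len", "min_len", "avg_len", "loss_tokens")


def parse(raw: str) -> list[dict]:
    """Return one dict per record, even when several records share a line."""
    records = []
    for m in RECORD_RE.finditer(raw):
        values = dict(zip(FIELDS, m.groups()))
        records.append({k: float(v) if k == "avg_len" else int(v) for k, v in values.items()})
    return records


records = parse(RAW)
for r in records:
    # Every sample in a micro-batch is padded to the longest one.
    assert r["total"] == r["max_len"] * r["samples"]
    # With exactly two samples, padding is the length gap between them.
    assert r["samples"] == 2
    assert r["padding"] == r["max_len"] - r["min_len"]
    assert r["avg_len"] == (r["max_len"] + r["min_len"]) / 2

padding_share = sum(r["padding"] for r in records) / sum(r["total"] for r in records)
print(len(records), f"{padding_share:.1%}")
```

Running the same parser over the whole listing should yield 96 records, one per rank for each of the 12 steps in the epoch.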
Per-token loss scaled by world size: 0.00020837620832026005
Per-token loss scaled by world size: 0.00101291888859123
Per-token loss scaled by world size: 0.0005203241598792374
Per-token loss scaled by world size: 0.0005080624832771719
Per-token loss scaled by world size: 0.0008971338393166661
Per-token loss scaled by world size: 9.295487870986108e-06
Per-token loss scaled by world size: 3.1353247322840616e-05
Epoch: 9, Step: 109, Rank: 1, loss = 0.07001801580190659
Epoch: 9, Step: 109, Rank: 6, loss = 0.014404005371034145
Epoch: 9, Step: 109, Rank: 4, loss = 0.03596740588545799
Epoch: 9, Step: 109, Rank: 5, loss = 0.0620143748819828
Epoch: 9, Step: 109, Rank: 7, loss = 0.03511982038617134
Epoch: 9, Step: 109, Rank: 3, loss = 0.0021672931034117937
Epoch: 9, Step: 109, Rank: 2, loss = 0.0006425505853258073
Per-token loss scaled by world size: 1.730842632241547e-05
Epoch: 9, Step: 109, Rank: 0, loss = 0.0011964449658989906
[2024-07-27 20:07:48,316] [INFO] [logging.py:96:log_dist] [Rank 0] step=109, skipped=0, lr=[6.543553540053926e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:07:48,396] [INFO] [timer.py:258:stop] epoch=0/micro_step=109/global_step=109, RunningAvgSamplesPerSec=31.767697068441695, CurrSamplesPerSec=32.0406552236319, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 9: 8%|▊ | 1/12 [00:00<00:10, 1.08it/s]
{
"epoch": 9,
"step": 109,
"rank": 0,
"loss": 0.0011964449658989906,
"overall_throughput": 31.927760494448584,
"lr": 6.543553540053926e-07,
"cuda_mem_allocated": 21.998091220855713,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 553,
"batch_size": 16,
"total_loss": 0.02769123949110508,
"gradnorm": 0.690696120262146,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:48.445399"
}
Per-token loss scaled by world size: 0.0014274526620283723
Per-token loss scaled by world size: 0.0002300855703651905
Per-token loss scaled by world size: 0.0005493586650118232
Per-token loss scaled by world size: 0.0009031961672008038
Per-token loss scaled by world size: 0.0002857441722881049
Per-token loss scaled by world size: 0.0005578985437750816
Per-token loss scaled by world size: 0.00028736007516272366
Epoch: 9, Step: 110, Rank: 2, loss = 0.03742505982518196
Epoch: 9, Step: 110, Rank: 5, loss = 0.01567457988858223
Epoch: 9, Step: 110, Rank: 3, loss = 0.06153023988008499
Epoch: 9, Step: 110, Rank: 6, loss = 0.09724520891904831
Epoch: 9, Step: 110, Rank: 7, loss = 0.019466321915388107
Epoch: 9, Step: 110, Rank: 0, loss = 0.03800683841109276
Epoch: 9, Step: 110, Rank: 4, loss = 0.019576406106352806
Per-token loss scaled by world size: 0.0003402826841920614
Epoch: 9, Step: 110, Rank: 1, loss = 0.02318175695836544
[2024-07-27 20:07:48,854] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=0, lr=[5.418275829936537e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:07:48,932] [INFO] [timer.py:258:stop] epoch=0/micro_step=110/global_step=110, RunningAvgSamplesPerSec=31.778440450016202, CurrSamplesPerSec=32.971544549678505, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Saving model in huggingface format at samples_seen: 1760
{
"epoch": 9,
"step": 110,
"rank": 0,
"loss": 0.03800683841109276,
"overall_throughput": 32.86280148005085,
"lr": 5.418275829936537e-07,
"cuda_mem_allocated": 21.999285221099854,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 545,
"batch_size": 16,
"total_loss": 0.03901330381631851,
"gradnorm": 0.746061384677887,
"weight_norm": 393.47589111328125,
"timestamp": "2024-07-27T20:07:48.935854"
}
Model saved in /var/instructlabbigdisk/instructlab/skillscheckpoints/hf_format/samples_1760
[20:08:06] INFO saving took 18.007025003433228 seconds utils.py:611
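The `samples_1760` checkpoint label follows from the batch bookkeeping visible in this log: each optimizer step consumes `batch_size` = 16 samples, which is 8 ranks × 2 samples per rank (every `total tokens` record shows `num samples: 2`), and the DeepSpeed timer reports `micro_step` equal to `global_step`, so there is no gradient accumulation. After step 110 that gives 110 × 16 = 1760 samples. Why the save landed at 1760 when the launch command passed `--save-samples 185` is not stated. One reading consistent with the numbers is that 185 was rounded down to a whole number of batches (176 = 11 × 16), putting a save every 11 steps; the sketch below encodes that arithmetic, and the rounding rule is a hypothesis, not documented `ilab` behavior.

```python
WORLD_SIZE = 8           # --gpus 8
SAMPLES_PER_RANK = 2     # every per-rank batch record shows "num samples: 2"
GRAD_ACCUM_STEPS = 1     # micro_step equals global_step in the DeepSpeed timer lines
SAVE_SAMPLES_FLAG = 185  # --save-samples 185 from the launch command

BATCH_SIZE = WORLD_SIZE * SAMPLES_PER_RANK * GRAD_ACCUM_STEPS  # the logged "batch_size": 16


def samples_seen(step: int) -> int:
    """Samples consumed after `step` optimizer steps at a fixed batch size."""
    return step * BATCH_SIZE


# Hypothesis only: the requested interval rounded down to whole batches.
save_interval = (SAVE_SAMPLES_FLAG // BATCH_SIZE) * BATCH_SIZE

print(BATCH_SIZE, samples_seen(110), save_interval, samples_seen(110) % save_interval == 0)
```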
Per-token loss scaled by world size: 0.000437290029367432
Per-token loss scaled by world size: 0.00030764410621486604
Per-token loss scaled by world size: 0.00027671968564391136
Epoch 9: 17%|█▋ | 2/12 [00:19<01:52, 11.29s/it]
Per-token loss scaled by world size: 3.6100764191360213e-06
Per-token loss scaled by world size: 8.117486140690744e-05
Per-token loss scaled by world size: 1.315043573413277e-05
Epoch: 9, Step: 111, Rank: 1, loss = 0.024212971329689026
Epoch: 9, Step: 111, Rank: 0, loss = 0.038262877613306046
Epoch: 9, Step: 111, Rank: 6, loss = 0.0003158816834911704
Epoch: 9, Step: 111, Rank: 7, loss = 0.026918860152363777
Epoch: 9, Step: 111, Rank: 5, loss = 0.007102800067514181
Epoch: 9, Step: 111, Rank: 4, loss = 0.0011506631271913648
Per-token loss scaled by world size: 4.370940587250516e-05
Epoch: 9, Step: 111, Rank: 2, loss = 0.0038245730102062225
Per-token loss scaled by world size: 0.0007190197939053178
Epoch: 9, Step: 111, Rank: 3, loss = 0.06291422992944717
[2024-07-27 20:08:07,415] [INFO] [logging.py:96:log_dist] [Rank 0] step=111, skipped=0, lr=[4.396421846564236e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:08:07,493] [INFO] [timer.py:258:stop] epoch=0/micro_step=111/global_step=111, RunningAvgSamplesPerSec=31.77947002508808, CurrSamplesPerSec=31.891058187078368, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 9: 25%|██▌ | 3/12 [00:20<00:57, 6.39s/it]
{
"epoch": 9,
"step": 111,
"rank": 0,
"loss": 0.038262877613306046,
"overall_throughput": 31.83512815148673,
"lr": 4.396421846564236e-07,
"cuda_mem_allocated": 22.00214672088623,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 700,
"batch_size": 16,
"total_loss": 0.020587855949997902,
"gradnorm": 0.4546854794025421,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:07.536418"
}
Per-token loss scaled by world size: 1.5196498679870274e-05
Per-token loss scaled by world size: 0.00038456235779449344
Per-token loss scaled by world size: 2.9230683139758185e-05
Per-token loss scaled by world size: 4.643391366698779e-05
Per-token loss scaled by world size: 0.000584576278924942
Per-token loss scaled by world size: 2.605171175673604e-05
Per-token loss scaled by world size: 0.000400967663154006
Epoch: 9, Step: 112, Rank: 7, loss = 0.0473506785929203
Epoch: 9, Step: 112, Rank: 5, loss = 0.031149551272392273
Epoch: 9, Step: 112, Rank: 6, loss = 0.002367685316130519
Epoch: 9, Step: 112, Rank: 1, loss = 0.003761146916076541
Epoch: 9, Step: 112, Rank: 2, loss = 0.0012309163575991988
Epoch: 9, Step: 112, Rank: 3, loss = 0.03247838094830513
Epoch: 9, Step: 112, Rank: 0, loss = 0.0021101885940879583
Per-token loss scaled by world size: 2.4041009965003468e-05
Epoch: 9, Step: 112, Rank: 4, loss = 0.0019473218126222491
[2024-07-27 20:08:07,956] [INFO] [logging.py:96:log_dist] [Rank 0] step=112, skipped=0, lr=[3.4791089722651437e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:08:08,033] [INFO] [timer.py:258:stop] epoch=0/micro_step=112/global_step=112, RunningAvgSamplesPerSec=31.783663551587292, CurrSamplesPerSec=32.24748961705518, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
Epoch 9: 33%|███▎ | 4/12 [00:20<00:32, 4.08s/it]
{
"epoch": 9,
"step": 112,
"rank": 0,
"loss": 0.0021101885940879583,
"overall_throughput": 32.15704750085054,
"lr": 3.4791089722651437e-07,
"cuda_mem_allocated": 22.002624034881592,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 648,
"batch_size": 16,
"total_loss": 0.015299483202397823,
"gradnorm": 0.40167224407196045,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:08.078974"
}
Per-token loss scaled by world size: 0.0005526405875571072
Per-token loss scaled by world size: 0.0009644478559494019
Per-token loss scaled by world size: 0.00029839982744306326
Per-token loss scaled by world size: 0.0004884039517492056
Per-token loss scaled by world size: 6.763617875549244e-06
Per-token loss scaled by world size: 0.00016722115105949342
Per-token loss scaled by world size: 9.371204214403406e-05
Epoch: 9, Step: 113, Rank: 4, loss = 0.020887987688183784
Epoch: 9, Step: 113, Rank: 1, loss = 0.03418827801942825
Epoch: 9, Step: 113, Rank: 6, loss = 0.00047345325583592057
Epoch: 9, Step: 113, Rank: 3, loss = 0.06751134991645813
Epoch: 9, Step: 113, Rank: 7, loss = 0.011705480515956879
Epoch: 9, Step: 113, Rank: 5, loss = 0.03868484124541283
Epoch: 9, Step: 113, Rank: 2, loss = 0.006559843197464943
Per-token loss scaled by world size: 0.0006510451785288751
Epoch: 9, Step: 113, Rank: 0, loss = 0.0455731637775898
[2024-07-27 20:08:08,497] [INFO] [logging.py:96:log_dist] [Rank 0] step=113, skipped=0, lr=[2.667340275199426e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:08:08,574] [INFO] [timer.py:258:stop] epoch=0/micro_step=113/global_step=113, RunningAvgSamplesPerSec=31.789040793376557, CurrSamplesPerSec=32.391855899896804, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 113,
"rank": 0,
"loss": 0.0455731637775898,
"overall_throughput": 32.30453715301278,
"lr": 2.667340275199426e-07,
"cuda_mem_allocated": 21.99761438369751,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 560,
"batch_size": 16,
"total_loss": 0.02819805033504963,
"gradnorm": 0.569709300994873,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:08.621715"
}
Epoch 9: 42%|████▏ | 5/12 [00:21<00:19, 2.80s/it]
Per-token loss scaled by world size: 0.00019008330127689987
Per-token loss scaled by world size: 0.00028144754469394684
Per-token loss scaled by world size: 0.00048485625302419066
Per-token loss scaled by world size: 0.0003446684859227389
Per-token loss scaled by world size: 0.0005211489042267203
Per-token loss scaled by world size: 0.000750985462218523
Per-token loss scaled by world size: 0.00011730282858479768
Epoch: 9, Step: 114, Rank: 1, loss = 0.041273389011621475
Epoch: 9, Step: 114, Rank: 2, loss = 0.023958221077919006
Epoch: 9, Step: 114, Rank: 5, loss = 0.029339905828237534
Epoch: 9, Step: 114, Rank: 7, loss = 0.04436279833316803
Epoch: 9, Step: 114, Rank: 0, loss = 0.016180841252207756
Epoch: 9, Step: 114, Rank: 6, loss = 0.06392763555049896
Epoch: 9, Step: 114, Rank: 4, loss = 0.009985403157770634
Per-token loss scaled by world size: 0.0006613648729398847
Epoch: 9, Step: 114, Rank: 3, loss = 0.05629868432879448
[2024-07-27 20:08:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] step=114, skipped=0, lr=[1.9620034125190645e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:08:09,125] [INFO] [timer.py:258:stop] epoch=0/micro_step=114/global_step=114, RunningAvgSamplesPerSec=31.789347765969946, CurrSamplesPerSec=31.82345861552571, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 114,
"rank": 0,
"loss": 0.016180841252207756,
"overall_throughput": 31.73594640312759,
"lr": 1.9620034125190645e-07,
"cuda_mem_allocated": 22.01240301132202,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 681,
"batch_size": 16,
"total_loss": 0.035665858536958694,
"gradnorm": 0.562235951423645,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:09.167828"
}
Epoch 9: 50%|█████ | 6/12 [00:21<00:12, 2.04s/it]
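Each per-rank `loss` line also lines up with the "Per-token loss scaled by world size" line printed immediately before it: the two differ by a factor of `num_loss_counted_tokens / world_size` (for step 111, 700 / 8 = 87.5). The per-token lines carry no rank label, so pairing each one with the loss line that follows it is an assumption based only on adjacency in the log; the world size of 8 matches the eight ranks (0 through 7). A sketch that checks the ratio for a few pairs from steps 111 to 114:

```python
import math

WORLD_SIZE = 8  # ranks 0-7 in this run

# (per-token value, loss on the line right after it, num_loss_counted_tokens for that step)
# Assumption: each per-token line belongs to the rank whose loss line follows it.
pairs = [
    (0.0007190197939053178, 0.06291422992944717, 700),    # step 111, rank 3
    (4.370940587250516e-05, 0.0038245730102062225, 700),  # step 111, rank 2
    (2.4041009965003468e-05, 0.0019473218126222491, 648), # step 112, rank 4
    (0.0006510451785288751, 0.0455731637775898, 560),     # step 113, rank 0
    (0.0006613648729398847, 0.05629868432879448, 681),    # step 114, rank 3
]

for per_token, rank_loss, tokens in pairs:
    predicted = per_token * tokens / WORLD_SIZE
    # The logged values come from float32 tensors, so allow a small relative error.
    assert math.isclose(predicted, rank_loss, rel_tol=1e-6), (predicted, rank_loss)
    print(f"{per_token:.4e} * {tokens}/{WORLD_SIZE} = {predicted:.8f} (logged {rank_loss:.8f})")
```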
Per-token loss scaled by world size: 0.00038479233626276255
Per-token loss scaled by world size: 0.00017321300401818007
Per-token loss scaled by world size: 0.00037739198887720704
Per-token loss scaled by world size: 0.0002703580248635262
Per-token loss scaled by world size: 0.0003972994163632393
Per-token loss scaled by world size: 0.0005138221313245595
Per-token loss scaled by world size: 0.0007780570886097848
Epoch: 9, Step: 115, Rank: 2, loss = 0.028068529441952705
Epoch: 9, Step: 115, Rank: 5, loss = 0.038215521723032
Epoch: 9, Step: 115, Rank: 7, loss = 0.012882716953754425
Epoch: 9, Step: 115, Rank: 0, loss = 0.028618929907679558
Epoch: 9, Step: 115, Rank: 4, loss = 0.020107878372073174
Epoch: 9, Step: 115, Rank: 6, loss = 0.029549144208431244
Epoch: 9, Step: 115, Rank: 1, loss = 0.05786799639463425
Per-token loss scaled by world size: 0.0008164329337887466
Epoch: 9, Step: 115, Rank: 3, loss = 0.06072219833731651
[2024-07-27 20:08:09,592] [INFO] [logging.py:96:log_dist] [Rank 0] step=115, skipped=0, lr=[1.3638696597277678e-07], mom=[(0.9, 0.95)]
[2024-07-27 20:08:09,670] [INFO] [timer.py:258:stop] epoch=0/micro_step=115/global_step=115, RunningAvgSamplesPerSec=31.790592582420803, CurrSamplesPerSec=31.930631657680326, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 115,
"rank": 0,
"loss": 0.028618929907679558,
"overall_throughput": 31.843164422360154,
"lr": 1.3638696597277678e-07,
"cuda_mem_allocated": 22.00882577896118,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 595,
"batch_size": 16,
"total_loss": 0.03450411558151245,
"gradnorm": 0.7144197821617126,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:09.673464"
}
Epoch 9: 58%|█████▊ | 7/12 [00:22<00:07, 1.55s/it]
Per-token loss scaled by world size: 0.0001559390948386863
Per-token loss scaled by world size: 0.0005046449950896204
Per-token loss scaled by world size: 7.42613265174441e-05
Per-token loss scaled by world size: 0.00044146235450170934
Per-token loss scaled by world size: 1.3344148101168685e-05
Per-token loss scaled by world size: 8.937142411014065e-06
Per-token loss scaled by world size: 4.871577039011754e-05
Epoch: 9, Step: 116, Rank: 1, loss = 0.037406809628009796
Epoch: 9, Step: 116, Rank: 5, loss = 0.005504620727151632
Epoch: 9, Step: 116, Rank: 0, loss = 0.011558985337615013
Epoch: 9, Step: 116, Rank: 7, loss = 0.03272339701652527
Epoch: 9, Step: 116, Rank: 6, loss = 0.0006624656962230802
Epoch: 9, Step: 116, Rank: 2, loss = 0.0009891350055113435
Epoch: 9, Step: 116, Rank: 4, loss = 0.0036110563669353724
Per-token loss scaled by world size: 0.0008237811853177845
Epoch: 9, Step: 116, Rank: 3, loss = 0.061062779277563095
[2024-07-27 20:08:10,138] [INFO] [logging.py:96:log_dist] [Rank 0] step=116, skipped=0, lr=[8.735930673024806e-08], mom=[(0.9, 0.95)]
[2024-07-27 20:08:10,215] [INFO] [timer.py:258:stop] epoch=0/micro_step=116/global_step=116, RunningAvgSamplesPerSec=31.793855612274417, CurrSamplesPerSec=32.166943077303586, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 116,
"rank": 0,
"loss": 0.011558985337615013,
"overall_throughput": 32.10873615463745,
"lr": 8.735930673024806e-08,
"cuda_mem_allocated": 22.000000476837158,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 593,
"batch_size": 16,
"total_loss": 0.019189907237887383,
"gradnorm": 0.4252445697784424,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:10.257978"
}
Epoch 9: 67%|██████▋ | 8/12 [00:22<00:04, 1.23s/it]
Per-token loss scaled by world size: 0.000993338762782514
Per-token loss scaled by world size: 0.0006966108339838684
Per-token loss scaled by world size: 0.0003090917889494449
Per-token loss scaled by world size: 3.207974077668041e-05
Per-token loss scaled by world size: 0.0003707126888912171
Per-token loss scaled by world size: 3.2455467589898035e-05
Per-token loss scaled by world size: 9.373086504638195e-05
Epoch: 9, Step: 117, Rank: 1, loss = 0.025500072166323662
Epoch: 9, Step: 117, Rank: 0, loss = 0.08195044845342636
Epoch: 9, Step: 117, Rank: 5, loss = 0.05747039616107941
Epoch: 9, Step: 117, Rank: 3, loss = 0.0026775761507451534
Epoch: 9, Step: 117, Rank: 6, loss = 0.002646578708663583
Epoch: 9, Step: 117, Rank: 4, loss = 0.03058379702270031
Epoch: 9, Step: 117, Rank: 2, loss = 0.007732796482741833
Per-token loss scaled by world size: 0.0008588652708567679
Epoch: 9, Step: 117, Rank: 7, loss = 0.07085638493299484
[2024-07-27 20:08:10,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=117, skipped=0, lr=[4.9170974549885844e-08], mom=[(0.9, 0.95)]
[2024-07-27 20:08:10,765] [INFO] [timer.py:258:stop] epoch=0/micro_step=117/global_step=117, RunningAvgSamplesPerSec=31.79231829238237, CurrSamplesPerSec=31.618032996197385, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 117,
"rank": 0,
"loss": 0.08195044845342636,
"overall_throughput": 31.54031480690336,
"lr": 4.9170974549885844e-08,
"cuda_mem_allocated": 21.997137546539307,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 660,
"batch_size": 16,
"total_loss": 0.034927256405353546,
"gradnorm": 0.6421816945075989,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:10.768286"
}
Epoch 9: 75%|███████▌ | 9/12 [00:23<00:03, 1.02s/it]
Per-token loss scaled by world size: 0.00030643673380836844
Per-token loss scaled by world size: 0.000586669659242034
Per-token loss scaled by world size: 0.001160175772383809
Per-token loss scaled by world size: 0.000358547898940742
Per-token loss scaled by world size: 0.00025170366279780865
Per-token loss scaled by world size: 0.00010790762462420389
Per-token loss scaled by world size: 4.243180592311546e-05
Epoch: 9, Step: 118, Rank: 7, loss = 0.0846928283572197
Epoch: 9, Step: 118, Rank: 4, loss = 0.04282688349485397
Epoch: 9, Step: 118, Rank: 6, loss = 0.026173997670412064
Epoch: 9, Step: 118, Rank: 1, loss = 0.007877256721258163
Epoch: 9, Step: 118, Rank: 0, loss = 0.022369882091879845
Epoch: 9, Step: 118, Rank: 5, loss = 0.0183743666857481
Epoch: 9, Step: 118, Rank: 3, loss = 0.0030975218396633863
Per-token loss scaled by world size: 0.00044214868103154004
Epoch: 9, Step: 118, Rank: 2, loss = 0.032276853919029236
[2024-07-27 20:08:11,222] [INFO] [logging.py:96:log_dist] [Rank 0] step=118, skipped=0, lr=[2.1863727812254653e-08], mom=[(0.9, 0.95)]
[2024-07-27 20:08:11,300] [INFO] [timer.py:258:stop] epoch=0/micro_step=118/global_step=118, RunningAvgSamplesPerSec=31.800842267046153, CurrSamplesPerSec=32.81255650372894, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 118,
"rank": 0,
"loss": 0.022369882091879845,
"overall_throughput": 32.728399914556505,
"lr": 2.1863727812254653e-08,
"cuda_mem_allocated": 21.999285221099854,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 584,
"batch_size": 16,
"total_loss": 0.029711198061704636,
"gradnorm": 0.6892233490943909,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:11.344526"
}
Epoch 9: 83%|████████▎ | 10/12 [00:23<00:01, 1.15it/s]
Per-token loss scaled by world size: 0.00011293171701254323
Per-token loss scaled by world size: 0.00029302676557563245
Per-token loss scaled by world size: 0.0008117702673189342
Per-token loss scaled by world size: 0.001312798005528748
Per-token loss scaled by world size: 0.0006738528027199209
Per-token loss scaled by world size: 0.0004890891723334789
Per-token loss scaled by world size: 3.5650893551064655e-05
Epoch: 9, Step: 119, Rank: 4, loss = 0.058345988392829895
Epoch: 9, Step: 119, Rank: 6, loss = 0.09435735642910004
Epoch: 9, Step: 119, Rank: 0, loss = 0.021061299368739128
Epoch: 9, Step: 119, Rank: 3, loss = 0.04843316972255707
Epoch: 9, Step: 119, Rank: 5, loss = 0.0025624081026762724
Epoch: 9, Step: 119, Rank: 2, loss = 0.008116967044770718
Epoch: 9, Step: 119, Rank: 1, loss = 0.035153284668922424
Per-token loss scaled by world size: 0.0010237974347546697
Epoch: 9, Step: 119, Rank: 7, loss = 0.07358544319868088
[2024-07-27 20:08:11,770] [INFO] [logging.py:96:log_dist] [Rank 0] step=119, skipped=0, lr=[5.467426590739511e-09], mom=[(0.9, 0.95)]
[2024-07-27 20:08:11,848] [INFO] [timer.py:258:stop] epoch=0/micro_step=119/global_step=119, RunningAvgSamplesPerSec=31.801217171186128, CurrSamplesPerSec=31.84476611898689, MemAllocated=22.0GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 119,
"rank": 0,
"loss": 0.021061299368739128,
"overall_throughput": 31.765043565848615,
"lr": 5.467426590739511e-09,
"cuda_mem_allocated": 22.003100872039795,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 575,
"batch_size": 16,
"total_loss": 0.04270198941230774,
"gradnorm": 0.709683358669281,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:11.891436"
}
Epoch 9: 92%|█████████▏| 11/12 [00:24<00:00, 1.30it/s]
Per-token loss scaled by world size: 0.0007179552922025323
Per-token loss scaled by world size: 0.00047626314335502684
Per-token loss scaled by world size: 0.000766461540479213
Per-token loss scaled by world size: 0.000950768415350467
Per-token loss scaled by world size: 0.00014302438648883253
Per-token loss scaled by world size: 1.124614391301293e-05
Per-token loss scaled by world size: 0.00010744491737568751
Epoch: 9, Step: 120, Rank: 6, loss = 0.03857731446623802
Epoch: 9, Step: 120, Rank: 7, loss = 0.058154378086328506
Epoch: 9, Step: 120, Rank: 0, loss = 0.07701224088668823
Epoch: 9, Step: 120, Rank: 4, loss = 0.0009109376696869731
Epoch: 9, Step: 120, Rank: 3, loss = 0.06208338588476181
Epoch: 9, Step: 120, Rank: 5, loss = 0.008703038096427917
Epoch: 9, Step: 120, Rank: 1, loss = 0.011584974825382233
Per-token loss scaled by world size: 0.00025203556288033724
Epoch: 9, Step: 120, Rank: 2, loss = 0.02041487954556942
[2024-07-27 20:08:12,306] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=0, lr=[0.0], mom=[(0.9, 0.95)]
[2024-07-27 20:08:12,384] [INFO] [timer.py:258:stop] epoch=0/micro_step=120/global_step=120, RunningAvgSamplesPerSec=31.807319807184527, CurrSamplesPerSec=32.53786766934063, MemAllocated=22.01GB, MaxMemAllocated=28.3GB
{
"epoch": 9,
"step": 120,
"rank": 0,
"loss": 0.07701224088668823,
"overall_throughput": 32.45850310128811,
"lr": 0.0,
"cuda_mem_allocated": 22.007394790649414,
"cuda_malloc_retries": 0,
"num_loss_counted_tokens": 648,
"batch_size": 16,
"total_loss": 0.034680142998695374,
"gradnorm": 0.5826724767684937,
"weight_norm": 393.4759216308594,
"timestamp": "2024-07-27T20:08:12.387320"
}
Epoch 9: 100%|██████████| 12/12 [00:24<00:00, 1.43it/s]
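The `lr` column decays smoothly to exactly 0.0 at step 120. The logged values are consistent with a linear-warmup-plus-cosine-decay schedule (the shape of Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle) using a peak learning rate of 2e-5, 25 warmup steps and 120 total steps (10 epochs of 12 steps). Those three constants were fitted to this excerpt, not read from the run's configuration, and the warmup phase is not visible here, so treat all of them as assumptions. A dependency-free sketch that reproduces the logged values:

```python
import math

# Assumed constants, fitted to the lr values in this log:
PEAK_LR = 2e-5
WARMUP_STEPS = 25
TOTAL_STEPS = 120  # 10 epochs x 12 steps per epoch


def cosine_with_warmup(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# lr values printed by the log_dist lines for steps 111-120.
logged_lr = {
    111: 4.396421846564236e-07,
    112: 3.4791089722651437e-07,
    113: 2.667340275199426e-07,
    114: 1.9620034125190645e-07,
    115: 1.3638696597277678e-07,
    116: 8.735930673024806e-08,
    117: 4.9170974549885844e-08,
    118: 2.1863727812254653e-08,
    119: 5.467426590739511e-09,
    120: 0.0,
}

for step, lr in logged_lr.items():
    predicted = cosine_with_warmup(step)
    assert math.isclose(predicted, lr, rel_tol=1e-6, abs_tol=1e-15), (step, predicted, lr)
    print(f"step {step}: predicted {predicted:.6e}, logged {lr:.6e}")
```

Near the end of a cosine schedule the learning rate falls roughly with the square of the remaining steps, which is why the ratios lr(118)/lr(119) and lr(117)/lr(119) come out close to 4 and 9.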
Epoch 9: 100%|██████████| 12/12 [00:24<00:00, 2.08s/it]
tyler-rhel-newimage:260:1034 [0] NCCL INFO [Service thread] Connection closed by localRank 0
tyler-rhel-newimage:261:1036 [1] NCCL INFO [Service thread] Connection closed by localRank 1
tyler-rhel-newimage:266:1030 [6] NCCL INFO [Service thread] Connection closed by localRank 6
tyler-rhel-newimage:263:1038 [3] NCCL INFO [Service thread] Connection closed by localRank 3
tyler-rhel-newimage:262:1040 [2] NCCL INFO [Service thread] Connection closed by localRank 2
tyler-rhel-newimage:267:1044 [7] NCCL INFO [Service thread] Connection closed by localRank 7
tyler-rhel-newimage:265:1042 [5] NCCL INFO [Service thread] Connection closed by localRank 5
tyler-rhel-newimage:264:1032 [4] NCCL INFO [Service thread] Connection closed by localRank 4
tyler-rhel-newimage:260:43471 [0] NCCL INFO comm 0x558210938950 rank 0 nranks 8 cudaDev 0 busId 8010 - Abort COMPLETE
tyler-rhel-newimage:267:43476 [0] NCCL INFO comm 0x564fb40d9fa0 rank 7 nranks 8 cudaDev 7 busId e080 - Abort COMPLETE
tyler-rhel-newimage:266:43475 [0] NCCL INFO comm 0x55f359e7d980 rank 6 nranks 8 cudaDev 6 busId e070 - Abort COMPLETE
tyler-rhel-newimage:262:43470 [0] NCCL INFO comm 0x55f25f665d50 rank 2 nranks 8 cudaDev 2 busId a030 - Abort COMPLETE
tyler-rhel-newimage:261:43477 [0] NCCL INFO comm 0x55fca60525d0 rank 1 nranks 8 cudaDev 1 busId 8020 - Abort COMPLETE
tyler-rhel-newimage:263:43473 [0] NCCL INFO comm 0x55fffff3ce80 rank 3 nranks 8 cudaDev 3 busId a040 - Abort COMPLETE
tyler-rhel-newimage:265:43474 [0] NCCL INFO comm 0x56464a4e7a70 rank 5 nranks 8 cudaDev 5 busId c060 - Abort COMPLETE
tyler-rhel-newimage:264:43472 [0] NCCL INFO comm 0x55b22a5ae220 rank 4 nranks 8 cudaDev 4 busId c050 - Abort COMPLETE
Terminating process 🤖
[root@tyler-rhel-newimage instructlab]#
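The closing epoch-9 bar reports an average of 2.08 s/it, but that average includes the slow start of the epoch: the 2/12 bar at the top of this excerpt already shows 00:19 elapsed. The rank-0 JSON `timestamp` fields give the steady-state pace for steps 111 to 120. A small sketch that derives seconds per step and samples per second from them, assuming every step processed the logged `batch_size` of 16 samples:

```python
from datetime import datetime

BATCH_SIZE = 16  # "batch_size" field of every JSON record in this excerpt

# "timestamp" fields of the rank-0 JSON records for steps 111-120.
timestamps = [
    "2024-07-27T20:08:07.536418",
    "2024-07-27T20:08:08.078974",
    "2024-07-27T20:08:08.621715",
    "2024-07-27T20:08:09.167828",
    "2024-07-27T20:08:09.673464",
    "2024-07-27T20:08:10.257978",
    "2024-07-27T20:08:10.768286",
    "2024-07-27T20:08:11.344526",
    "2024-07-27T20:08:11.891436",
    "2024-07-27T20:08:12.387320",
]

times = [datetime.fromisoformat(t) for t in timestamps]
# Wall-clock gap between consecutive records, one per optimizer step.
intervals = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

sec_per_step = sum(intervals) / len(intervals)
samples_per_sec = BATCH_SIZE / sec_per_step
print(f"{len(intervals)} steps, {sec_per_step:.3f} s/step, {samples_per_sec:.1f} samples/s")
```

This works out to roughly 0.54 s per step, or about 29.7 samples/s, somewhat below the ~31.8 RunningAvgSamplesPerSec that DeepSpeed prints. A plausible (unverified) reason is that the gap between JSON records also covers work outside DeepSpeed's timed region, such as the logging itself.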