The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `4`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/workspace/axolotl/transformers/src/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
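Editor's note: this FutureWarning fires once per launched process (four copies in the raw log, one per rank). It points at the moved module; a minimal sketch of the old versus new import path, assuming a transformers version where `HfDeepSpeedConfig` (used here as a representative symbol) is re-exported from `transformers.integrations`:

```python
# Deprecated path -- triggers the FutureWarning shown above:
# from transformers.deepspeed import HfDeepSpeedConfig

# Current path after the integrations refactor (verify against your version):
from transformers.integrations import HfDeepSpeedConfig
```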
Saving the dataset (0/16 shards):   0%|          | 0/1012643 [00:00<?, ? examples/s]
Saving the dataset (16/16 shards): 100%|██████████| 1012643/1012643 [00:53<00:00, 18839.42 examples/s]
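Editor's note: the sharded save above is datasets' `save_to_disk`. A minimal sketch, assuming the prepared data is a `datasets.Dataset`; the variable name and placeholder rows are illustrative, and `num_shards=16` matches the `(N/16 shards)` counter:

```python
from datasets import Dataset

# Placeholder stand-in for the ~1,012,643 tokenized examples in the log.
prepared = Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]})

# Writes 16 Arrow shards plus metadata under the target directory.
prepared.save_to_disk(
    "last_run_prepared/a7d05eb3f13184aa9249688865626206",  # path seen later in the log
    num_shards=16,
)
```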
Loading checkpoint shards:   0%|          | 0/19 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 19/19 [00:43<00:00, 2.19s/it]
Loading checkpoint shards: 100%|██████████| 19/19 [00:43<00:00, 2.28s/it]
Loading checkpoint shards: 100%|██████████| 19/19 [00:43<00:00, 2.19s/it]
Loading checkpoint shards: 100%|██████████| 19/19 [00:43<00:00, 2.31s/it]
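Editor's note: each of the four ranks is materializing the same 19-shard base checkpoint here. A sketch of the equivalent standalone load, assuming the model path, dtype, and `trust_remote_code` implied elsewhere in the log; the exact kwargs axolotl passes may differ:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/workspace/models/Mixtral-8x7B-v0.1",  # path from the tokenizer log below
    torch_dtype=torch.bfloat16,             # log later converts modules to bf16
    trust_remote_code=True,                 # the validate_config warning below
)
```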
Using /workspace/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /workspace/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module cpu_adam...
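Editor's note: the `cpu_adam` JIT build is the tell that DeepSpeed is using `DeepSpeedCPUAdam`, i.e. optimizer state offloaded to CPU. The run's actual DeepSpeed JSON is not in this gist; a minimal sketch of a ZeRO config fragment that triggers this build, where the stage and flags are assumptions:

```python
# Illustrative deepspeed config fragment (would normally live in a JSON file).
ds_config = {
    "zero_optimization": {
        "stage": 2,  # assumption; stage 3 can offload as well
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
}
```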
Traceback (most recent call last):
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 129, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/workspace/axolotl/transformers/src/transformers/trainer.py", line 1543, in train
    return inner_training_loop(
  File "/workspace/axolotl/transformers/src/transformers/trainer.py", line 1699, in _inner_training_loop
    deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
  File "/workspace/axolotl/transformers/src/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
    load_path, _ = deepspeed_engine.load_checkpoint(
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2720, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2790, in _load_checkpoint
    self.load_module_state_dict(checkpoint=checkpoint,
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2583, in load_module_state_dict
    self.module.load_state_dict(
  File "/home/azureuser/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	Missing key(s) in state_dict: "base_model.model.model.embed_tokens.original_module.weight", "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.0.block_sparse_moe.gate.weight", "base_model.model.model.layers.0.block_sparse_moe.experts.0.w1.base_layer.weight", "base_model.model.model.layers.0.block_sparse_moe.experts.0.w2.base_layer.weight", "base_model.model.model.layers.0.block_sparse_moe.experts.0.w3.base_layer.weight", ..., "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", ... (the same q/k/v/o `base_layer`, MoE gate, experts 0-7 w1/w2/w3 `base_layer`, and layernorm keys repeat for every remaining layer) ... (303 KB left)
[2023-12-21 21:24:13,589] [WARNING] [axolotl.validate_config:250] [PID:8259] [RANK:0] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2023-12-21 21:24:13,591] [INFO] [axolotl.normalize_config:150] [PID:8259] [RANK:0] GPU memory usage baseline: 0.000GB (+0.858GB misc)
[2023-12-21 21:24:13,596] [WARNING] [axolotl.validate_config:250] [PID:8262] [RANK:3] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2023-12-21 21:24:13,598] [INFO] [axolotl.normalize_config:150] [PID:8262] [RANK:3] GPU memory usage baseline: 0.000GB (+0.858GB misc)
[2023-12-21 21:24:13,611] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-21 21:24:13,614] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-21 21:24:13,739] [WARNING] [axolotl.validate_config:250] [PID:8261] [RANK:2] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2023-12-21 21:24:13,741] [INFO] [axolotl.normalize_config:150] [PID:8261] [RANK:2] GPU memory usage baseline: 0.000GB (+0.858GB misc)
[2023-12-21 21:24:13,757] [WARNING] [axolotl.validate_config:250] [PID:8260] [RANK:1] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2023-12-21 21:24:13,759] [INFO] [axolotl.normalize_config:150] [PID:8260] [RANK:1] GPU memory usage baseline: 0.000GB (+0.858GB misc)
[2023-12-21 21:24:13,759] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-21 21:24:13,777] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-21 21:24:15,971] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-21 21:24:16,197] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-21 21:24:16,289] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-21 21:24:16,289] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-12-21 21:24:16,351] [INFO] [comm.py:637:init_distributed] cdb=None
                           dP            dP   dP
                           88            88   88
.d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88
88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88
88.  .88  .d88b.  88.  .88 88 88.  .88   88   88
`88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP
[2023-12-21 21:24:16,530] [DEBUG] [axolotl.load_tokenizer:167] [PID:8261] [RANK:2] EOS: 32000 / <|im_end|>
[2023-12-21 21:24:16,530] [DEBUG] [axolotl.load_tokenizer:168] [PID:8261] [RANK:2] BOS: 1 / <s>
[2023-12-21 21:24:16,530] [DEBUG] [axolotl.load_tokenizer:169] [PID:8261] [RANK:2] PAD: 2 / </s>
[2023-12-21 21:24:16,530] [DEBUG] [axolotl.load_tokenizer:170] [PID:8261] [RANK:2] UNK: 0 / <unk>
[2023-12-21 21:24:16,532] [DEBUG] [axolotl.load_tokenizer:167] [PID:8259] [RANK:0] EOS: 32000 / <|im_end|>
[2023-12-21 21:24:16,532] [DEBUG] [axolotl.load_tokenizer:168] [PID:8259] [RANK:0] BOS: 1 / <s>
[2023-12-21 21:24:16,532] [DEBUG] [axolotl.load_tokenizer:169] [PID:8259] [RANK:0] PAD: 2 / </s>
[2023-12-21 21:24:16,532] [DEBUG] [axolotl.load_tokenizer:170] [PID:8259] [RANK:0] UNK: 0 / <unk>
[2023-12-21 21:24:16,533] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:8259] [RANK:0] Unable to find prepared dataset in last_run_prepared/a7d05eb3f13184aa9249688865626206
[2023-12-21 21:24:16,533] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:8259] [RANK:0] Loading raw datasets...
[2023-12-21 21:24:16,533] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:8259] [RANK:0] No seed provided, using default seed of 42
[2023-12-21 21:24:16,538] [DEBUG] [axolotl.load_tokenizer:167] [PID:8262] [RANK:3] EOS: 32000 / <|im_end|>
[2023-12-21 21:24:16,538] [DEBUG] [axolotl.load_tokenizer:168] [PID:8262] [RANK:3] BOS: 1 / <s>
[2023-12-21 21:24:16,538] [DEBUG] [axolotl.load_tokenizer:169] [PID:8262] [RANK:3] PAD: 2 / </s>
[2023-12-21 21:24:16,538] [DEBUG] [axolotl.load_tokenizer:170] [PID:8262] [RANK:3] UNK: 0 / <unk>
[2023-12-21 21:24:16,541] [DEBUG] [axolotl.load_tokenizer:167] [PID:8260] [RANK:1] EOS: 32000 / <|im_end|>
[2023-12-21 21:24:16,542] [DEBUG] [axolotl.load_tokenizer:168] [PID:8260] [RANK:1] BOS: 1 / <s>
[2023-12-21 21:24:16,542] [DEBUG] [axolotl.load_tokenizer:169] [PID:8260] [RANK:1] PAD: 2 / </s>
[2023-12-21 21:24:16,542] [DEBUG] [axolotl.load_tokenizer:170] [PID:8260] [RANK:1] UNK: 0 / <unk>
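Editor's note: the special-token mapping every rank prints (EOS 32000 -> `<|im_end|>`, layered over the stock Mistral/Mixtral ids 0-2) is what a ChatML-style setup produces. A sketch of how such a mapping arises, not axolotl's literal code:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/workspace/models/Mixtral-8x7B-v0.1")
tok.add_special_tokens({"eos_token": "<|im_end|>"})  # appended past the 32000-token base vocab
tok.pad_token = "</s>"                               # id 2, matching PAD in the log

print(tok.eos_token_id, tok.bos_token_id, tok.pad_token_id, tok.unk_token_id)
# expected per the log: 32000 1 2 0
```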
[2023-12-21 21:24:18,957] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:8259] [RANK:0] merging datasets[39m | |
[2023-12-21 21:24:19,025] [INFO] [axolotl.load_tokenized_prepared_datasets:366] [PID:8259] [RANK:0] shuffle merged datasets[39m | |
[2023-12-21 21:24:19,046] [INFO] [axolotl.load_tokenized_prepared_datasets:369] [PID:8259] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/a7d05eb3f13184aa9249688865626206[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:8262] [RANK:3] Unable to find prepared dataset in last_run_prepared/a7d05eb3f13184aa9249688865626206[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:8260] [RANK:1] Unable to find prepared dataset in last_run_prepared/a7d05eb3f13184aa9249688865626206[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:8262] [RANK:3] Loading raw datasets...[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:8261] [RANK:2] Unable to find prepared dataset in last_run_prepared/a7d05eb3f13184aa9249688865626206[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:8260] [RANK:1] Loading raw datasets...[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:8262] [RANK:3] No seed provided, using default seed of 42[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:8261] [RANK:2] Loading raw datasets...[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:8260] [RANK:1] No seed provided, using default seed of 42[39m | |
[2023-12-21 21:25:16,667] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:8261] [RANK:2] No seed provided, using default seed of 42[39m | |
[2023-12-21 21:25:18,851] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:8260] [RANK:1] merging datasets[39m | |
[2023-12-21 21:25:18,888] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:8261] [RANK:2] merging datasets[39m | |
[2023-12-21 21:25:18,927] [INFO] [axolotl.load_tokenized_prepared_datasets:366] [PID:8260] [RANK:1] shuffle merged datasets[39m | |
[2023-12-21 21:25:18,963] [INFO] [axolotl.load_tokenized_prepared_datasets:366] [PID:8261] [RANK:2] shuffle merged datasets[39m | |
[2023-12-21 21:25:19,618] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:8262] [RANK:3] merging datasets[39m | |
[2023-12-21 21:25:19,693] [INFO] [axolotl.load_tokenized_prepared_datasets:366] [PID:8262] [RANK:3] shuffle merged datasets[39m | |
[2023-12-21 21:25:22,209] [DEBUG] [axolotl.log:60] [PID:8259] [RANK:0] total_num_tokens: 577123019
[2023-12-21 21:25:29,857] [DEBUG] [axolotl.log:60] [PID:8259] [RANK:0] `total_supervised_tokens: 336627609`
[2023-12-21 21:25:38,896] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8259] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:25:38,896] [DEBUG] [axolotl.log:60] [PID:8259] [RANK:0] data_loader_len: 11623
[2023-12-21 21:25:43,214] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8260] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:25:43,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8262] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:25:43,552] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8261] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:25:43,567] [INFO] [axolotl.log:60] [PID:8259] [RANK:0] sample_packing_eff_est across ranks: [0.9902671575546265, 0.9902671575546265, 0.9903507232666016, 0.9902671575546265]
[2023-12-21 21:25:43,567] [DEBUG] [axolotl.log:60] [PID:8259] [RANK:0] sample_packing_eff_est: 1.0
[2023-12-21 21:25:43,568] [DEBUG] [axolotl.log:60] [PID:8259] [RANK:0] total_num_steps: 8717
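
The numbers in this burst are mutually consistent: 577,123,019 total tokens over 4 GPUs is 144,280,754 tokens per device, the per-rank packing efficiencies of ~0.990 are evidently rounded up to the 1.0 estimate, and a step count of 8717 is what, for example, 3 epochs with gradient accumulation of 4 would give over an 11623-batch loader. A quick check in plain Python (the epoch and accumulation values are assumptions, not stated in this log):

total_num_tokens = 577_123_019
print(total_num_tokens // 4)  # 144280754, matching "total_num_tokens per device"

eff = [0.9902671575546265, 0.9902671575546265,
       0.9903507232666016, 0.9902671575546265]
print(max(eff))  # ~0.9904, rounded up to sample_packing_eff_est: 1.0

data_loader_len = 11_623
print(data_loader_len * 3 // 4)  # 8717 == total_num_steps, e.g. 3 epochs / GA 4
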
[2023-12-21 21:25:43,576] [DEBUG] [axolotl.train.log:60] [PID:8259] [RANK:0] loading tokenizer... /workspace/models/Mixtral-8x7B-v0.1
[2023-12-21 21:25:43,676] [DEBUG] [axolotl.load_tokenizer:167] [PID:8259] [RANK:0] EOS: 32000 / <|im_end|>
[2023-12-21 21:25:43,676] [DEBUG] [axolotl.load_tokenizer:168] [PID:8259] [RANK:0] BOS: 1 / <s>
[2023-12-21 21:25:43,676] [DEBUG] [axolotl.load_tokenizer:169] [PID:8259] [RANK:0] PAD: 2 / </s>
[2023-12-21 21:25:43,676] [DEBUG] [axolotl.load_tokenizer:170] [PID:8259] [RANK:0] UNK: 0 / <unk>
[2023-12-21 21:25:43,676] [DEBUG] [axolotl.train.log:60] [PID:8259] [RANK:0] loading model and peft_config...
[2023-12-21 21:25:43,677] [INFO] [axolotl.load_model:262] [PID:8259] [RANK:0] patching with flash attention
[2023-12-21 21:25:43,678] [DEBUG] [axolotl.load_tokenizer:167] [PID:8260] [RANK:1] EOS: 32000 / <|im_end|>
[2023-12-21 21:25:43,678] [DEBUG] [axolotl.load_tokenizer:168] [PID:8260] [RANK:1] BOS: 1 / <s>
[2023-12-21 21:25:43,678] [DEBUG] [axolotl.load_tokenizer:169] [PID:8260] [RANK:1] PAD: 2 / </s>
[2023-12-21 21:25:43,678] [DEBUG] [axolotl.load_tokenizer:170] [PID:8260] [RANK:1] UNK: 0 / <unk>
[2023-12-21 21:25:43,679] [DEBUG] [axolotl.load_tokenizer:167] [PID:8261] [RANK:2] EOS: 32000 / <|im_end|>
[2023-12-21 21:25:43,679] [DEBUG] [axolotl.load_tokenizer:168] [PID:8261] [RANK:2] BOS: 1 / <s>
[2023-12-21 21:25:43,679] [DEBUG] [axolotl.load_tokenizer:169] [PID:8261] [RANK:2] PAD: 2 / </s>
[2023-12-21 21:25:43,679] [DEBUG] [axolotl.load_tokenizer:170] [PID:8261] [RANK:2] UNK: 0 / <unk>
[2023-12-21 21:25:43,680] [INFO] [axolotl.load_model:262] [PID:8260] [RANK:1] patching with flash attention
[2023-12-21 21:25:43,680] [INFO] [axolotl.load_model:262] [PID:8261] [RANK:2] patching with flash attention
[2023-12-21 21:25:43,680] [DEBUG] [axolotl.load_tokenizer:167] [PID:8262] [RANK:3] EOS: 32000 / <|im_end|>
[2023-12-21 21:25:43,680] [DEBUG] [axolotl.load_tokenizer:168] [PID:8262] [RANK:3] BOS: 1 / <s>
[2023-12-21 21:25:43,680] [DEBUG] [axolotl.load_tokenizer:169] [PID:8262] [RANK:3] PAD: 2 / </s>
[2023-12-21 21:25:43,680] [DEBUG] [axolotl.load_tokenizer:170] [PID:8262] [RANK:3] UNK: 0 / <unk>
[2023-12-21 21:25:43,681] [INFO] [axolotl.load_model:262] [PID:8262] [RANK:3] patching with flash attention
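
The tokenizer dump above (repeated once per rank) shows a ChatML-style setup: <|im_end|> has been added to the Mixtral vocabulary as id 32000 and serves as EOS, while </s> (id 2) is reused as the pad token instead of adding a dedicated one. A minimal sketch of how such a mapping is typically produced with transformers (the model path is the one from the log; axolotl drives this from its YAML config rather than code like this):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/workspace/models/Mixtral-8x7B-v0.1")
tok.add_special_tokens({"eos_token": "<|im_end|>"})  # appended as id 32000
tok.pad_token = "</s>"  # reuse id 2 for padding rather than adding a new token
print(tok.eos_token_id, tok.bos_token_id, tok.pad_token_id, tok.unk_token_id)
# per the log: 32000 1 2 0
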
[2023-12-21 21:26:27,154] [INFO] [axolotl.load_model:505] [PID:8259] [RANK:0] GPU memory usage after model load: 23.792GB (+0.196GB cache, +1.500GB misc)
[2023-12-21 21:26:27,159] [INFO] [axolotl.load_model:528] [PID:8259] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-21 21:26:27,171] [INFO] [axolotl.load_model:540] [PID:8259] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2023-12-21 21:26:27,201] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:8259] CUDA extension not installed.
[2023-12-21 21:26:27,202] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:8259] CUDA extension not installed.
[2023-12-21 21:26:27,466] [INFO] [axolotl.load_model:505] [PID:8260] [RANK:1] GPU memory usage after model load: 23.792GB (+0.196GB cache, +1.500GB misc)
[2023-12-21 21:26:27,471] [INFO] [axolotl.load_model:528] [PID:8260] [RANK:1] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-21 21:26:27,483] [INFO] [axolotl.load_model:540] [PID:8260] [RANK:1] converting modules to torch.bfloat16 for flash attention
[2023-12-21 21:26:27,511] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:8260] CUDA extension not installed.
[2023-12-21 21:26:27,512] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:8260] CUDA extension not installed.
[2023-12-21 21:26:29,286] [INFO] [axolotl.load_model:505] [PID:8262] [RANK:3] GPU memory usage after model load: 23.792GB (+0.196GB cache, +1.500GB misc)
[2023-12-21 21:26:29,290] [INFO] [axolotl.load_model:528] [PID:8262] [RANK:3] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-21 21:26:29,303] [INFO] [axolotl.load_model:540] [PID:8262] [RANK:3] converting modules to torch.bfloat16 for flash attention
[2023-12-21 21:26:29,332] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:8262] CUDA extension not installed.
[2023-12-21 21:26:29,333] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:8262] CUDA extension not installed.
[2023-12-21 21:26:30,823] [INFO] [axolotl.load_model:505] [PID:8261] [RANK:2] GPU memory usage after model load: 23.792GB (+0.196GB cache, +1.500GB misc)
[2023-12-21 21:26:30,827] [INFO] [axolotl.load_model:528] [PID:8261] [RANK:2] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-21 21:26:30,840] [INFO] [axolotl.load_model:540] [PID:8261] [RANK:2] converting modules to torch.bfloat16 for flash attention
[2023-12-21 21:26:30,869] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:8261] CUDA extension not installed.
[2023-12-21 21:26:30,870] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:8261] CUDA extension not installed.
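
Two things stand out in the model-load burst above. First, ~23.8GB resident for a 47,445,217,280-parameter model works out to roughly half a byte per weight, i.e. a 4-bit quantized load. Second, the auto_gptq "CUDA extension not installed" warnings fire at import time and appear harmless here, since no GPTQ checkpoint is being used. A minimal sketch of a 4-bit load followed by the peft helper named in the log (the exact quantization settings are assumptions; axolotl configures all of this from its YAML):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "/workspace/models/Mixtral-8x7B-v0.1",
    quantization_config=bnb,
)
# Freezes the quantized base weights, upcasts the norms, and enables input
# gradients so that LoRA adapters can be trained on top.
model = prepare_model_for_kbit_training(model)
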
trainable params: 742,408,192 || all params: 47,445,217,280 || trainable%: 1.5647692951191392 | |
trainable params: 742,408,192 || all params: 47,445,217,280 || trainable%: 1.5647692951191392 | |
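
The peft summary printed by each rank checks out: 742,408,192 trainable adapter parameters against 47,445,217,280 total is exactly the 1.5648% shown, a fairly large adapter footprint for a LoRA run on this model.

print(742_408_192 / 47_445_217_280 * 100)  # 1.5647692951191392
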
[2023-12-21 21:26:31,049] [INFO] [axolotl.load_model:570] [PID:8259] [RANK:0] GPU memory usage after adapters: 26.093GB (+0.071GB cache, +1.500GB misc)
[2023-12-21 21:26:31,058] [INFO] [axolotl.train.log:60] [PID:8259] [RANK:0] Pre-saving adapter config to /workspace/dolphin-2.6-mixtral-8x7b
[2023-12-21 21:26:31,061] [INFO] [axolotl.train.log:60] [PID:8259] [RANK:0] Starting trainer...
[2023-12-21 21:26:31,092] [INFO] [axolotl.load_model:570] [PID:8260] [RANK:1] GPU memory usage after adapters: 26.093GB (+0.071GB cache, +1.500GB misc)
trainable params: 742,408,192 || all params: 47,445,217,280 || trainable%: 1.5647692951191392 | |
[2023-12-21 21:26:32,980] [INFO] [axolotl.load_model:570] [PID:8262] [RANK:3] GPU memory usage after adapters: 26.093GB (+0.071GB cache, +1.500GB misc)
[2023-12-21 21:26:34,346] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8260] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:34,403] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8259] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
trainable params: 742,408,192 || all params: 47,445,217,280 || trainable%: 1.5647692951191392 | |
[2023-12-21 21:26:34,533] [INFO] [axolotl.load_model:570] [PID:8261] [RANK:2] GPU memory usage after adapters: 26.093GB (+0.071GB cache, +1.500GB misc)
[2023-12-21 21:26:34,781] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8260] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:34,837] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8259] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:35,216] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8260] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:35,271] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8259] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:35,651] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8260] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:35,710] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8259] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:36,177] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8262] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:36,619] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8262] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:37,057] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8262] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:37,499] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8262] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:37,823] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8261] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:38,264] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8261] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:38,704] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8261] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
[2023-12-21 21:26:39,151] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:8261] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 144280754
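
The repeated _len_est lines above are the multipack sampler re-estimating, once per dataloader pass, how many packed batches the per-device token budget yields. Sample packing concatenates several short examples into each max-length sequence so that little of the batch is padding, which is why the measured efficiencies sit near 0.99. An illustrative first-fit packing sketch, not axolotl's implementation (the 4096 sequence length and the example lengths are made up):

def pack_first_fit(lengths, seq_len=4096):
    """Greedily pack example lengths into fixed-size bins (packed sequences)."""
    bins = []  # remaining capacity of each packed sequence
    for n in sorted(lengths, reverse=True):
        for i, free in enumerate(bins):
            if n <= free:
                bins[i] -= n
                break
        else:
            bins.append(seq_len - n)  # open a new packed sequence
    used = sum(seq_len - free for free in bins)
    return len(bins), used / (len(bins) * seq_len)

num_packed, efficiency = pack_first_fit([700, 1200, 300, 4000, 2500, 900, 1600])
print(num_packed, round(efficiency, 3))  # 3 packed sequences, ~0.911 efficiency
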
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination | |
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination | |
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination | |
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination | |
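
This message appears once per rank because each DeepSpeed op builder compares the system CUDA toolkit (11.8) against the toolkit torch was compiled with (11.7); a minor-version mismatch is accepted since the APIs are compatible. A quick way to inspect both sides (a sketch; nvcc must be on PATH):

import subprocess
import torch

print(torch.version.cuda)  # toolkit torch was built against, "11.7" here
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
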
ninja: no work to do. | |
Time to load cpu_adam op: 2.415133237838745 seconds | |
Time to load cpu_adam op: 2.425196647644043 seconds | |
Time to load cpu_adam op: 2.431770086288452 seconds | |
Time to load cpu_adam op: 2.4151530265808105 seconds
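
These final lines show each rank JIT-loading DeepSpeed's cpu_adam extension (ninja reports "no work to do" because a compiled copy was already cached). cpu_adam is pulled in when the ZeRO config offloads optimizer state to CPU; a minimal sketch of the kind of config that triggers it, written as a Python dict (the actual DeepSpeed config for this run is not part of the log, so the stage and dtype here are assumptions):

ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # this is what requires cpu_adam
    },
    "bf16": {"enabled": True},
}
# e.g. passed via transformers.TrainingArguments(..., deepspeed=ds_config)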