# gpu vs tpu comparison for kkissmat
On commit 1f8ccaaf71b15f22e447866233e7d5e395928cab
# GPU COMMAND - 8 gpus
```bash
python /home/taylanbil/kkissmart-fairseq/tpu_fairseq/train.py $FULLDATAPATH --encoder-normalize-before --decoder-normalize-before --arch mbart_base --layernorm-embedding --task multilingual_denoising --criterion cross_entropy --dataset-impl mmap --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 0 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 4104 --seed 2 --log-format simple --log-interval 100 --add-lang-token --no-whole-word-mask-langs IMG --mask 0.35 --permute-sentences 1.0 --mask-length span-poisson --replace-length 1 --rotate 0.0 --max-source-positions 1026 --max-target-positions 1026 --tokens-per-sample 1026 --sample-break-mode complete --save-interval-updates 500 --skip-invalid-size-inputs-valid-test --langs EN,IMG --no-bos --no-input-eos --multilang-sampling-alpha 0.5 --max-sentences 4 --no-save --fp16 --num-buckets 1
```
# TPU COMMAND - v3-8
```bash
python /home/taylanbil/kkissmart-fairseq/tpu_fairseq/train.py $FULLDATAPATH --encoder-normalize-before --decoder-normalize-before --arch mbart_base --layernorm-embedding --task multilingual_denoising --criterion cross_entropy --dataset-impl mmap --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' --lr-scheduler polynomial_decay --lr 1e-04 --min-lr -1 --warmup-updates 0 --total-num-update 500000 --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.0 --max-tokens 4104 --seed 2 --log-format simple --log-interval 100 --add-lang-token --no-whole-word-mask-langs IMG --mask 0.35 --permute-sentences 1.0 --mask-length span-poisson --replace-length 1 --rotate 0.0 --max-source-positions 1026 --max-target-positions 1026 --tokens-per-sample 1026 --sample-break-mode complete --save-interval-updates 500 --skip-invalid-size-inputs-valid-test --langs EN,IMG --no-bos --no-input-eos --multilang-sampling-alpha 0.5 --max-sentences 4 --no-save --tpu --num-buckets 1 --distributed-world-size 8
```
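The two invocations are identical except for the backend flags, so the comparison isolates the GPU mixed-precision path against the TPU/XLA path:
```bash
# Only flags that differ between the two commands above;
# every other flag, including --max-tokens 4104 and --max-sentences 4, is shared.
#   GPU run:  --fp16
#   TPU run:  --tpu --distributed-world-size 8
```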
# GPU LOG - 8 gpus
2020-07-22 21:50:41 | INFO | fairseq.optim.adam | using FusedAdam
2020-07-22 21:50:59 | INFO | train_inner | epoch 001: 100 / 5904173 loss=11.752, ppl=3448.72, wps=7067.4, ups=7.49, wpb=943.8, bsz=1, num_updates=100, lr=9.998e-05, gnorm=2.073, loss_scale=128, train_wall=14, wall=105
2020-07-22 21:51:12 | INFO | train_inner | epoch 001: 200 / 5904173 loss=10.529, ppl=1477.85, wps=7180.3, ups=7.64, wpb=940.3, bsz=1, num_updates=200, lr=9.996e-05, gnorm=1.817, loss_scale=128, train_wall=13, wall=118
2020-07-22 21:51:25 | INFO | train_inner | epoch 001: 300 / 5904173 loss=10.09, ppl=1090.27, wps=7049.6, ups=7.59, wpb=928.8, bsz=1, num_updates=300, lr=9.994e-05, gnorm=1.733, loss_scale=128, train_wall=13, wall=131
2020-07-22 21:51:38 | INFO | train_inner | epoch 001: 400 / 5904173 loss=9.999, ppl=1023.5, wps=7048.5, ups=7.54, wpb=934.8, bsz=1, num_updates=400, lr=9.992e-05, gnorm=1.746, loss_scale=128, train_wall=13, wall=145
2020-07-22 21:51:52 | INFO | train_inner | epoch 001: 500 / 5904173 loss=9.633, ppl=793.85, wps=6953.3, ups=7.51, wpb=925.6, bsz=1, num_updates=500, lr=9.99e-05, gnorm=1.718, loss_scale=128, train_wall=13, wall=158
2020-07-22 22:30:37 | INFO | fairseq.data.iterators | Data loading buffer is empty or nearly empty. This may indicate a data loading bottleneck, and increasing the number of workers (--num-workers) may help.
2020-07-22 22:30:37 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 15.503 | ppl 46431.2 | wps 24473.9 | wpb 1015 | bsz 1 | num_updates 500
2020-07-22 22:34:53 | INFO | valid_EN | epoch 001 | valid on 'valid_EN' subset | loss 9.565 | ppl 757.47 | wps 22327.5 | wpb 933.6 | bsz 1 | num_updates 500
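Both runs hit fairseq's data-loading hint at validation time (it recurs in the TPU log below). Following the message's own suggestion, one hedged tweak is to raise `--num-workers`, fairseq's dataloader worker count; the value below is an arbitrary illustration, not a tuned setting:
```bash
# Illustrative only: add dataloader workers to either command above
# ("..." stands for the shared flags; 4 is a guess, not a benchmarked value).
python /home/taylanbil/kkissmart-fairseq/tpu_fairseq/train.py $FULLDATAPATH ... --num-workers 4
```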
# TPU LOG - v3-8
2020-07-22 21:57:18 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:57:36 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:57:55 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:58:20 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:58:30 | INFO | train_inner | epoch 001: 100 / 5904173 loss=11.444, ppl=2785.4, wps=0, ups=0, wpb=964, bsz=1, num_updates=100, lr=9.998e-05, gnorm=1.653, train_wall=81, wall=276
2020-07-22 21:58:30 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:58:50 | INFO | train_inner | epoch 001: 200 / 5904173 loss=10.332, ppl=1288.97, wps=50, ups=0.05, wpb=992, bsz=1, num_updates=200, lr=9.996e-05, gnorm=1.7, train_wall=11, wall=296
2020-07-22 21:58:50 | INFO | root | NOTE: XLA compilation detected; too many of these can lead to slow training, but we expect a few in the beginning
2020-07-22 21:59:10 | INFO | train_inner | epoch 001: 300 / 5904173 loss=10.399, ppl=1350.54, wps=46.1, ups=0.05, wpb=925, bsz=1, num_updates=300, lr=9.994e-05, gnorm=1.88, train_wall=12, wall=316
2020-07-22 21:59:30 | INFO | train_inner | epoch 001: 400 / 5904173 loss=10.276, ppl=1239.59, wps=46.2, ups=0.05, wpb=923, bsz=1, num_updates=400, lr=9.992e-05, gnorm=2.283, train_wall=12, wall=336
2020-07-22 21:59:50 | INFO | train_inner | epoch 001: 500 / 5904173 loss=9.604, ppl=778.11, wps=35.7, ups=0.05, wpb=718, bsz=1, num_updates=500, lr=9.99e-05, gnorm=1.836, train_wall=12, wall=356
/home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py:296: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
word_starts = is_word_start.nonzero()
2020-07-22 22:51:27 | INFO | fairseq.data.iterators | Data loading buffer is empty or nearly empty. This may indicate a data loading bottleneck, and increasing the number of workers (--num-workers) may help.
2020-07-22 22:51:28 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 15.272 | ppl 39554.8 | wps 18371.1 | wpb 1015 | bsz 1 | num_updates 500
/home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py:296: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
word_starts = is_word_start.nonzero()
2020-07-22 22:57:27 | INFO | fairseq.data.iterators | Data loading buffer is empty or nearly empty. This may indicate a data loading bottleneck, and increasing the number of workers (--num-workers) may help.
2020-07-22 22:57:28 | INFO | valid_EN | epoch 001 | valid on 'valid_EN' subset | loss 9.564 | ppl 756.97 | wps 15873.9 | wpb 933.6 | bsz 1 | num_updates 500
/home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py:296: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
word_starts = is_word_start.nonzero()
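The recurring UserWarning comes from the bare `is_word_start.nonzero()` call at `fairseq/data/denoising_dataset.py:296`; passing `as_tuple=False` (PyTorch >= 1.5) keeps the old `(N, 1)` index-tensor return shape while silencing the warning. A minimal sketch of an in-place patch; the sed edit is an assumption for illustration, not part of the original runs:
```bash
# Assumed one-liner patch: make the deprecated .nonzero() call explicit.
# .nonzero(as_tuple=False) returns the same (N, 1) tensor of indices as before.
sed -i 's/is_word_start.nonzero()/is_word_start.nonzero(as_tuple=False)/' \
    /home/xwu/tpu_fairseq/fairseq/data/denoising_dataset.py
```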