Tried with the old training interface: it still hangs...
trainer@e4916e93eaab:~/stt$ TF_CUDNN_RESET_RND_GEN_STATE=1 python train.py --show_progressbar true --train_cudnn true --alphabet_config_path /mnt/test_models/alphabet.txt --scorer_path /mnt/lm/kenlm.scorer --feature_cache /tmp/feature_cache --train_files /mnt/extracted/data/cv-fr/clips/train.csv --dev_files /mnt/extracted/data/cv-fr/clips/dev.csv --train_batch_size 32 --dev_batch_size 32 --n_hidden 2048 --epochs 3 --learning_rate 0.0001 --dropout_rate 0.3 --lm_alpha 0.0 --lm_beta 0.0 --log_level=0 --early_stop true --checkpoint_dir /mnt/test2_checkpoints/
Using the top level train.py script is deprecated and will be removed in a future release. Instead use: python -m coqui_stt_training.train
I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
D Session opened.
I Loading best validating checkpoint from /mnt/test2_checkpoints/best_dev-1
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 1073.532959
Epoch 0 | Validation | Elapsed Time: 0:00:01 | Steps: 1 | Loss: 351.812256 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:05.373691
D Session closed.
I Dummy run finished without problems, now starting real training process.
D Session opened.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 1073.532959
Epoch 0 | Validation | Elapsed Time: 0:00:40 | Steps: 248 | Loss: 270.871342 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
I Saved new best validating model with loss 270.871342 to: /mnt/test2_checkpoints/best_dev-2
--------------------------------------------------------------------------------
Epoch 1 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 414.540070
Epoch 1 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 390.220405 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
Epoch 2 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 477.380676
Epoch 2 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 440.691332 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:01:56.464317
D Session closed.
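
To make the per-epoch numbers in the log above easier to compare, here is a minimal sketch that pulls the epoch, phase, step count and loss out of the progress lines. It is not part of Coqui STT: the names `EPOCH_LINE` and `parse_epochs` are hypothetical, and the regular expression only assumes the line layout visible in this particular log.

```python
import re

# Progress lines copied from the real training run in the log above.
LOG = """\
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 1073.532959
Epoch 0 | Validation | Elapsed Time: 0:00:40 | Steps: 248 | Loss: 270.871342 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 1 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 414.540070
Epoch 1 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 390.220405 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
Epoch 2 | Training | Elapsed Time: 0:00:00 | Steps: 1 | Loss: 477.380676
Epoch 2 | Validation | Elapsed Time: 0:00:35 | Steps: 248 | Loss: 440.691332 | Dataset: /mnt/extracted/data/cv-fr/clips/dev.csv
"""

# Hypothetical pattern, assuming the "Epoch N | Phase | ... | Steps: S | Loss: L" layout.
EPOCH_LINE = re.compile(
    r"^Epoch (\d+) \| (Training|Validation) \| Elapsed Time: \S+ "
    r"\| Steps: (\d+) \| Loss: ([\d.]+)"
)


def parse_epochs(text):
    """Return a list of (epoch, phase, steps, loss) tuples found in the log text."""
    rows = []
    for line in text.splitlines():
        match = EPOCH_LINE.match(line.strip())
        if match:
            epoch, phase, steps, loss = match.groups()
            rows.append((int(epoch), phase, int(steps), float(loss)))
    return rows


rows = parse_epochs(LOG)
for epoch, phase, steps, loss in rows:
    print(f"epoch={epoch} phase={phase:<10} steps={steps:>3} loss={loss:.6f}")
```

Read this way, the log shows every training phase reporting a single step while each validation phase runs 248 steps, and the validation loss rising from epoch 0 to epoch 2.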
it still hangs!