~~epoch hours top1Accuracy
Distributed: init_process_group success
Loaded model
Defined loss and optimizer
Created data loaders
Begin training
Changing LR from None to 1.4
~~0 0.01851892861111111 14.500
~~epoch hours top1Accuracy
Distributed: init_process_group success
Loaded model
Defined loss and optimizer
Created data loaders
Begin training
Begin training loop: 1530911465.1864338
Prefetcher first preload complete
Received input: 3.8542351722717285
Test: [0/391] Time 1.912 (1.912) Loss 1.3594 (1.3594) Prec@1 67.188 (67.188) Prec@5 86.719 (86.719)
Test: [10/391] Time 0.088 (0.400) Loss 1.0459 (0.9636) Prec@1 79.688 (75.000) Prec@5 89.062 (92.259)
Test: [20/391] Time 0.088 (0.392) Loss 0.9121 (1.0274) Prec@1 75.000 (73.772) Prec@5 95.312 (91.592)
Test: [30/391] Time 0.088 (0.350) Loss 0.8262 (1.0025) Prec@1 82.031 (74.320) Prec@5 93.750 (92.087)
Test: [40/391] Time 0.088 (0.357) Loss 1.0703 (0.9653) Prec@1 71.094 (75.305) Prec@5 92.969 (92.530)
Test: [50/391] Time 0.090 (0.337) Loss 1.2402 (1.0169) Prec@1 69.531 (74.357) Prec@5 92.969 (91.881)
Test: [60/391] Time 0.088 (0.340) Loss 1.7568 (1.0623) Prec@1 54.688 (73.335) Prec@5 83.594 (91.304)
Test: [70/391] Time 0.088 (0.331) Loss 1.1191 (1.0536) Prec@1 74.219 (73.537) Prec@5 89.844 (91.384)
Test: [80/391] Time 0.088 (0.331) Loss 0.9688 (1.0258) Prec@1 75.000 (74.199) Prec@5 90.625 (91.763)
Test: [90/391] Time 0.088 (0.324) Loss 1.0059 (1.0122) Prec@1 73.438 (74.511) Prec@5 93.750 (92.033)
Test: [0/391] Time 2.462 (2.462) Loss 1.0381 (1.0381) Prec@1 73.438 (73.438) Prec@5 91.406 (91.406)
Test: [10/391] Time 0.138 (0.494) Loss 0.9663 (0.8747) Prec@1 79.688 (77.202) Prec@5 92.188 (94.247)
Test: [20/391] Time 0.123 (0.463) Loss 0.9185 (0.9419) Prec@1 78.125 (75.930) Prec@5 94.531 (93.824)
Test: [30/391] Time 0.121 (0.408) Loss 0.7988 (0.9348) Prec@1 85.938 (76.260) Prec@5 92.969 (93.800)
Test: [40/391] Time 0.133 (0.411) Loss 1.0264 (0.9115) Prec@1 73.438 (77.115) Prec@5 93.750 (94.074)
Test: [50/391] Time 0.113 (0.387) Loss 1.1367 (0.9567) Prec@1 73.438 (76.149) Prec@5 91.406 (93.367)
Test: [60/391] Time 0.113 (0.386) Loss 1.6260 (0.9970) Prec@1 56.250 (75.128) Prec@5 85.938 (92.841)
Test: [70/391] Time 0.113 (0.373) Loss 1.0781 (0.9921) Prec@1 73.438 (75.253) Prec@5 92.969 (92.848)
Test: [80/391] Time 0.105 (0.371) Loss 0.8721 (0.9677) Prec@1 76.562 (75.791) Prec@5 93.750 (93.142)
Test: [90/391] Time 0.109 (0.361) Loss 0.8960 (0.9565) Prec@1 78.125 (75.953) Prec@5 96.094 (93.286)
# creating snapshot
v = #volume
snapshot = ec2.create_snapshot(
    Description='Imagenet data snapshot',
    VolumeId=v.id,
    TagSpecifications=[
        {
            'ResourceType': 'snapshot',
            'Tags': [
#!/bin/bash
# This assumes base DLAMI - "Deep Learning AMI (Ubuntu) Version 12.0"
# YOU MUST RUN THESE COMMANDS BEFORE YOU RUN THIS SCRIPT
# conda create -n pytorch_source -y
# source activate pytorch_source
sudo rm -rf /usr/local/cuda
NCCL_RINGS="8 21 18 14 28 6 13 20 3 24 10 16 5 1 30 17 11 27 0 19 15 9 7 12 4 23 29 22 2 26 25 31 | 14 18 24 12 30 22 0 29 25 5 1 10 9 2 4 23 20 11 16 7 27 15 31 3 26 17 6 8 28 19 21 13 | 31 27 4 18 25 23 6 7 13 28 22 2 12 21 20 15 3 30 1 5 16 14 19 8 10 26 9 11 29 24 0 17 | 5 10 24 1 14 21 7 28 3 4 25 11 8 29 13 20 27 26 17 12 6 0 30 2 15 16 18 23 9 22 19 31"
Changing LR from 2.188196721311475 to 2.1901597577529497
Epoch: [1][10/157] Time 0.381 (0.675) Data 0.001 (0.019) Loss 5.2887 (5.2825) Prec@1 7.458 (7.638) Prec@5 20.349 (20.150) bw 2.941 2.941
Epoch: [1][20/157] Time 0.379 (0.540) Data 0.001 (0.018) Loss 5.1616 (5.2322) Prec@1 9.119 (8.117) Prec@5 21.826 (20.883) bw 12.484 12.484
Epoch: [1][30/157] Time 0.380 (0.493) Data 0.001 (0.020) Loss 5.0941 (5.2052) Prec@1 9.253 (8.359) Prec@5 23.938 (21.450) bw 13.183 13.183
Epoch: [1][40/157] Time 0.381 (0.470) Data 0.001 (0.020) Loss 5.0707 (5.1734) Prec@1 9.363 (8.611) Pr
Namespace(arch='resnet50', batch_sched='512,192,128', data='/home/ubuntu/data/imagenet', dist_backend='nccl', dist_url='file:///home/ubuntu/data/file.sync', distributed=True, epochs=35, evaluate=False, fp16=True, init_bn0=True, local_rank=2, logdir='/efs/runs/one_machine_e35_nobnwd.03', loss_scale=1024.0, lr=1.0, lr_linear_scale=True, lr_sched='0.14,0.47,0.78,0.95', momentum=0.9, no_bn_wd=True, pretrained=False, print_freq=10, prof=False, resize_sched='0.4,0.92', resume='', save_dir='/home/ubuntu/data/training/nv/2018-08-01_22-38-one_machine_e35_nobnwd-w8', start_epoch=0, val_ar=True, weight_decay=0.0001, workers=8, world_size=8)
~~epoch hours top1Accuracy
Distributed: initializing process group
Distributed: success (2/8)
Loading model
Creating data loaders (this could take 6-12 minutes)
Begin training
Dataset changed.
Image size: 128
import argparse
import os
import shutil
import time
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo
__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152']
model_urls = {