Last active
September 22, 2022 04:25
[I debug.cpp:49] [c10d] The debug level is set to DETAIL.
[I ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
[I ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: -2
NCCL_DESYNC_DEBUG: 1
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[I ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: TRACE
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
broadcast_buffers: 0
bucket_cap_bytes: 26214400
find_unused_parameters: 0
gradient_as_bucket_view: 1
has_sync_bn: 0
is_multi_device_module: 0
iteration: 0
num_parameter_tensors: 161
output_device: 0
rank: 0
total_parameter_size_bytes: 102228128
world_size: 16
backend_name: nccl
bucket_sizes: 102228128
cuda_visible_devices: 0,1,2,3,4,5,6,7
device_ids: 0
dtypes: float
master_addr: N/A
master_port: N/A
module_name: ResNet
nccl_async_error_handling: N/A
nccl_blocking_wait: N/A
nccl_debug: TRACE
nccl_ib_timeout: N/A
nccl_nthreads: N/A
nccl_socket_ifname: ens
torch_distributed_debug: DETAIL
[I logger.cpp:377] [Rank 0 / 16] [before iteration 1] Training ResNet unused_parameter_size=0
Avg forward compute time: 14610432
Avg backward compute time: 380300288
Avg backward comm. time: 127566848
Avg backward comm/comp overlap time: 21514240
[I reducer.cpp:1724] 5 buckets rebuilt with size limits: 1048576, 26214400, 26214400, 26214400, 26214400 bytes.
[I logger.cpp:377] [Rank 0 / 16] [before iteration 2] Training ResNet unused_parameter_size=0
Avg forward compute time: 81418752
Avg backward compute time: 239553024
Avg backward comm. time: 128046592
Avg backward comm/comp overlap time: 58981888
[I logger.cpp:377] [Rank 0 / 16] [before iteration 3] Training ResNet unused_parameter_size=0
Avg forward compute time: 65623040
Avg backward compute time: 192718848
Avg backward comm. time: 128496984
Avg backward comm/comp overlap time: 71587157
[I logger.cpp:377] [Rank 0 / 16] [before iteration 4] Training ResNet unused_parameter_size=0
Avg forward compute time: 57681408
Avg backward compute time: 169098496
Avg backward comm. time: 125390338
Avg backward comm/comp overlap time: 77677311
[I logger.cpp:377] [Rank 0 / 16] [before iteration 5] Training ResNet unused_parameter_size=0
Avg forward compute time: 52917248
Avg backward compute time: 154995916
Avg backward comm. time: 126040270
Avg backward comm/comp overlap time: 81394892
[I logger.cpp:377] [Rank 0 / 16] [before iteration 6] Training ResNet unused_parameter_size=0
Avg forward compute time: 49737216
Avg backward compute time: 473799487
Avg backward comm. time: 452736513
Avg backward comm/comp overlap time: 412090708
[I logger.cpp:377] [Rank 0 / 16] [before iteration 7] Training ResNet unused_parameter_size=0
Avg forward compute time: 47497508
Avg backward compute time: 420205110
Avg backward comm. time: 407578771
Avg backward comm/comp overlap time: 366987994
[I logger.cpp:377] [Rank 0 / 16] [before iteration 8] Training ResNet unused_parameter_size=0
Avg forward compute time: 45786239
Avg backward compute time: 380060911
Avg backward comm. time: 371972352
Avg backward comm/comp overlap time: 333205502
[I logger.cpp:377] [Rank 0 / 16] [before iteration 9] Training ResNet unused_parameter_size=0
Avg forward compute time: 44459917
Avg backward compute time: 349120496
Avg backward comm. time: 345771121
Avg backward comm/comp overlap time: 307226508
[I logger.cpp:377] [Rank 0 / 16] [before iteration 10] Training ResNet unused_parameter_size=0
Avg forward compute time: 43406334
Avg backward compute time: 324065982
Avg backward comm. time: 323487845
Avg backward comm/comp overlap time: 286128945
[I ProcessGroupNCCL.cpp:837] [Rank 0] NCCL watchdog thread terminated normally
[I debug.cpp:49] [c10d] The debug level is set to DETAIL.
[I ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: -2
NCCL_DESYNC_DEBUG: 1
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
[I ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[I ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: TRACE
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
broadcast_buffers: 0
bucket_cap_bytes: 26214400
find_unused_parameters: 0
gradient_as_bucket_view: 1
has_sync_bn: 0
is_multi_device_module: 0
iteration: 0
num_parameter_tensors: 161
output_device: 0
rank: 0
total_parameter_size_bytes: 102228128
world_size: 16
backend_name: nccl
bucket_sizes: 102228128
cuda_visible_devices: 0,1,2,3,4,5,6,7
device_ids: 0
dtypes: float
master_addr: N/A
master_port: N/A
module_name: ResNet
nccl_async_error_handling: N/A
nccl_blocking_wait: N/A
nccl_debug: TRACE
nccl_ib_timeout: N/A
nccl_nthreads: N/A
nccl_socket_ifname: ens
torch_distributed_debug: DETAIL
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
broadcast_buffers: 0
bucket_cap_bytes: 26214400
find_unused_parameters: 0
gradient_as_bucket_view: 1
has_sync_bn: 0
is_multi_device_module: 0
iteration: 0
num_parameter_tensors: 161
output_device: 0
rank: 0
total_parameter_size_bytes: 102228128
world_size: 16
backend_name: nccl
bucket_sizes: 102228128
cuda_visible_devices: 0,1,2,3,4,5,6,7
device_ids: 0
dtypes: float
master_addr: N/A
master_port: N/A
module_name: ResNet
nccl_async_error_handling: N/A
nccl_blocking_wait: N/A
nccl_debug: TRACE
nccl_ib_timeout: N/A
nccl_nthreads: N/A
nccl_socket_ifname: ens
torch_distributed_debug: DETAIL
[I logger.cpp:377] [Rank 0 / 16] [before iteration 1] Training ResNet unused_parameter_size=0
Avg forward compute time: 202930176
Avg backward compute time: 0
Avg backward comm. time: 0
Avg backward comm/comp overlap time: 0
INFO:submitit:Job completed successfully