[I debug.cpp:49] [c10d] The debug level is set to DETAIL.
[I ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
[I ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: -2
NCCL_DESYNC_DEBUG: 1
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
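
As an aside, the two torchvision deprecation warnings above name their own fix; a minimal sketch of the suggested replacement for the deprecated `pretrained` argument (the constructor call itself is an assumption here, the weights enums are quoted from the warning text):

```python
# Sketch based on the torchvision warnings above: replace the deprecated
# `pretrained=True` argument with an explicit weights enum.
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)  # same weights as old pretrained=True
# or, per the warning, the most up-to-date weights:
model = resnet50(weights=ResNet50_Weights.DEFAULT)
```
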
[I ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: TRACE
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
broadcast_buffers: 0
bucket_cap_bytes: 26214400
find_unused_parameters: 0
gradient_as_bucket_view: 1
has_sync_bn: 0
is_multi_device_module: 0
iteration: 0
num_parameter_tensors: 161
output_device: 0
rank: 0
total_parameter_size_bytes: 102228128
world_size: 16
backend_name: nccl
bucket_sizes: 102228128
cuda_visible_devices: 0,1,2,3,4,5,6,7
device_ids: 0
dtypes: float
master_addr: N/A
master_port: N/A
module_name: ResNet
nccl_async_error_handling: N/A
nccl_blocking_wait: N/A
nccl_debug: TRACE
nccl_ib_timeout: N/A
nccl_nthreads: N/A
nccl_socket_ifname: ens
torch_distributed_debug: DETAIL
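
For context, the initialization record above corresponds to a fairly standard NCCL DDP wrap of a ResNet. The sketch below shows a setup consistent with those values; it is not the benchmark's actual launcher, and the env-var rendezvous is an assumption (the log reports master_addr/master_port as N/A, so the real run, launched via submitit, evidently used a different init method):

```python
# Hedged sketch, not the benchmark's actual code: a DDP wrap consistent with the
# "DDP Initialized with:" record above. Assumes a launcher that sets LOCAL_RANK and
# the env:// rendezvous variables, plus TORCH_DISTRIBUTED_DEBUG=DETAIL, NCCL_DEBUG=TRACE,
# NCCL_SOCKET_IFNAME=ens and NCCL_DESYNC_DEBUG=1 in the environment (which account
# for the DETAIL / TRACE / desync-debug lines in the log).
import os

import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")  # world_size=16, i.e. likely 2 hosts x 8 visible GPUs

model = torchvision.models.resnet50().cuda(local_rank)
ddp_model = DDP(
    model,
    device_ids=[local_rank],       # device_ids: 0 on rank 0
    broadcast_buffers=False,       # broadcast_buffers: 0
    find_unused_parameters=False,  # find_unused_parameters: 0
    gradient_as_bucket_view=True,  # gradient_as_bucket_view: 1
    bucket_cap_mb=25,              # bucket_cap_bytes: 26214400 (25 MiB)
)
```
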
[I logger.cpp:377] [Rank 0 / 16] [before iteration 1] Training ResNet unused_parameter_size=0
Avg forward compute time: 14610432
Avg backward compute time: 380300288
Avg backward comm. time: 127566848
Avg backward comm/comp overlap time: 21514240
[I reducer.cpp:1724] 5 buckets rebuilt with size limits: 1048576, 26214400, 26214400, 26214400, 26214400 bytes.
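
As a quick arithmetic check, the rebuilt bucket size limits above line up with the Reducer caps logged at initialization (1 MiB for the first bucket, 25 MiB for the rest, i.e. DDP's default bucket_cap_mb=25):

```python
# Arithmetic check (assumption: the limits above come from the Reducer caps logged earlier).
first_bucket_bytes_cap = 1 * 1024 * 1024  # 1 MiB  == 1048576 (first bucket)
bucket_bytes_cap = 25 * 1024 * 1024       # 25 MiB == 26214400 (default bucket_cap_mb=25)
assert first_bucket_bytes_cap == 1048576
assert bucket_bytes_cap == 26214400
```
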
[I logger.cpp:377] [Rank 0 / 16] [before iteration 2] Training ResNet unused_parameter_size=0
Avg forward compute time: 81418752
Avg backward compute time: 239553024
Avg backward comm. time: 128046592
Avg backward comm/comp overlap time: 58981888
[I logger.cpp:377] [Rank 0 / 16] [before iteration 3] Training ResNet unused_parameter_size=0
Avg forward compute time: 65623040
Avg backward compute time: 192718848
Avg backward comm. time: 128496984
Avg backward comm/comp overlap time: 71587157
[I logger.cpp:377] [Rank 0 / 16] [before iteration 4] Training ResNet unused_parameter_size=0
Avg forward compute time: 57681408
Avg backward compute time: 169098496
Avg backward comm. time: 125390338
Avg backward comm/comp overlap time: 77677311
[I logger.cpp:377] [Rank 0 / 16] [before iteration 5] Training ResNet unused_parameter_size=0
Avg forward compute time: 52917248
Avg backward compute time: 154995916
Avg backward comm. time: 126040270
Avg backward comm/comp overlap time: 81394892
[I logger.cpp:377] [Rank 0 / 16] [before iteration 6] Training ResNet unused_parameter_size=0
Avg forward compute time: 49737216
Avg backward compute time: 473799487
Avg backward comm. time: 452736513
Avg backward comm/comp overlap time: 412090708
[I logger.cpp:377] [Rank 0 / 16] [before iteration 7] Training ResNet unused_parameter_size=0
Avg forward compute time: 47497508
Avg backward compute time: 420205110
Avg backward comm. time: 407578771
Avg backward comm/comp overlap time: 366987994
[I logger.cpp:377] [Rank 0 / 16] [before iteration 8] Training ResNet unused_parameter_size=0
Avg forward compute time: 45786239
Avg backward compute time: 380060911
Avg backward comm. time: 371972352
Avg backward comm/comp overlap time: 333205502
[I logger.cpp:377] [Rank 0 / 16] [before iteration 9] Training ResNet unused_parameter_size=0
Avg forward compute time: 44459917
Avg backward compute time: 349120496
Avg backward comm. time: 345771121
Avg backward comm/comp overlap time: 307226508
[I logger.cpp:377] [Rank 0 / 16] [before iteration 10] Training ResNet unused_parameter_size=0
Avg forward compute time: 43406334
Avg backward compute time: 324065982
Avg backward comm. time: 323487845
Avg backward comm/comp overlap time: 286128945
[I ProcessGroupNCCL.cpp:837] [Rank 0] NCCL watchdog thread terminated normally
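
Reading the per-iteration statistics: the "Avg ..." values are raw counts with no units in the log; assuming they are nanoseconds (a plausible reading given the magnitudes, though not stated by the logger output), the iteration-10 averages convert as follows:

```python
# Assumption: the DDP logger's "Avg ..." values above are raw nanosecond counts.
# Converting the iteration-10 numbers from the log to milliseconds for readability
# (dict name and keys are illustrative, not a PyTorch API):
iteration_10_ns = {
    "forward_compute": 43406334,
    "backward_compute": 324065982,
    "backward_comm": 323487845,
    "backward_comm_comp_overlap": 286128945,
}
for name, ns in iteration_10_ns.items():
    print(f"{name}: {ns / 1e6:.1f} ms")
# -> forward_compute: 43.4 ms, backward_compute: 324.1 ms,
#    backward_comm: 323.5 ms, backward_comm_comp_overlap: 286.1 ms
```
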
[I debug.cpp:49] [c10d] The debug level is set to DETAIL.
[I ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: -2
NCCL_DESYNC_DEBUG: 1
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
[I ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[I ProcessGroupNCCL.cpp:1274] NCCL_DEBUG: TRACE
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
broadcast_buffers: 0
bucket_cap_bytes: 26214400
find_unused_parameters: 0
gradient_as_bucket_view: 1
has_sync_bn: 0
is_multi_device_module: 0
iteration: 0
num_parameter_tensors: 161
output_device: 0
rank: 0
total_parameter_size_bytes: 102228128
world_size: 16
backend_name: nccl
bucket_sizes: 102228128
cuda_visible_devices: 0,1,2,3,4,5,6,7
device_ids: 0
dtypes: float
master_addr: N/A
master_port: N/A
module_name: ResNet
nccl_async_error_handling: N/A
nccl_blocking_wait: N/A
nccl_debug: TRACE
nccl_ib_timeout: N/A
nccl_nthreads: N/A
nccl_socket_ifname: ens
torch_distributed_debug: DETAIL
[I reducer.cpp:126] Reducer initialized with bucket_bytes_cap: 26214400 first_bucket_bytes_cap: 1048576
[I logger.cpp:213] [Rank 0]: DDP Initialized with:
broadcast_buffers: 0
bucket_cap_bytes: 26214400
find_unused_parameters: 0
gradient_as_bucket_view: 1
has_sync_bn: 0
is_multi_device_module: 0
iteration: 0
num_parameter_tensors: 161
output_device: 0
rank: 0
total_parameter_size_bytes: 102228128
world_size: 16
backend_name: nccl
bucket_sizes: 102228128
cuda_visible_devices: 0,1,2,3,4,5,6,7
device_ids: 0
dtypes: float
master_addr: N/A
master_port: N/A
module_name: ResNet
nccl_async_error_handling: N/A
nccl_blocking_wait: N/A
nccl_debug: TRACE
nccl_ib_timeout: N/A
nccl_nthreads: N/A
nccl_socket_ifname: ens
torch_distributed_debug: DETAIL
[I logger.cpp:377] [Rank 0 / 16] [before iteration 1] Training ResNet unused_parameter_size=0
Avg forward compute time: 202930176
Avg backward compute time: 0
Avg backward comm. time: 0
Avg backward comm/comp overlap time: 0
INFO:submitit:Job completed successfully