@davidberard98
Last active November 1, 2022 03:21
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation
STAGE:2022-11-01 01:39:13 3461:3461 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-11-01 01:39:14 3461:3461 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-11-01 01:39:16 3461:3461 output_json.cpp:417] Completed Stage: Post Processing
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[2022-11-01 01:39:38,741] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:39:40,064] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:39:43,769] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-01 01:39:45,329] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:39:45,577] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:39:45,824] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation
STAGE:2022-11-01 01:39:50 4417:4417 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-11-01 01:39:51 4417:4417 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-11-01 01:39:53 4417:4417 output_json.cpp:417] Completed Stage: Post Processing
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[2022-11-01 01:40:13,727] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:40:15,047] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:40:23,902] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:40:24,284] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:40:24,534] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation
STAGE:2022-11-01 01:40:29 5591:5591 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-11-01 01:40:29 5591:5591 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-11-01 01:40:32 5591:5591 output_json.cpp:417] Completed Stage: Post Processing
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[2022-11-01 01:40:53,268] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:40:54,620] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:41:05,427] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-11-01 01:41:05,428] torch._inductor.compile_fx: [WARNING] Aot Autograd is not safe to run, so falling back to eager
[2022-11-01 01:41:10,350] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:41:10,604] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:41:10,854] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation
STAGE:2022-11-01 01:41:15 6573:6573 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-11-01 01:41:16 6573:6573 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-11-01 01:41:18 6573:6573 output_json.cpp:417] Completed Stage: Post Processing
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[2022-11-01 01:41:41,179] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:41:42,543] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:42:02,154] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:42:02,406] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-11-01 01:42:02,656] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation
STAGE:2022-11-01 01:42:08 8741:8741 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2022-11-01 01:42:08 8741:8741 ActivityProfilerController.cpp:300] Completed Stage: Collection
STAGE:2022-11-01 01:42:11 8741:8741 output_json.cpp:417] Completed Stage: Post Processing
submitit INFO (2022-11-01 01:38:46,566) - Starting with JobEnvironment(job_id=74983, hostname=a100-st-p4d24xlarge-34, local_rank=0(8), node=0(2), global_rank=0(16))
submitit INFO (2022-11-01 01:38:46,567) - Loading pickle: /fsx/users/dberard/scratch-local/bench-fast/benchmark/logs_oct31/74983_submitted.pkl
This is node 0
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0>
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO cudaDriverVersion 11060
NCCL version 2.14.3+cuda11.6
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Selected Provider is efa
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Using network AWS Libfabric
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27'
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27'
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27'
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27'
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002dd0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e10
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e50
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e90
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002ed0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f10
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f50
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f90
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002fd0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003010
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003050
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003090
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40030d0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003110
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003150
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003190
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connected all rings
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40031d0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003210
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003250
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003290
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40032d0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003310
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003350
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003390
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40033d0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003410
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003450
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003490
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40034d0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003510
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003550
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003590
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40035d0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003610
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003650
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003690
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40036d0
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003710
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003750
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003790
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connected all trees
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NCCL_ALGO set by environment to tree
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 42 from local rank 7, transport 2
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 45 from local rank 3, transport 2
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 46 from local rank 2, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40037d0
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb38c003850
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9660003990
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b10003850
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO comm 0x564ad8428c10 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE
result {'latency_median': 66.50521850585938, 'latency_stdev': 0.1567804677881775}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'eager', 'has_breaks': False, 'result': {'latency_median': 66.50521850585938, 'latency_stdev': 0.1567804677881775}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "eager", "has_breaks": false, "result": {"latency_median": 66.50521850585938, "latency_stdev": 0.1567804677881775}}</RESULT>
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0>
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO cudaDriverVersion 11060
NCCL version 2.14.3+cuda11.6
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Selected Provider is efa
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Using network AWS Libfabric
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27'
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27'
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27'
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27'
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002dd0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e10
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e50
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e90
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002ed0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f10
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f50
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f90
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002fd0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003010
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003050
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003090
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0030d0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003110
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003150
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003190
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connected all rings
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0031d0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003210
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003250
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003290
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0032d0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003310
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003350
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003390
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0033d0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003410
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003450
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003490
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0034d0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003510
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003550
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003590
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0035d0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003610
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003650
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003690
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0036d0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003710
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003750
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003790
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connected all trees
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NCCL_ALGO set by environment to tree
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 42 from local rank 7, transport 2
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 45 from local rank 3, transport 2
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 46 from local rank 2, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0037d0
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb394003850
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9668003990
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b0c003850
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO comm 0x564ad8041870 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE
result {'latency_median': 85.98988723754883, 'latency_stdev': 1.3242423088486284}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_aot_eager', 'has_breaks': True, 'result': {'latency_median': 85.98988723754883, 'latency_stdev': 1.3242423088486284}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_aot_eager", "has_breaks": true, "result": {"latency_median": 85.98988723754883, "latency_stdev": 1.3242423088486284}}</RESULT>
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0>
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO cudaDriverVersion 11060
NCCL version 2.14.3+cuda11.6
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Selected Provider is efa
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Using network AWS Libfabric
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27'
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27'
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27'
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27'
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002dd0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e10
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e50
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e90
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002ed0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f10
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f50
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f90
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002fd0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003010
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003050
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003090
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40030d0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003110
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003150
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003190
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connected all rings
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40031d0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003210
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003250
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003290
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40032d0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003310
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003350
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003390
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40033d0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003410
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003450
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003490
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40034d0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003510
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003550
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003590
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40035d0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003610
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003650
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003690
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40036d0
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003710
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003750
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003790
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connected all trees
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NCCL_ALGO set by environment to tree
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 42 from local rank 6, transport 2
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 43 from local rank 5, transport 2
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 44 from local rank 3, transport 2
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 45 from local rank 2, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40037d0
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb38c003810
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9660003950
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b10003810
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 46 from local rank 7, transport 2
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO comm 0x564ad7c98a50 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE
result {'latency_median': 85.90284729003906, 'latency_stdev': 1.3287940048697573}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_aot_eager', 'has_breaks': False, 'result': {'latency_median': 85.90284729003906, 'latency_stdev': 1.3287940048697573}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_aot_eager", "has_breaks": false, "result": {"latency_median": 85.90284729003906, "latency_stdev": 1.3287940048697573}}</RESULT>
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0>
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO cudaDriverVersion 11060
NCCL version 2.14.3+cuda11.6
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Selected Provider is efa
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Using network AWS Libfabric
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27'
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27'
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27'
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27'
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002dd0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e10
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e50
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e90
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002ed0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f10
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f50
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f90
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002fd0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003010
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003050
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003090
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0030d0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003110
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003150
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003190
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connected all rings
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0031d0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003210
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003250
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003290
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0032d0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003310
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003350
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003390
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0033d0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003410
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003450
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003490
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0034d0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003510
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003550
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003590
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0035d0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003610
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003650
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003690
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0036d0
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003710
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003750
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003790
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connected all trees
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NCCL_ALGO set by environment to tree
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 41 from local rank 7, transport 2
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 42 from local rank 1, transport 2
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 45 from local rank 3, transport 2
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 46 from local rank 2, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0037d0
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb390003850
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f965c003990
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b14003850
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO comm 0x564ad855afe0 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE
result {'latency_median': 85.34681701660156, 'latency_stdev': 1.8793551451605035}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_inductor', 'has_breaks': True, 'result': {'latency_median': 85.34681701660156, 'latency_stdev': 1.8793551451605035}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_inductor", "has_breaks": true, "result": {"latency_median": 85.34681701660156, "latency_stdev": 1.8793551451605035}}</RESULT>
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0>
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO cudaDriverVersion 11060
NCCL version 2.14.3+cuda11.6
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Selected Provider is efa
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Using network AWS Libfabric
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27'
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27'
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27'
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27'
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002dd0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e10
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e50
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e90
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002ed0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f10
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f50
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f90
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002fd0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003010
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003050
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003090
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0030d0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003110
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003150
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003190
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connected all rings
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0031d0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003210
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003250
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003290
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0032d0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003310
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003350
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003390
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0033d0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003410
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003450
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003490
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0034d0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003510
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003550
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003590
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0035d0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003610
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003650
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003690
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0036d0
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003710
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003750
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003790
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connected all trees
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NCCL_ALGO set by environment to tree
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 42 from local rank 7, transport 2
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 45 from local rank 2, transport 2
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 46 from local rank 3, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0037d0
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb390003850
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9654003990
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b14003850
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO comm 0x564ad7d53170 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE
result {'latency_median': 83.9736328125, 'latency_stdev': 1.1737800621528658}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_inductor', 'has_breaks': False, 'result': {'latency_median': 83.9736328125, 'latency_stdev': 1.1737800621528658}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_inductor", "has_breaks": false, "result": {"latency_median": 83.9736328125, "latency_stdev": 1.1737800621528658}}</RESULT>
submitit INFO (2022-11-01 01:42:15,115) - Job completed successfully