Last active
November 1, 2022 03:21
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. | |
warnings.warn( | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights. | |
warnings.warn(msg) | |
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation | |
STAGE:2022-11-01 01:39:13 3461:3461 ActivityProfilerController.cpp:294] Completed Stage: Warm Up | |
STAGE:2022-11-01 01:39:14 3461:3461 ActivityProfilerController.cpp:300] Completed Stage: Collection | |
STAGE:2022-11-01 01:39:16 3461:3461 output_json.cpp:417] Completed Stage: Post Processing | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. | |
warnings.warn( | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights. | |
warnings.warn(msg) | |
[2022-11-01 01:39:38,741] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:39:40,064] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:39:43,769] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-01 01:39:45,329] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:39:45,577] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:39:45,824] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation | |
STAGE:2022-11-01 01:39:50 4417:4417 ActivityProfilerController.cpp:294] Completed Stage: Warm Up | |
STAGE:2022-11-01 01:39:51 4417:4417 ActivityProfilerController.cpp:300] Completed Stage: Collection | |
STAGE:2022-11-01 01:39:53 4417:4417 output_json.cpp:417] Completed Stage: Post Processing | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. | |
warnings.warn( | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights. | |
warnings.warn(msg) | |
[2022-11-01 01:40:13,727] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:40:15,047] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:40:23,902] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:40:24,284] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:40:24,534] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation | |
STAGE:2022-11-01 01:40:29 5591:5591 ActivityProfilerController.cpp:294] Completed Stage: Warm Up | |
STAGE:2022-11-01 01:40:29 5591:5591 ActivityProfilerController.cpp:300] Completed Stage: Collection | |
STAGE:2022-11-01 01:40:32 5591:5591 output_json.cpp:417] Completed Stage: Post Processing | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. | |
warnings.warn( | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights. | |
warnings.warn(msg) | |
[2022-11-01 01:40:53,268] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:40:54,620] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:41:05,427] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation | |
[2022-11-01 01:41:05,428] torch._inductor.compile_fx: [WARNING] Aot Autograd is not safe to run, so falling back to eager | |
[2022-11-01 01:41:10,350] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:41:10,604] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:41:10,854] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation | |
STAGE:2022-11-01 01:41:15 6573:6573 ActivityProfilerController.cpp:294] Completed Stage: Warm Up | |
STAGE:2022-11-01 01:41:16 6573:6573 ActivityProfilerController.cpp:300] Completed Stage: Collection | |
STAGE:2022-11-01 01:41:18 6573:6573 output_json.cpp:417] Completed Stage: Post Processing | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. | |
warnings.warn( | |
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights. | |
warnings.warn(msg) | |
[2022-11-01 01:41:41,179] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:41:42,543] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:42:02,154] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:42:02,406] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[2022-11-01 01:42:02,656] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored | |
[W kineto_shim.cpp:330] Profiler is not initialized: skipping step() invocation | |
STAGE:2022-11-01 01:42:08 8741:8741 ActivityProfilerController.cpp:294] Completed Stage: Warm Up | |
STAGE:2022-11-01 01:42:08 8741:8741 ActivityProfilerController.cpp:300] Completed Stage: Collection | |
STAGE:2022-11-01 01:42:11 8741:8741 output_json.cpp:417] Completed Stage: Post Processing |
submitit INFO (2022-11-01 01:38:46,566) - Starting with JobEnvironment(job_id=74983, hostname=a100-st-p4d24xlarge-34, local_rank=0(8), node=0(2), global_rank=0(16)) | |
submitit INFO (2022-11-01 01:38:46,567) - Loading pickle: /fsx/users/dberard/scratch-local/bench-fast/benchmark/logs_oct31/74983_submitted.pkl | |
This is node 0 | |
run_once | |
Process group: 16 tasks, rank: 0 | |
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens | |
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens | |
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0> | |
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol. | |
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5). | |
a100-st-p4d24xlarge-34:3461:3461 [0] NCCL INFO cudaDriverVersion 11060 | |
NCCL version 2.14.3+cuda11.6 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/OFI Selected Provider is efa | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Using network AWS Libfabric | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27' | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27' | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27' | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27' | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002dd0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e10 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e50 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e90 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002ed0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f10 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f50 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f90 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002fd0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003010 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003050 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003090 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40030d0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003110 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003150 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003190 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connected all rings | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40031d0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003210 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003250 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003290 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40032d0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003310 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003350 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003390 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40033d0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003410 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003450 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003490 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40034d0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003510 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003550 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003590 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40035d0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003610 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003650 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003690 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40036d0 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003710 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003750 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003790 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connected all trees | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO NCCL_ALGO set by environment to tree | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 42 from local rank 7, transport 2 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 45 from local rank 3, transport 2 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 46 from local rank 2, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40037d0 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb38c003850 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9660003990 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b10003850 | |
a100-st-p4d24xlarge-34:3461:4060 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2 | |
a100-st-p4d24xlarge-34:3461:4008 [0] NCCL INFO comm 0x564ad8428c10 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE | |
result {'latency_median': 66.50521850585938, 'latency_stdev': 0.1567804677881775} | |
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'eager', 'has_breaks': False, 'result': {'latency_median': 66.50521850585938, 'latency_stdev': 0.1567804677881775}} | |
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "eager", "has_breaks": false, "result": {"latency_median": 66.50521850585938, "latency_stdev": 0.1567804677881775}}</RESULT> | |
run_once | |
Process group: 16 tasks, rank: 0 | |
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens | |
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens | |
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0> | |
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol. | |
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5). | |
a100-st-p4d24xlarge-34:4417:4417 [0] NCCL INFO cudaDriverVersion 11060 | |
NCCL version 2.14.3+cuda11.6 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/OFI Selected Provider is efa | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Using network AWS Libfabric | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27' | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27' | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27' | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27' | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002dd0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e10 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e50 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e90 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002ed0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f10 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f50 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f90 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002fd0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003010 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003050 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003090 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0030d0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003110 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003150 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003190 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connected all rings | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0031d0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003210 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003250 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003290 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0032d0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003310 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003350 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003390 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0033d0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003410 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003450 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003490 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0034d0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003510 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003550 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003590 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0035d0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003610 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003650 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003690 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0036d0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003710 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003750 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003790 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connected all trees | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO NCCL_ALGO set by environment to tree | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 42 from local rank 7, transport 2 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 45 from local rank 3, transport 2 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 46 from local rank 2, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0037d0 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb394003850 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9668003990 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b0c003850 | |
a100-st-p4d24xlarge-34:4417:5020 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2 | |
a100-st-p4d24xlarge-34:4417:4946 [0] NCCL INFO comm 0x564ad8041870 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE | |
result {'latency_median': 85.98988723754883, 'latency_stdev': 1.3242423088486284} | |
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_aot_eager', 'has_breaks': True, 'result': {'latency_median': 85.98988723754883, 'latency_stdev': 1.3242423088486284}} | |
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_aot_eager", "has_breaks": true, "result": {"latency_median": 85.98988723754883, "latency_stdev": 1.3242423088486284}}</RESULT> | |
run_once | |
Process group: 16 tasks, rank: 0 | |
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens | |
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens | |
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0> | |
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol. | |
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5). | |
a100-st-p4d24xlarge-34:5591:5591 [0] NCCL INFO cudaDriverVersion 11060 | |
NCCL version 2.14.3+cuda11.6 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/OFI Selected Provider is efa | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Using network AWS Libfabric | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27' | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27' | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27' | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27' | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002dd0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e10 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e50 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002e90 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002ed0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f10 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f50 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002f90 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4002fd0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003010 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003050 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003090 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40030d0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003110 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003150 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003190 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connected all rings | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40031d0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003210 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003250 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003290 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40032d0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003310 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003350 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003390 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40033d0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003410 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003450 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003490 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40034d0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003510 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003550 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003590 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40035d0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003610 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003650 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003690 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40036d0 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003710 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003750 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e4003790 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connected all trees | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO NCCL_ALGO set by environment to tree | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 42 from local rank 6, transport 2 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 43 from local rank 5, transport 2 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 44 from local rank 3, transport 2 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 45 from local rank 2, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19e40037d0 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb38c003810 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9660003950 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b10003810 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 46 from local rank 7, transport 2 | |
a100-st-p4d24xlarge-34:5591:6160 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2 | |
a100-st-p4d24xlarge-34:5591:6112 [0] NCCL INFO comm 0x564ad7c98a50 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE | |
result {'latency_median': 85.90284729003906, 'latency_stdev': 1.3287940048697573}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_aot_eager', 'has_breaks': False, 'result': {'latency_median': 85.90284729003906, 'latency_stdev': 1.3287940048697573}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_aot_eager", "has_breaks": false, "result": {"latency_median": 85.90284729003906, "latency_stdev": 1.3287940048697573}}</RESULT>
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens | |
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens | |
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0> | |
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol. | |
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5). | |
a100-st-p4d24xlarge-34:6573:6573 [0] NCCL INFO cudaDriverVersion 11060 | |
NCCL version 2.14.3+cuda11.6 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/OFI Selected Provider is efa | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Using network AWS Libfabric | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27' | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27' | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27' | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27' | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002dd0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e10 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e50 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e90 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002ed0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f10 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f50 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f90 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002fd0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003010 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003050 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003090 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0030d0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003110 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003150 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003190 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connected all rings | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0031d0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003210 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003250 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003290 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0032d0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003310 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003350 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003390 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0033d0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003410 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003450 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003490 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0034d0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003510 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003550 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003590 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0035d0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003610 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003650 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003690 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0036d0 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003710 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003750 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003790 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connected all trees | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO NCCL_ALGO set by environment to tree | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 41 from local rank 7, transport 2 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 42 from local rank 1, transport 2 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 45 from local rank 3, transport 2 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 46 from local rank 2, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0037d0 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb390003850 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f965c003990 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b14003850 | |
a100-st-p4d24xlarge-34:6573:7867 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2 | |
a100-st-p4d24xlarge-34:6573:7824 [0] NCCL INFO comm 0x564ad855afe0 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE | |
result {'latency_median': 85.34681701660156, 'latency_stdev': 1.8793551451605035}
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_inductor', 'has_breaks': True, 'result': {'latency_median': 85.34681701660156, 'latency_stdev': 1.8793551451605035}}
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_inductor", "has_breaks": true, "result": {"latency_median": 85.34681701660156, "latency_stdev": 1.8793551451605035}}</RESULT>
run_once
Process group: 16 tasks, rank: 0
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens | |
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens | |
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO Bootstrap : Using ens32:10.200.87.93<0> | |
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol. | |
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5). | |
a100-st-p4d24xlarge-34:8741:8741 [0] NCCL INFO cudaDriverVersion 11060 | |
NCCL version 2.14.3+cuda11.6 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Running on p4d.24xlarge platform, Setting NCCL_TOPO_FILE environment variable to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/OFI Selected Provider is efa | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Using network AWS Libfabric | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/share/aws-ofi-nccl/xml/p4d-24xl-topo.xml | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 0 'rdmap16s27' | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 1 'rdmap32s27' | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 2 'rdmap144s27' | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NET/AWS Libfabric : GPU Direct RDMA Enabled for HCA 3 'rdmap160s27' | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 1 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 2 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 201d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 901d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01c0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU a01d0 / HCA 3 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Setting affinity for GPU 0 to 1f0000,0000001f | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 02/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/08 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/08 : 0 3 10 15 14 13 12 9 8 11 2 7 6 5 4 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 06/08 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/08 : 0 5 4 7 14 11 10 9 8 13 12 15 6 3 2 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] 1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->8 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002dd0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 0 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e10 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 1 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e50 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 2 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 3 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002e90 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 4 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002ed0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 5 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f10 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 9[101d0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f50 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 6 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 7 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002f90 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 8 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec002fd0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 9 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003010 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 10 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003050 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003090 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 11 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0030d0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 12 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003110 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 13 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003150 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 14 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003190 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 15 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connected all rings | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0031d0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 16 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003210 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 17 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 02/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003250 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 18 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003290 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 19 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0032d0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 20 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003310 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 21 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 06/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003350 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 22 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003390 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 23 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0033d0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 24 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 03/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003410 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 25 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003450 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 26 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 07/0 : 0[101c0] -> 7[a01d0] via P2P/IPC/read | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003490 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 27 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 28 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0034d0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 29 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003510 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 8[101c0] -> 0[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 30 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003550 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 0 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 31 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003590 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 8[101c0] [send] via NET/AWS Libfabric/0/GDRDMA | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0035d0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 32 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003610 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 33 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003650 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 34 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003690 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 35 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0036d0 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 36 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003710 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 37 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003750 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 38 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec003790 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy recv connection 39 from local rank 0, transport 0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connected all trees | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO NCCL_ALGO set by environment to tree | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 40 from local rank 0, transport 2 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 41 from local rank 1, transport 2 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 42 from local rank 7, transport 2 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 43 from local rank 6, transport 2 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 44 from local rank 5, transport 2 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 45 from local rank 2, transport 2 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 46 from local rank 3, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 0 -> connection 0x7f19ec0037d0 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 1 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 2 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO GPU Direct RDMA Enabled for GPU 101c0 / HCA 3 (distance 3 <= 4), read 1 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 2 -> connection 0x7fb390003850 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 4 -> connection 0x7f9654003990 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO Connection to proxy localRank 6 -> connection 0x7f2b14003850 | |
a100-st-p4d24xlarge-34:8741:9870 [0] NCCL INFO New proxy send connection 47 from local rank 4, transport 2 | |
a100-st-p4d24xlarge-34:8741:9822 [0] NCCL INFO comm 0x564ad7d53170 rank 0 nranks 16 cudaDev 0 busId 101c0 - Init COMPLETE | |
result {'latency_median': 83.9736328125, 'latency_stdev': 1.1737800621528658} | |
exit code: 0 and result: {'nodes': 2, 'model_name': 'torchbenchmark.models.resnet50.Model', 'backend': 'torchdynamo_inductor', 'has_breaks': False, 'result': {'latency_median': 83.9736328125, 'latency_stdev': 1.1737800621528658}} | |
<RESULT>{"nodes": 2, "model_name": "torchbenchmark.models.resnet50.Model", "backend": "torchdynamo_inductor", "has_breaks": false, "result": {"latency_median": 83.9736328125, "latency_stdev": 1.1737800621528658}}</RESULT> | |
submitit INFO (2022-11-01 01:42:15,115) - Job completed successfully |