Created August 26, 2022 23:11
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO Bootstrap : Using eth0:172.31.231.78<0>
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.78<0> [1]eth1:172.31.225.87<0> [2]eth2:172.31.235.21<0> [3]eth3:172.31.235.87<0>
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO Using network Socket
NCCL version 2.12.12+cuda11.7
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO Bootstrap : Using eth0:172.31.237.132<0>
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.132<0> [1]eth1:172.31.227.76<0> [2]eth2:172.31.232.61<0> [3]eth3:172.31.230.241<0>
gpu-st-p4d-24xlarge-49:25778:25778 [0] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO Bootstrap : Using eth0:172.31.237.132<0>
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.132<0> [1]eth1:172.31.227.76<0> [2]eth2:172.31.232.61<0> [3]eth3:172.31.230.241<0>
gpu-st-p4d-24xlarge-49:25781:25781 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO Bootstrap : Using eth0:172.31.237.132<0>
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.132<0> [1]eth1:172.31.227.76<0> [2]eth2:172.31.232.61<0> [3]eth3:172.31.230.241<0>
gpu-st-p4d-24xlarge-49:25777:25777 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO Bootstrap : Using eth0:172.31.231.78<0>
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.78<0> [1]eth1:172.31.225.87<0> [2]eth2:172.31.235.21<0> [3]eth3:172.31.235.87<0>
gpu-st-p4d-24xlarge-44:28068:28068 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO Bootstrap : Using eth0:172.31.237.132<0>
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.132<0> [1]eth1:172.31.227.76<0> [2]eth2:172.31.232.61<0> [3]eth3:172.31.230.241<0>
gpu-st-p4d-24xlarge-49:25776:25776 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO Bootstrap : Using eth0:172.31.237.132<0>
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.132<0> [1]eth1:172.31.227.76<0> [2]eth2:172.31.232.61<0> [3]eth3:172.31.230.241<0>
gpu-st-p4d-24xlarge-49:25779:25779 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO Bootstrap : Using eth0:172.31.237.132<0>
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO Bootstrap : Using eth0:172.31.229.205<0>
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.132<0> [1]eth1:172.31.227.76<0> [2]eth2:172.31.232.61<0> [3]eth3:172.31.230.241<0>
gpu-st-p4d-24xlarge-49:25780:25780 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO Bootstrap : Using eth0:172.31.229.205<0>
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO Bootstrap : Using eth0:172.31.229.205<0>
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO Bootstrap : Using eth0:172.31.229.205<0>
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO Bootstrap : Using eth0:172.31.229.205<0>
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO Bootstrap : Using eth0:172.31.229.205<0>
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO Bootstrap : Using eth0:172.31.229.154<0>
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO Bootstrap : Using eth0:172.31.229.154<0>
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO Bootstrap : Using eth0:172.31.229.154<0>
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO Bootstrap : Using eth0:172.31.229.154<0>
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO Bootstrap : Using eth0:172.31.229.154<0>
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO Bootstrap : Using eth0:172.31.229.154<0>
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO Bootstrap : Using eth0:172.31.227.198<0>
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO Bootstrap : Using eth0:172.31.227.198<0>
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO Bootstrap : Using eth0:172.31.227.198<0>
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO Bootstrap : Using eth0:172.31.227.198<0>
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO Bootstrap : Using eth0:172.31.227.198<0>
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO Bootstrap : Using eth0:172.31.227.198<0>
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO Bootstrap : Using eth0:172.31.235.246<0>
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO Bootstrap : Using eth0:172.31.235.246<0>
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO Bootstrap : Using eth0:172.31.235.246<0>
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO Bootstrap : Using eth0:172.31.235.246<0>
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO Bootstrap : Using eth0:172.31.235.246<0>
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO Bootstrap : Using eth0:172.31.235.246<0>
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.205<0> [1]eth1:172.31.224.87<0> [2]eth2:172.31.237.20<0> [3]eth3:172.31.226.214<0>
gpu-st-p4d-24xlarge-57:27795:27795 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.205<0> [1]eth1:172.31.224.87<0> [2]eth2:172.31.237.20<0> [3]eth3:172.31.226.214<0>
gpu-st-p4d-24xlarge-57:27792:27792 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.205<0> [1]eth1:172.31.224.87<0> [2]eth2:172.31.237.20<0> [3]eth3:172.31.226.214<0>
gpu-st-p4d-24xlarge-57:27794:27794 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.205<0> [1]eth1:172.31.224.87<0> [2]eth2:172.31.237.20<0> [3]eth3:172.31.226.214<0>
gpu-st-p4d-24xlarge-57:27796:27796 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.205<0> [1]eth1:172.31.224.87<0> [2]eth2:172.31.237.20<0> [3]eth3:172.31.226.214<0>
gpu-st-p4d-24xlarge-57:27791:27791 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.205<0> [1]eth1:172.31.224.87<0> [2]eth2:172.31.237.20<0> [3]eth3:172.31.226.214<0>
gpu-st-p4d-24xlarge-57:27793:27793 [0] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO Bootstrap : Using eth0:172.31.226.192<0>
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO Bootstrap : Using eth0:172.31.226.192<0>
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO Bootstrap : Using eth0:172.31.226.192<0>
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO Bootstrap : Using eth0:172.31.226.192<0>
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO Bootstrap : Using eth0:172.31.226.192<0>
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO Bootstrap : Using eth0:172.31.226.192<0>
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.154<0> [1]eth1:172.31.239.92<0> [2]eth2:172.31.234.27<0> [3]eth3:172.31.229.218<0>
gpu-st-p4d-24xlarge-54:29515:29515 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.154<0> [1]eth1:172.31.239.92<0> [2]eth2:172.31.234.27<0> [3]eth3:172.31.229.218<0>
gpu-st-p4d-24xlarge-54:29519:29519 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.154<0> [1]eth1:172.31.239.92<0> [2]eth2:172.31.234.27<0> [3]eth3:172.31.229.218<0>
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.154<0> [1]eth1:172.31.239.92<0> [2]eth2:172.31.234.27<0> [3]eth3:172.31.229.218<0>
gpu-st-p4d-24xlarge-54:29517:29517 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-54:29514:29514 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.154<0> [1]eth1:172.31.239.92<0> [2]eth2:172.31.234.27<0> [3]eth3:172.31.229.218<0>
gpu-st-p4d-24xlarge-54:29518:29518 [0] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.229.154<0> [1]eth1:172.31.239.92<0> [2]eth2:172.31.234.27<0> [3]eth3:172.31.229.218<0>
gpu-st-p4d-24xlarge-54:29516:29516 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO Bootstrap : Using eth0:172.31.231.78<0>
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.78<0> [1]eth1:172.31.225.87<0> [2]eth2:172.31.235.21<0> [3]eth3:172.31.235.87<0>
gpu-st-p4d-24xlarge-44:28066:28066 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO Bootstrap : Using eth0:172.31.230.245<0>
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO Bootstrap : Using eth0:172.31.230.245<0>
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO Bootstrap : Using eth0:172.31.230.245<0>
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO Bootstrap : Using eth0:172.31.231.78<0>
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO Bootstrap : Using eth0:172.31.230.245<0>
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO Bootstrap : Using eth0:172.31.230.245<0>
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.198<0> [1]eth1:172.31.227.208<0> [2]eth2:172.31.224.139<0> [3]eth3:172.31.229.184<0>
gpu-st-p4d-24xlarge-53:26990:26990 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO Bootstrap : Using eth0:172.31.230.245<0>
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO Bootstrap : Using eth0:172.31.231.78<0>
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.78<0> [1]eth1:172.31.225.87<0> [2]eth2:172.31.235.21<0> [3]eth3:172.31.235.87<0>
gpu-st-p4d-24xlarge-44:28069:28069 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.198<0> [1]eth1:172.31.227.208<0> [2]eth2:172.31.224.139<0> [3]eth3:172.31.229.184<0>
gpu-st-p4d-24xlarge-53:26987:26987 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.198<0> [1]eth1:172.31.227.208<0> [2]eth2:172.31.224.139<0> [3]eth3:172.31.229.184<0>
gpu-st-p4d-24xlarge-53:26989:26989 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.198<0> [1]eth1:172.31.227.208<0> [2]eth2:172.31.224.139<0> [3]eth3:172.31.229.184<0>
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.198<0> [1]eth1:172.31.227.208<0> [2]eth2:172.31.224.139<0> [3]eth3:172.31.229.184<0>
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.198<0> [1]eth1:172.31.227.208<0> [2]eth2:172.31.224.139<0> [3]eth3:172.31.229.184<0>
gpu-st-p4d-24xlarge-53:26988:26988 [0] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-53:26991:26991 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-53:26986:26986 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.78<0> [1]eth1:172.31.225.87<0> [2]eth2:172.31.235.21<0> [3]eth3:172.31.235.87<0>
gpu-st-p4d-24xlarge-44:28070:28070 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.192<0> [1]eth1:172.31.231.65<0> [2]eth2:172.31.235.241<0> [3]eth3:172.31.237.240<0>
gpu-st-p4d-24xlarge-46:30086:30086 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.192<0> [1]eth1:172.31.231.65<0> [2]eth2:172.31.235.241<0> [3]eth3:172.31.237.240<0> | |
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.192<0> [1]eth1:172.31.231.65<0> [2]eth2:172.31.235.241<0> [3]eth3:172.31.237.240<0> | |
gpu-st-p4d-24xlarge-46:30081:30081 [4] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-46:30084:30084 [7] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.192<0> [1]eth1:172.31.231.65<0> [2]eth2:172.31.235.241<0> [3]eth3:172.31.237.240<0> | |
gpu-st-p4d-24xlarge-46:30083:30083 [6] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.192<0> [1]eth1:172.31.231.65<0> [2]eth2:172.31.235.241<0> [3]eth3:172.31.237.240<0> | |
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.192<0> [1]eth1:172.31.231.65<0> [2]eth2:172.31.235.241<0> [3]eth3:172.31.237.240<0> | |
gpu-st-p4d-24xlarge-46:30082:30082 [5] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-46:30085:30085 [0] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.235.246<0> [1]eth1:172.31.233.8<0> [2]eth2:172.31.239.73<0> [3]eth3:172.31.232.185<0> | |
gpu-st-p4d-24xlarge-45:30135:30135 [6] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.235.246<0> [1]eth1:172.31.233.8<0> [2]eth2:172.31.239.73<0> [3]eth3:172.31.232.185<0> | |
gpu-st-p4d-24xlarge-45:30145:30145 [3] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.235.246<0> [1]eth1:172.31.233.8<0> [2]eth2:172.31.239.73<0> [3]eth3:172.31.232.185<0> | |
gpu-st-p4d-24xlarge-45:30136:30136 [7] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.235.246<0> [1]eth1:172.31.233.8<0> [2]eth2:172.31.239.73<0> [3]eth3:172.31.232.185<0> | |
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.235.246<0> [1]eth1:172.31.233.8<0> [2]eth2:172.31.239.73<0> [3]eth3:172.31.232.185<0> | |
gpu-st-p4d-24xlarge-45:30138:30138 [1] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.235.246<0> [1]eth1:172.31.233.8<0> [2]eth2:172.31.239.73<0> [3]eth3:172.31.232.185<0> | |
gpu-st-p4d-24xlarge-45:30139:30139 [2] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-45:30137:30137 [0] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO Bootstrap : Using eth0:172.31.231.78<0> | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.78<0> [1]eth1:172.31.225.87<0> [2]eth2:172.31.235.21<0> [3]eth3:172.31.235.87<0> | |
gpu-st-p4d-24xlarge-44:28067:28067 [2] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO Bootstrap : Using eth0:172.31.227.130<0> | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO Bootstrap : Using eth0:172.31.227.130<0> | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO Bootstrap : Using eth0:172.31.227.130<0> | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO Bootstrap : Using eth0:172.31.227.130<0> | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO Bootstrap : Using eth0:172.31.227.130<0> | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO Bootstrap : Using eth0:172.31.227.130<0> | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO Bootstrap : Using eth0:172.31.237.192<0> | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO Bootstrap : Using eth0:172.31.233.13<0> | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO Bootstrap : Using eth0:172.31.237.192<0> | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO Bootstrap : Using eth0:172.31.233.13<0> | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO Bootstrap : Using eth0:172.31.233.13<0> | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO Bootstrap : Using eth0:172.31.237.192<0> | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO Bootstrap : Using eth0:172.31.237.192<0> | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO Bootstrap : Using eth0:172.31.237.192<0> | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO Bootstrap : Using eth0:172.31.233.13<0> | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO Bootstrap : Using eth0:172.31.233.13<0> | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO Bootstrap : Using eth0:172.31.237.192<0> | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO Bootstrap : Using eth0:172.31.233.13<0> | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.230.245<0> [1]eth1:172.31.237.231<0> [2]eth2:172.31.225.111<0> [3]eth3:172.31.227.43<0> | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.230.245<0> [1]eth1:172.31.237.231<0> [2]eth2:172.31.225.111<0> [3]eth3:172.31.227.43<0> | |
gpu-st-p4d-24xlarge-52:39399:39399 [2] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-52:39398:39398 [1] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.230.245<0> [1]eth1:172.31.237.231<0> [2]eth2:172.31.225.111<0> [3]eth3:172.31.227.43<0> | |
gpu-st-p4d-24xlarge-52:39402:39402 [5] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.230.245<0> [1]eth1:172.31.237.231<0> [2]eth2:172.31.225.111<0> [3]eth3:172.31.227.43<0> | |
gpu-st-p4d-24xlarge-52:39397:39397 [0] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.230.245<0> [1]eth1:172.31.237.231<0> [2]eth2:172.31.225.111<0> [3]eth3:172.31.227.43<0> | |
gpu-st-p4d-24xlarge-52:39401:39401 [4] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.230.245<0> [1]eth1:172.31.237.231<0> [2]eth2:172.31.225.111<0> [3]eth3:172.31.227.43<0> | |
gpu-st-p4d-24xlarge-52:39400:39400 [3] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO Bootstrap : Using eth0:172.31.233.244<0> | |
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO Bootstrap : Using eth0:172.31.233.244<0> | |
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO Bootstrap : Using eth0:172.31.233.244<0> | |
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO Bootstrap : Using eth0:172.31.233.244<0> | |
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO Bootstrap : Using eth0:172.31.233.244<0> | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO Bootstrap : Using eth0:172.31.233.244<0> | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.130<0> [1]eth1:172.31.233.129<0> [2]eth2:172.31.237.125<0> [3]eth3:172.31.227.64<0> | |
gpu-st-p4d-24xlarge-56:28173:28173 [2] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.130<0> [1]eth1:172.31.233.129<0> [2]eth2:172.31.237.125<0> [3]eth3:172.31.227.64<0> | |
gpu-st-p4d-24xlarge-56:28172:28172 [1] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.130<0> [1]eth1:172.31.233.129<0> [2]eth2:172.31.237.125<0> [3]eth3:172.31.227.64<0> | |
gpu-st-p4d-24xlarge-56:28174:28174 [3] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.130<0> [1]eth1:172.31.233.129<0> [2]eth2:172.31.237.125<0> [3]eth3:172.31.227.64<0> | |
gpu-st-p4d-24xlarge-56:28176:28176 [5] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.130<0> [1]eth1:172.31.233.129<0> [2]eth2:172.31.237.125<0> [3]eth3:172.31.227.64<0> | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.227.130<0> [1]eth1:172.31.233.129<0> [2]eth2:172.31.237.125<0> [3]eth3:172.31.227.64<0> | |
gpu-st-p4d-24xlarge-56:28175:28175 [4] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-56:28171:28171 [0] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO Bootstrap : Using eth0:172.31.231.150<0> | |
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO Bootstrap : Using eth0:172.31.231.150<0> | |
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO Bootstrap : Using eth0:172.31.231.150<0> | |
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO Bootstrap : Using eth0:172.31.231.150<0> | |
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO Bootstrap : Using eth0:172.31.231.150<0> | |
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO Bootstrap : Using eth0:172.31.231.150<0> | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO P2P plugin IBext | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.13<0> [1]eth1:172.31.239.86<0> [2]eth2:172.31.227.84<0> [3]eth3:172.31.237.21<0> | |
gpu-st-p4d-24xlarge-59:27922:27922 [6] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.13<0> [1]eth1:172.31.239.86<0> [2]eth2:172.31.227.84<0> [3]eth3:172.31.237.21<0> | |
gpu-st-p4d-24xlarge-59:27918:27918 [2] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.13<0> [1]eth1:172.31.239.86<0> [2]eth2:172.31.227.84<0> [3]eth3:172.31.237.21<0> | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.13<0> [1]eth1:172.31.239.86<0> [2]eth2:172.31.227.84<0> [3]eth3:172.31.237.21<0> | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.13<0> [1]eth1:172.31.239.86<0> [2]eth2:172.31.227.84<0> [3]eth3:172.31.237.21<0> | |
gpu-st-p4d-24xlarge-59:27923:27923 [7] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.13<0> [1]eth1:172.31.239.86<0> [2]eth2:172.31.227.84<0> [3]eth3:172.31.237.21<0> | |
gpu-st-p4d-24xlarge-59:27920:27920 [4] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-59:27921:27921 [5] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-59:27919:27919 [3] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO NET/IB : No device found. | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.192<0> [1]eth1:172.31.225.254<0> [2]eth2:172.31.234.186<0> [3]eth3:172.31.234.61<0> | |
gpu-st-p4d-24xlarge-58:27720:27720 [5] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.192<0> [1]eth1:172.31.225.254<0> [2]eth2:172.31.234.186<0> [3]eth3:172.31.234.61<0> | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.192<0> [1]eth1:172.31.225.254<0> [2]eth2:172.31.234.186<0> [3]eth3:172.31.234.61<0> | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.192<0> [1]eth1:172.31.225.254<0> [2]eth2:172.31.234.186<0> [3]eth3:172.31.234.61<0> | |
gpu-st-p4d-24xlarge-58:27722:27722 [7] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-58:27723:27723 [0] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-58:27721:27721 [6] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.192<0> [1]eth1:172.31.225.254<0> [2]eth2:172.31.234.186<0> [3]eth3:172.31.234.61<0> | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.237.192<0> [1]eth1:172.31.225.254<0> [2]eth2:172.31.234.186<0> [3]eth3:172.31.234.61<0> | |
gpu-st-p4d-24xlarge-58:27724:27724 [1] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-58:27719:27719 [4] NCCL INFO Using network Socket | |
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO Bootstrap : Using eth0:172.31.234.184<0> | |
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO Bootstrap : Using eth0:172.31.231.152<0> | |
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. | |
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO Bootstrap : Using eth0:172.31.234.184<0>
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO Bootstrap : Using eth0:172.31.234.184<0>
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO Bootstrap : Using eth0:172.31.234.184<0>
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO Bootstrap : Using eth0:172.31.234.184<0>
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO Bootstrap : Using eth0:172.31.234.184<0>
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO Bootstrap : Using eth0:172.31.231.152<0>
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO Bootstrap : Using eth0:172.31.231.152<0>
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO Bootstrap : Using eth0:172.31.231.152<0>
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO Bootstrap : Using eth0:172.31.231.152<0>
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.244<0> [1]eth1:172.31.226.133<0> [2]eth2:172.31.227.24<0> [3]eth3:172.31.234.3<0>
gpu-st-p4d-24xlarge-51:30343:30343 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.244<0> [1]eth1:172.31.226.133<0> [2]eth2:172.31.227.24<0> [3]eth3:172.31.234.3<0>
gpu-st-p4d-24xlarge-51:30342:30342 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO Bootstrap : Using eth0:172.31.231.152<0>
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.244<0> [1]eth1:172.31.226.133<0> [2]eth2:172.31.227.24<0> [3]eth3:172.31.234.3<0>
gpu-st-p4d-24xlarge-51:30344:30344 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.244<0> [1]eth1:172.31.226.133<0> [2]eth2:172.31.227.24<0> [3]eth3:172.31.234.3<0>
gpu-st-p4d-24xlarge-51:30346:30346 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.244<0> [1]eth1:172.31.226.133<0> [2]eth2:172.31.227.24<0> [3]eth3:172.31.234.3<0>
gpu-st-p4d-24xlarge-51:30347:30347 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.233.244<0> [1]eth1:172.31.226.133<0> [2]eth2:172.31.227.24<0> [3]eth3:172.31.234.3<0>
gpu-st-p4d-24xlarge-51:30345:30345 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.150<0> [1]eth1:172.31.226.208<0> [2]eth2:172.31.239.214<0> [3]eth3:172.31.234.152<0>
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-55:29155:29155 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.150<0> [1]eth1:172.31.226.208<0> [2]eth2:172.31.239.214<0> [3]eth3:172.31.234.152<0>
gpu-st-p4d-24xlarge-55:29157:29157 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.150<0> [1]eth1:172.31.226.208<0> [2]eth2:172.31.239.214<0> [3]eth3:172.31.234.152<0>
gpu-st-p4d-24xlarge-55:29156:29156 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.150<0> [1]eth1:172.31.226.208<0> [2]eth2:172.31.239.214<0> [3]eth3:172.31.234.152<0>
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.150<0> [1]eth1:172.31.226.208<0> [2]eth2:172.31.239.214<0> [3]eth3:172.31.234.152<0>
gpu-st-p4d-24xlarge-55:29158:29158 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-55:29159:29159 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.150<0> [1]eth1:172.31.226.208<0> [2]eth2:172.31.239.214<0> [3]eth3:172.31.234.152<0>
gpu-st-p4d-24xlarge-55:29154:29154 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO Bootstrap : Using eth0:172.31.226.226<0>
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO Bootstrap : Using eth0:172.31.226.226<0>
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO Bootstrap : Using eth0:172.31.226.226<0>
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO Bootstrap : Using eth0:172.31.226.226<0>
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO Bootstrap : Using eth0:172.31.226.226<0>
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO Bootstrap : Using eth0:172.31.226.226<0>
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.234.184<0> [1]eth1:172.31.234.169<0> [2]eth2:172.31.237.157<0> [3]eth3:172.31.230.169<0>
gpu-st-p4d-24xlarge-48:30191:30191 [0] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.234.184<0> [1]eth1:172.31.234.169<0> [2]eth2:172.31.237.157<0> [3]eth3:172.31.230.169<0>
gpu-st-p4d-24xlarge-48:30194:30194 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.234.184<0> [1]eth1:172.31.234.169<0> [2]eth2:172.31.237.157<0> [3]eth3:172.31.230.169<0>
gpu-st-p4d-24xlarge-48:30192:30192 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.234.184<0> [1]eth1:172.31.234.169<0> [2]eth2:172.31.237.157<0> [3]eth3:172.31.230.169<0>
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.234.184<0> [1]eth1:172.31.234.169<0> [2]eth2:172.31.237.157<0> [3]eth3:172.31.230.169<0>
gpu-st-p4d-24xlarge-48:30193:30193 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-48:30196:30196 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.234.184<0> [1]eth1:172.31.234.169<0> [2]eth2:172.31.237.157<0> [3]eth3:172.31.230.169<0>
gpu-st-p4d-24xlarge-48:30195:30195 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.152<0> [1]eth1:172.31.239.148<0> [2]eth2:172.31.238.214<0> [3]eth3:172.31.234.88<0>
gpu-st-p4d-24xlarge-50:29211:29211 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.152<0> [1]eth1:172.31.239.148<0> [2]eth2:172.31.238.214<0> [3]eth3:172.31.234.88<0>
gpu-st-p4d-24xlarge-50:29213:29213 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.152<0> [1]eth1:172.31.239.148<0> [2]eth2:172.31.238.214<0> [3]eth3:172.31.234.88<0>
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.152<0> [1]eth1:172.31.239.148<0> [2]eth2:172.31.238.214<0> [3]eth3:172.31.234.88<0>
gpu-st-p4d-24xlarge-50:29215:29215 [0] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29212:29212 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.152<0> [1]eth1:172.31.239.148<0> [2]eth2:172.31.238.214<0> [3]eth3:172.31.234.88<0>
gpu-st-p4d-24xlarge-50:29214:29214 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO NET/Socket : Using [0]eth0:172.31.231.152<0> [1]eth1:172.31.239.148<0> [2]eth2:172.31.238.214<0> [3]eth3:172.31.234.88<0>
gpu-st-p4d-24xlarge-50:29216:29216 [1] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.226<0> [1]eth1:172.31.235.228<0> [2]eth2:172.31.224.181<0> [3]eth3:172.31.227.177<0>
gpu-st-p4d-24xlarge-47:30192:30192 [6] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.226<0> [1]eth1:172.31.235.228<0> [2]eth2:172.31.224.181<0> [3]eth3:172.31.227.177<0>
gpu-st-p4d-24xlarge-47:30191:30191 [5] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO P2P plugin IBext
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO NET/IB : No device found.
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.226<0> [1]eth1:172.31.235.228<0> [2]eth2:172.31.224.181<0> [3]eth3:172.31.227.177<0>
gpu-st-p4d-24xlarge-47:30189:30189 [3] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.226<0> [1]eth1:172.31.235.228<0> [2]eth2:172.31.224.181<0> [3]eth3:172.31.227.177<0>
gpu-st-p4d-24xlarge-47:30188:30188 [2] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.226<0> [1]eth1:172.31.235.228<0> [2]eth2:172.31.224.181<0> [3]eth3:172.31.227.177<0>
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO NET/Socket : Using [0]eth0:172.31.226.226<0> [1]eth1:172.31.235.228<0> [2]eth2:172.31.224.181<0> [3]eth3:172.31.227.177<0>
gpu-st-p4d-24xlarge-47:30193:30193 [7] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-47:30190:30190 [4] NCCL INFO Using network Socket
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ff000000
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ff000000
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ff000000
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Setting affinity for GPU 3 to ffffff
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Setting affinity for GPU 1 to ffffff
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Setting affinity for GPU 2 to ffffff
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ff000000
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Setting affinity for GPU 0 to ffffff
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Trees [0] 81/-1/-1->80->79 [1] 81/-1/-1->80->79 [2] 81/-1/-1->80->89 [3] 81/-1/-1->80->89 [4] 81/-1/-1->80->79 [5] 81/-1/-1->80->79 [6] 81/76/-1->80->69 [7] 81/76/-1->80->69
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Trees [0] -1/-1/-1->83->82 [1] -1/-1/-1->83->82 [2] 78/-1/-1->83->82 [3] 78/-1/-1->83->82 [4] -1/-1/-1->83->82 [5] -1/-1/-1->83->82 [6] 78/-1/-1->83->82 [7] 78/-1/-1->83->82
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Trees [0] 82/-1/-1->81->80 [1] 82/-1/-1->81->80 [2] 82/-1/-1->81->80 [3] 82/-1/-1->81->80 [4] 82/-1/-1->81->80 [5] 82/-1/-1->81->80 [6] 82/88/-1->81->80 [7] 82/88/-1->81->80
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Trees [0] 83/-1/-1->82->81 [1] 83/-1/-1->82->81 [2] 83/-1/-1->82->81 [3] 83/-1/-1->82->81 [4] 83/-1/-1->82->81 [5] 83/-1/-1->82->81 [6] 83/-1/-1->82->81 [7] 83/-1/-1->82->81
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Trees [0] 85/90/-1->84->72 [1] 85/90/-1->84->72 [2] 85/-1/-1->84->89 [3] 85/-1/-1->84->89 [4] 85/-1/-1->84->79 [5] 85/-1/-1->84->79 [6] 85/-1/-1->84->89 [7] 85/-1/-1->84->89
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Trees [0] 87/-1/-1->86->85 [1] 87/-1/-1->86->85 [2] 87/-1/-1->86->85 [3] 87/-1/-1->86->85 [4] 87/-1/-1->86->85 [5] 87/-1/-1->86->85 [6] 87/-1/-1->86->85 [7] 87/-1/-1->86->85
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Trees [0] 86/78/-1->85->84 [1] 86/78/-1->85->84 [2] 86/-1/-1->85->84 [3] 86/-1/-1->85->84 [4] 86/-1/-1->85->84 [5] 86/-1/-1->85->84 [6] 86/-1/-1->85->84 [7] 86/-1/-1->85->84
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Trees [0] 88/-1/-1->87->86 [1] 88/-1/-1->87->86 [2] -1/-1/-1->87->86 [3] -1/-1/-1->87->86 [4] 88/-1/-1->87->86 [5] 88/-1/-1->87->86 [6] -1/-1/-1->87->86 [7] -1/-1/-1->87->86
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Trees [0] 89/-1/-1->88->87 [1] 89/-1/-1->88->87 [2] 89/92/-1->88->76 [3] 89/92/-1->88->76 [4] 89/-1/-1->88->87 [5] 89/-1/-1->88->87 [6] 89/-1/-1->88->81 [7] 89/-1/-1->88->81
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Trees [0] -1/-1/-1->89->88 [1] -1/-1/-1->89->88 [2] 84/80/-1->89->88 [3] 84/80/-1->89->88 [4] -1/-1/-1->89->88 [5] -1/-1/-1->89->88 [6] 84/-1/-1->89->88 [7] 84/-1/-1->89->88
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Trees [0] 91/-1/-1->90->84 [1] 91/-1/-1->90->84 [2] 91/-1/-1->90->95 [3] 91/-1/-1->90->95 [4] 91/42/-1->90->-1 [5] 91/42/-1->90->-1 [6] 91/-1/-1->90->95 [7] 91/-1/-1->90->95 | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Trees [0] -1/-1/-1->77->76 [1] -1/-1/-1->77->76 [2] 72/64/-1->77->76 [3] 72/64/-1->77->76 [4] -1/-1/-1->77->76 [5] -1/-1/-1->77->76 [6] 72/-1/-1->77->76 [7] 72/-1/-1->77->76 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 00/08 : 0 5 4 3 2 1 6 11 10 9 8 7 12 17 16 15 14 13 18 23 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 01/08 : 0 5 4 3 2 1 6 11 10 9 8 7 12 17 16 15 14 13 18 23 | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Trees [0] 80/-1/-1->79->78 [1] 80/-1/-1->79->78 [2] -1/-1/-1->79->78 [3] -1/-1/-1->79->78 [4] 80/84/-1->79->78 [5] 80/84/-1->79->78 [6] -1/-1/-1->79->78 [7] -1/-1/-1->79->78 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 02/08 : 0 3 2 5 8 7 6 11 10 9 16 13 12 15 14 17 20 19 18 23 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 03/08 : 0 3 2 5 8 7 6 11 10 9 16 13 12 15 14 17 20 19 18 23 | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 | |
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Trees [0] -1/-1/-1->95->94 [1] -1/-1/-1->95->94 [2] 90/-1/-1->95->94 [3] 90/-1/-1->95->94 [4] -1/-1/-1->95->94 [5] -1/-1/-1->95->94 [6] 90/-1/-1->95->94 [7] 90/-1/-1->95->94 | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Trees [0] 93/-1/-1->92->91 [1] 93/-1/-1->92->91 [2] 93/-1/-1->92->88 [3] 93/-1/-1->92->88 [4] 93/-1/-1->92->91 [5] 93/-1/-1->92->91 [6] 93/44/-1->92->-1 [7] 93/44/-1->92->-1 | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Trees [0] 77/-1/-1->76->75 [1] 77/-1/-1->76->75 [2] 77/88/-1->76->52 [3] 77/88/-1->76->52 [4] 77/-1/-1->76->75 [5] 77/-1/-1->76->75 [6] 77/-1/-1->76->80 [7] 77/-1/-1->76->80 | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 04/08 : 0 5 4 3 2 1 6 11 10 9 8 7 12 17 16 15 14 13 18 23 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 05/08 : 0 5 4 3 2 1 6 11 10 9 8 7 12 17 16 15 14 13 18 23 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Trees [0] 79/-1/-1->78->85 [1] 79/-1/-1->78->85 [2] 79/-1/-1->78->83 [3] 79/-1/-1->78->83 [4] 79/72/-1->78->67 [5] 79/72/-1->78->67 [6] 79/-1/-1->78->83 [7] 79/-1/-1->78->83 | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Trees [0] 76/-1/-1->75->74 [1] 76/-1/-1->75->74 [2] -1/-1/-1->75->74 [3] -1/-1/-1->75->74 [4] 76/-1/-1->75->74 [5] 76/-1/-1->75->74 [6] -1/-1/-1->75->74 [7] -1/-1/-1->75->74 | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] -1/-1/-1->3->2 [7] -1/-1/-1->3->2 | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Trees [0] -1/-1/-1->5->4 [1] -1/-1/-1->5->4 [2] 0/-1/-1->5->4 [3] 0/-1/-1->5->4 [4] -1/-1/-1->5->4 [5] -1/-1/-1->5->4 [6] 0/-1/-1->5->4 [7] 0/-1/-1->5->4 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 06/08 : 0 3 2 5 8 7 6 11 10 9 16 13 12 15 14 17 20 19 18 23 | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/52/-1->4->-1 [3] 5/52/-1->4->-1 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->8 [7] 5/-1/-1->4->8 | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Trees [0] 8/-1/-1->7->6 [1] 8/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] 8/12/-1->7->6 [5] 8/12/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Trees [0] 9/-1/-1->8->7 [1] 9/-1/-1->8->7 [2] 9/-1/-1->8->17 [3] 9/-1/-1->8->17 [4] 9/-1/-1->8->7 [5] 9/-1/-1->8->7 [6] 9/4/-1->8->20 [7] 9/4/-1->8->20 | |
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Trees [0] -1/-1/-1->71->70 [1] -1/-1/-1->71->70 [2] 66/-1/-1->71->70 [3] 66/-1/-1->71->70 [4] -1/-1/-1->71->70 [5] -1/-1/-1->71->70 [6] 66/-1/-1->71->70 [7] 66/-1/-1->71->70 | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Trees [0] 71/-1/-1->70->69 [1] 71/-1/-1->70->69 [2] 71/-1/-1->70->69 [3] 71/-1/-1->70->69 [4] 71/-1/-1->70->69 [5] 71/-1/-1->70->69 [6] 71/-1/-1->70->69 [7] 71/-1/-1->70->69 | |
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Trees [0] 75/-1/-1->74->73 [1] 75/-1/-1->74->73 [2] 75/-1/-1->74->73 [3] 75/-1/-1->74->73 [4] 75/-1/-1->74->73 [5] 75/-1/-1->74->73 [6] 75/-1/-1->74->73 [7] 75/-1/-1->74->73 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 07/08 : 0 3 2 5 8 7 6 11 10 9 16 13 12 15 14 17 20 19 18 23 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Trees [0] 1/48/-1->0->-1 [1] 1/48/-1->0->-1 [2] 1/-1/-1->0->5 [3] 1/-1/-1->0->5 [4] 1/-1/-1->0->6 [5] 1/-1/-1->0->6 [6] 1/-1/-1->0->5 [7] 1/-1/-1->0->5 | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Trees [0] 65/-1/-1->64->63 [1] 65/-1/-1->64->63 [2] 65/68/-1->64->77 [3] 65/68/-1->64->77 [4] 65/-1/-1->64->63 [5] 65/-1/-1->64->63 [6] 65/-1/-1->64->57 [7] 65/-1/-1->64->57 | |
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Trees [0] -1/-1/-1->65->64 [1] -1/-1/-1->65->64 [2] 60/56/-1->65->64 [3] 60/56/-1->65->64 [4] -1/-1/-1->65->64 [5] -1/-1/-1->65->64 [6] 60/-1/-1->65->64 [7] 60/-1/-1->65->64 | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Trees [0] 17/-1/-1->16->15 [1] 17/-1/-1->16->15 [2] 17/20/-1->16->29 [3] 17/20/-1->16->29 [4] 17/-1/-1->16->15 [5] 17/-1/-1->16->15 [6] 17/-1/-1->16->9 [7] 17/-1/-1->16->9 | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Trees [0] 95/-1/-1->94->93 [1] 95/-1/-1->94->93 [2] 95/-1/-1->94->93 [3] 95/-1/-1->94->93 [4] 95/-1/-1->94->93 [5] 95/-1/-1->94->93 [6] 95/-1/-1->94->93 [7] 95/-1/-1->94->93 | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Trees [0] 94/-1/-1->93->92 [1] 94/-1/-1->93->92 [2] 94/-1/-1->93->92 [3] 94/-1/-1->93->92 [4] 94/-1/-1->93->92 [5] 94/-1/-1->93->92 [6] 94/-1/-1->93->92 [7] 94/-1/-1->93->92 | |
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Trees [0] 92/-1/-1->91->90 [1] 92/-1/-1->91->90 [2] -1/-1/-1->91->90 [3] -1/-1/-1->91->90 [4] 92/-1/-1->91->90 [5] 92/-1/-1->91->90 [6] -1/-1/-1->91->90 [7] -1/-1/-1->91->90 | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Trees [0] 73/84/-1->72->48 [1] 73/84/-1->72->48 [2] 73/-1/-1->72->77 [3] 73/-1/-1->72->77 [4] 73/-1/-1->72->78 [5] 73/-1/-1->72->78 [6] 73/-1/-1->72->77 [7] 73/-1/-1->72->77 | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Trees [0] 74/60/-1->73->72 [1] 74/60/-1->73->72 [2] 74/-1/-1->73->72 [3] 74/-1/-1->73->72 [4] 74/-1/-1->73->72 [5] 74/-1/-1->73->72 [6] 74/-1/-1->73->72 [7] 74/-1/-1->73->72 | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Trees [0] -1/-1/-1->11->10 [1] -1/-1/-1->11->10 [2] 6/-1/-1->11->10 [3] 6/-1/-1->11->10 [4] -1/-1/-1->11->10 [5] -1/-1/-1->11->10 [6] 6/-1/-1->11->10 [7] 6/-1/-1->11->10 | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->9 [2] 11/-1/-1->10->9 [3] 11/-1/-1->10->9 [4] 11/-1/-1->10->9 [5] 11/-1/-1->10->9 [6] 11/-1/-1->10->9 [7] 11/-1/-1->10->9 | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] 10/-1/-1->9->8 [2] 10/-1/-1->9->8 [3] 10/-1/-1->9->8 [4] 10/-1/-1->9->8 [5] 10/-1/-1->9->8 [6] 10/16/-1->9->8 [7] 10/16/-1->9->8 | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Trees [0] 7/-1/-1->6->13 [1] 7/-1/-1->6->13 [2] 7/-1/-1->6->11 [3] 7/-1/-1->6->11 [4] 7/0/-1->6->18 [5] 7/0/-1->6->18 [6] 7/-1/-1->6->11 [7] 7/-1/-1->6->11 | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Trees [0] 57/-1/-1->56->55 [1] 57/-1/-1->56->55 [2] 57/-1/-1->56->65 [3] 57/-1/-1->56->65 [4] 57/-1/-1->56->55 [5] 57/-1/-1->56->55 [6] 57/52/-1->56->68 [7] 57/52/-1->56->68 | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Trees [0] 70/-1/-1->69->68 [1] 70/-1/-1->69->68 [2] 70/-1/-1->69->68 [3] 70/-1/-1->69->68 [4] 70/-1/-1->69->68 [5] 70/-1/-1->69->68 [6] 70/80/-1->69->68 [7] 70/80/-1->69->68 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Trees [0] -1/-1/-1->53->52 [1] -1/-1/-1->53->52 [2] 48/28/-1->53->52 [3] 48/28/-1->53->52 [4] -1/-1/-1->53->52 [5] -1/-1/-1->53->52 [6] 48/-1/-1->53->52 [7] 48/-1/-1->53->52 | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Trees [0] 53/-1/-1->52->51 [1] 53/-1/-1->52->51 [2] 53/76/-1->52->4 [3] 53/76/-1->52->4 [4] 53/-1/-1->52->51 [5] 53/-1/-1->52->51 [6] 53/-1/-1->52->56 [7] 53/-1/-1->52->56 | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Trees [0] 64/-1/-1->63->62 [1] 64/-1/-1->63->62 [2] -1/-1/-1->63->62 [3] -1/-1/-1->63->62 [4] 64/-1/-1->63->62 [5] 64/-1/-1->63->62 [6] -1/-1/-1->63->62 [7] -1/-1/-1->63->62 | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Trees [0] -1/-1/-1->47->46 [1] -1/-1/-1->47->46 [2] 42/-1/-1->47->46 [3] 42/-1/-1->47->46 [4] -1/-1/-1->47->46 [5] -1/-1/-1->47->46 [6] 42/-1/-1->47->46 [7] 42/-1/-1->47->46 | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Trees [0] 47/-1/-1->46->45 [1] 47/-1/-1->46->45 [2] 47/-1/-1->46->45 [3] 47/-1/-1->46->45 [4] 47/-1/-1->46->45 [5] 47/-1/-1->46->45 [6] 47/-1/-1->46->45 [7] 47/-1/-1->46->45 | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Trees [0] 69/-1/-1->68->67 [1] 69/-1/-1->68->67 [2] 69/-1/-1->68->64 [3] 69/-1/-1->68->64 [4] 69/-1/-1->68->67 [5] 69/-1/-1->68->67 [6] 69/56/-1->68->45 [7] 69/56/-1->68->45 | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 15/-1/-1->14->13 [3] 15/-1/-1->14->13 [4] 15/-1/-1->14->13 [5] 15/-1/-1->14->13 [6] 15/-1/-1->14->13 [7] 15/-1/-1->14->13 | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Trees [0] 41/-1/-1->40->39 [1] 41/-1/-1->40->39 [2] 41/44/-1->40->28 [3] 41/44/-1->40->28 [4] 41/-1/-1->40->39 [5] 41/-1/-1->40->39 [6] 41/-1/-1->40->33 [7] 41/-1/-1->40->33 | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Trees [0] 19/-1/-1->18->12 [1] 19/-1/-1->18->12 [2] 19/-1/-1->18->23 [3] 19/-1/-1->18->23 [4] 19/6/-1->18->42 [5] 19/6/-1->18->42 [6] 19/-1/-1->18->23 [7] 19/-1/-1->18->23 | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Trees [0] 20/-1/-1->19->18 [1] 20/-1/-1->19->18 [2] -1/-1/-1->19->18 [3] -1/-1/-1->19->18 [4] 20/30/-1->19->18 [5] 20/30/-1->19->18 [6] -1/-1/-1->19->18 [7] -1/-1/-1->19->18 | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Trees [0] 23/-1/-1->22->21 [1] 23/-1/-1->22->21 [2] 23/-1/-1->22->21 [3] 23/-1/-1->22->21 [4] 23/-1/-1->22->21 [5] 23/-1/-1->22->21 [6] 23/-1/-1->22->21 [7] 23/-1/-1->22->21 | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Trees [0] -1/-1/-1->23->22 [1] -1/-1/-1->23->22 [2] 18/-1/-1->23->22 [3] 18/-1/-1->23->22 [4] -1/-1/-1->23->22 [5] -1/-1/-1->23->22 [6] 18/-1/-1->23->22 [7] 18/-1/-1->23->22 | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Trees [0] 68/-1/-1->67->66 [1] 68/-1/-1->67->66 [2] -1/-1/-1->67->66 [3] -1/-1/-1->67->66 [4] 68/78/-1->67->66 [5] 68/78/-1->67->66 [6] -1/-1/-1->67->66 [7] -1/-1/-1->67->66 | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Trees [0] 67/-1/-1->66->60 [1] 67/-1/-1->66->60 [2] 67/-1/-1->66->71 [3] 67/-1/-1->66->71 [4] 67/54/-1->66->43 [5] 67/54/-1->66->43 [6] 67/-1/-1->66->71 [7] 67/-1/-1->66->71 | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Trees [0] -1/-1/-1->17->16 [1] -1/-1/-1->17->16 [2] 12/8/-1->17->16 [3] 12/8/-1->17->16 [4] -1/-1/-1->17->16 [5] -1/-1/-1->17->16 [6] 12/-1/-1->17->16 [7] 12/-1/-1->17->16 | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Trees [0] 13/18/-1->12->25 [1] 13/18/-1->12->25 [2] 13/-1/-1->12->17 [3] 13/-1/-1->12->17 [4] 13/-1/-1->12->7 [5] 13/-1/-1->12->7 [6] 13/-1/-1->12->17 [7] 13/-1/-1->12->17 | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Trees [0] 16/-1/-1->15->14 [1] 16/-1/-1->15->14 [2] -1/-1/-1->15->14 [3] -1/-1/-1->15->14 [4] 16/-1/-1->15->14 [5] 16/-1/-1->15->14 [6] -1/-1/-1->15->14 [7] -1/-1/-1->15->14 | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Trees [0] 52/-1/-1->51->50 [1] 52/-1/-1->51->50 [2] -1/-1/-1->51->50 [3] -1/-1/-1->51->50 [4] 52/-1/-1->51->50 [5] 52/-1/-1->51->50 [6] -1/-1/-1->51->50 [7] -1/-1/-1->51->50 | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Trees [0] 51/-1/-1->50->49 [1] 51/-1/-1->50->49 [2] 51/-1/-1->50->49 [3] 51/-1/-1->50->49 [4] 51/-1/-1->50->49 [5] 51/-1/-1->50->49 [6] 51/-1/-1->50->49 [7] 51/-1/-1->50->49 | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Trees [0] 49/72/-1->48->0 [1] 49/72/-1->48->0 [2] 49/-1/-1->48->53 [3] 49/-1/-1->48->53 [4] 49/-1/-1->48->54 [5] 49/-1/-1->48->54 [6] 49/-1/-1->48->53 [7] 49/-1/-1->48->53 | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Trees [0] -1/-1/-1->59->58 [1] -1/-1/-1->59->58 [2] 54/-1/-1->59->58 [3] 54/-1/-1->59->58 [4] -1/-1/-1->59->58 [5] -1/-1/-1->59->58 [6] 54/-1/-1->59->58 [7] 54/-1/-1->59->58 | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Trees [0] 59/-1/-1->58->57 [1] 59/-1/-1->58->57 [2] 59/-1/-1->58->57 [3] 59/-1/-1->58->57 [4] 59/-1/-1->58->57 [5] 59/-1/-1->58->57 [6] 59/-1/-1->58->57 [7] 59/-1/-1->58->57 | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Trees [0] 58/-1/-1->57->56 [1] 58/-1/-1->57->56 [2] 58/-1/-1->57->56 [3] 58/-1/-1->57->56 [4] 58/-1/-1->57->56 [5] 58/-1/-1->57->56 [6] 58/64/-1->57->56 [7] 58/64/-1->57->56 | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Trees [0] 25/36/-1->24->49 [1] 25/36/-1->24->49 [2] 25/-1/-1->24->29 [3] 25/-1/-1->24->29 [4] 25/-1/-1->24->30 [5] 25/-1/-1->24->30 [6] 25/-1/-1->24->29 [7] 25/-1/-1->24->29 | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Trees [0] -1/-1/-1->29->28 [1] -1/-1/-1->29->28 [2] 24/16/-1->29->28 [3] 24/16/-1->29->28 [4] -1/-1/-1->29->28 [5] -1/-1/-1->29->28 [6] 24/-1/-1->29->28 [7] 24/-1/-1->29->28 | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Trees [0] 29/-1/-1->28->27 [1] 29/-1/-1->28->27 [2] 29/40/-1->28->53 [3] 29/40/-1->28->53 [4] 29/-1/-1->28->27 [5] 29/-1/-1->28->27 [6] 29/-1/-1->28->32 [7] 29/-1/-1->28->32 | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Trees [0] 63/-1/-1->62->61 [1] 63/-1/-1->62->61 [2] 63/-1/-1->62->61 [3] 63/-1/-1->62->61 [4] 63/-1/-1->62->61 [5] 63/-1/-1->62->61 [6] 63/-1/-1->62->61 [7] 63/-1/-1->62->61 | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Trees [0] 14/6/-1->13->12 [1] 14/6/-1->13->12 [2] 14/-1/-1->13->12 [3] 14/-1/-1->13->12 [4] 14/-1/-1->13->12 [5] 14/-1/-1->13->12 [6] 14/-1/-1->13->12 [7] 14/-1/-1->13->12 | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Trees [0] 50/24/-1->49->48 [1] 50/24/-1->49->48 [2] 50/-1/-1->49->48 [3] 50/-1/-1->49->48 [4] 50/-1/-1->49->48 [5] 50/-1/-1->49->48 [6] 50/-1/-1->49->48 [7] 50/-1/-1->49->48 | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Trees [0] -1/-1/-1->41->40 [1] -1/-1/-1->41->40 [2] 36/32/-1->41->40 [3] 36/32/-1->41->40 [4] -1/-1/-1->41->40 [5] -1/-1/-1->41->40 [6] 36/-1/-1->41->40 [7] 36/-1/-1->41->40 | |
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Trees [0] 56/-1/-1->55->54 [1] 56/-1/-1->55->54 [2] -1/-1/-1->55->54 [3] -1/-1/-1->55->54 [4] 56/60/-1->55->54 [5] 56/60/-1->55->54 [6] -1/-1/-1->55->54 [7] -1/-1/-1->55->54 | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Trees [0] 21/-1/-1->20->19 [1] 21/-1/-1->20->19 [2] 21/-1/-1->20->16 [3] 21/-1/-1->20->16 [4] 21/-1/-1->20->19 [5] 21/-1/-1->20->19 [6] 21/8/-1->20->44 [7] 21/8/-1->20->44 | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Trees [0] 22/-1/-1->21->20 [1] 22/-1/-1->21->20 [2] 22/-1/-1->21->20 [3] 22/-1/-1->21->20 [4] 22/-1/-1->21->20 [5] 22/-1/-1->21->20 [6] 22/32/-1->21->20 [7] 22/32/-1->21->20 | |
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Trees [0] 62/54/-1->61->60 [1] 62/54/-1->61->60 [2] 62/-1/-1->61->60 [3] 62/-1/-1->61->60 [4] 62/-1/-1->61->60 [5] 62/-1/-1->61->60 [6] 62/-1/-1->61->60 [7] 62/-1/-1->61->60 | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Trees [0] 61/66/-1->60->73 [1] 61/66/-1->60->73 [2] 61/-1/-1->60->65 [3] 61/-1/-1->60->65 [4] 61/-1/-1->60->55 [5] 61/-1/-1->60->55 [6] 61/-1/-1->60->65 [7] 61/-1/-1->60->65 | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Trees [0] -1/-1/-1->35->34 [1] -1/-1/-1->35->34 [2] 30/-1/-1->35->34 [3] 30/-1/-1->35->34 [4] -1/-1/-1->35->34 [5] -1/-1/-1->35->34 [6] 30/-1/-1->35->34 [7] 30/-1/-1->35->34 | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Trees [0] 33/-1/-1->32->31 [1] 33/-1/-1->32->31 [2] 33/-1/-1->32->41 [3] 33/-1/-1->32->41 [4] 33/-1/-1->32->31 [5] 33/-1/-1->32->31 [6] 33/28/-1->32->21 [7] 33/28/-1->32->21 | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Trees [0] 35/-1/-1->34->33 [1] 35/-1/-1->34->33 [2] 35/-1/-1->34->33 [3] 35/-1/-1->34->33 [4] 35/-1/-1->34->33 [5] 35/-1/-1->34->33 [6] 35/-1/-1->34->33 [7] 35/-1/-1->34->33 | |
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Trees [0] 46/-1/-1->45->44 [1] 46/-1/-1->45->44 [2] 46/-1/-1->45->44 [3] 46/-1/-1->45->44 [4] 46/-1/-1->45->44 [5] 46/-1/-1->45->44 [6] 46/68/-1->45->44 [7] 46/68/-1->45->44 | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Trees [0] 40/-1/-1->39->38 [1] 40/-1/-1->39->38 [2] -1/-1/-1->39->38 [3] -1/-1/-1->39->38 [4] 40/-1/-1->39->38 [5] 40/-1/-1->39->38 [6] -1/-1/-1->39->38 [7] -1/-1/-1->39->38 | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Trees [0] 55/-1/-1->54->61 [1] 55/-1/-1->54->61 [2] 55/-1/-1->54->59 [3] 55/-1/-1->54->59 [4] 55/48/-1->54->66 [5] 55/48/-1->54->66 [6] 55/-1/-1->54->59 [7] 55/-1/-1->54->59 | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Trees [0] 28/-1/-1->27->26 [1] 28/-1/-1->27->26 [2] -1/-1/-1->27->26 [3] -1/-1/-1->27->26 [4] 28/-1/-1->27->26 [5] 28/-1/-1->27->26 [6] -1/-1/-1->27->26 [7] -1/-1/-1->27->26 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Trees [0] 26/12/-1->25->24 [1] 26/12/-1->25->24 [2] 26/-1/-1->25->24 [3] 26/-1/-1->25->24 [4] 26/-1/-1->25->24 [5] 26/-1/-1->25->24 [6] 26/-1/-1->25->24 [7] 26/-1/-1->25->24 | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Trees [0] 27/-1/-1->26->25 [1] 27/-1/-1->26->25 [2] 27/-1/-1->26->25 [3] 27/-1/-1->26->25 [4] 27/-1/-1->26->25 [5] 27/-1/-1->26->25 [6] 27/-1/-1->26->25 [7] 27/-1/-1->26->25 | |
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Trees [0] 34/-1/-1->33->32 [1] 34/-1/-1->33->32 [2] 34/-1/-1->33->32 [3] 34/-1/-1->33->32 [4] 34/-1/-1->33->32 [5] 34/-1/-1->33->32 [6] 34/40/-1->33->32 [7] 34/40/-1->33->32 | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Trees [0] 31/-1/-1->30->37 [1] 31/-1/-1->30->37 [2] 31/-1/-1->30->35 [3] 31/-1/-1->30->35 [4] 31/24/-1->30->19 [5] 31/24/-1->30->19 [6] 31/-1/-1->30->35 [7] 31/-1/-1->30->35 | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Trees [0] 45/-1/-1->44->43 [1] 45/-1/-1->44->43 [2] 45/-1/-1->44->40 [3] 45/-1/-1->44->40 [4] 45/-1/-1->44->43 [5] 45/-1/-1->44->43 [6] 45/20/-1->44->92 [7] 45/20/-1->44->92 | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Trees [0] 39/-1/-1->38->37 [1] 39/-1/-1->38->37 [2] 39/-1/-1->38->37 [3] 39/-1/-1->38->37 [4] 39/-1/-1->38->37 [5] 39/-1/-1->38->37 [6] 39/-1/-1->38->37 [7] 39/-1/-1->38->37 | |
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Trees [0] 32/-1/-1->31->30 [1] 32/-1/-1->31->30 [2] -1/-1/-1->31->30 [3] -1/-1/-1->31->30 [4] 32/36/-1->31->30 [5] 32/36/-1->31->30 [6] -1/-1/-1->31->30 [7] -1/-1/-1->31->30 | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Trees [0] 44/-1/-1->43->42 [1] 44/-1/-1->43->42 [2] -1/-1/-1->43->42 [3] -1/-1/-1->43->42 [4] 44/66/-1->43->42 [5] 44/66/-1->43->42 [6] -1/-1/-1->43->42 [7] -1/-1/-1->43->42 | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Trees [0] 43/-1/-1->42->36 [1] 43/-1/-1->42->36 [2] 43/-1/-1->42->47 [3] 43/-1/-1->42->47 [4] 43/18/-1->42->90 [5] 43/18/-1->42->90 [6] 43/-1/-1->42->47 [7] 43/-1/-1->42->47 | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Trees [0] 38/30/-1->37->36 [1] 38/30/-1->37->36 [2] 38/-1/-1->37->36 [3] 38/-1/-1->37->36 [4] 38/-1/-1->37->36 [5] 38/-1/-1->37->36 [6] 38/-1/-1->37->36 [7] 38/-1/-1->37->36 | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Trees [0] 37/42/-1->36->24 [1] 37/42/-1->36->24 [2] 37/-1/-1->36->41 [3] 37/-1/-1->36->41 [4] 37/-1/-1->36->31 [5] 37/-1/-1->36->31 [6] 37/-1/-1->36->41 [7] 37/-1/-1->36->41 | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 02 : 72[101c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 02 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 02 : 24[101c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 02 : 48[101c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02/0 : 9[101d0] -> 16[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02/0 : 57[101d0] -> 64[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03/0 : 57[101d0] -> 64[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03/0 : 9[101d0] -> 16[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02/0 : 81[101d0] -> 88[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 02 : 84[901c0] -> 87[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 06/0 : 9[101d0] -> 16[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 06/0 : 57[101d0] -> 64[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03/0 : 81[101d0] -> 88[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 07/0 : 9[101d0] -> 16[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 07/0 : 57[101d0] -> 64[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 02 : 86[a01c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 06/0 : 81[101d0] -> 88[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 03 : 72[101c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 07/0 : 81[101d0] -> 88[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 03 : 0[101c0] -> 3[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 02/0 : 53[901d0] -> 56[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 03 : 24[101c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 02 : 14[a01c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 03/0 : 53[901d0] -> 56[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 02/0 : 77[901d0] -> 80[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 03/0 : 77[901d0] -> 80[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06/0 : 53[901d0] -> 56[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06/0 : 77[901d0] -> 80[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07/0 : 53[901d0] -> 56[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 03 : 48[101c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07/0 : 77[901d0] -> 80[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 02 : 2[201c0] -> 5[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02/0 : 33[101d0] -> 40[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 02/0 : 5[901d0] -> 8[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03/0 : 33[101d0] -> 40[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 03/0 : 5[901d0] -> 8[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 06/0 : 33[101d0] -> 40[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06/0 : 5[901d0] -> 8[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 07/0 : 33[101d0] -> 40[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07/0 : 5[901d0] -> 8[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 03 : 84[901c0] -> 87[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 06 : 24[101c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 02 : 12[901c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 03 : 86[a01c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 06 : 72[101c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 02/0 : 29[901d0] -> 32[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 03/0 : 29[901d0] -> 32[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 03 : 14[a01c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 06 : 0[101c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06/0 : 29[901d0] -> 32[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 06 : 48[101c0] -> 51[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 02 : 26[201c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 02 : 50[201c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07/0 : 29[901d0] -> 32[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 02 : 38[a01c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 02 : 62[a01c0] -> 65[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 02 : 74[201c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 06 : 84[901c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 03 : 2[201c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 02 : 60[901c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 00/0 : 7[a01d0] -> 12[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 06 : 86[a01c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 01/0 : 7[a01d0] -> 12[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 04/0 : 7[a01d0] -> 12[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 05/0 : 7[a01d0] -> 12[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02/0 : 69[901d0] -> 76[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 03 : 12[901c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 07 : 72[101c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03/0 : 69[901d0] -> 76[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 00/0 : 1[101d0] -> 6[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 06/0 : 69[901d0] -> 76[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 06 : 14[a01c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 03 : 62[a01c0] -> 65[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 01/0 : 1[101d0] -> 6[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 07/0 : 69[901d0] -> 76[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 00/0 : 85[901d0] -> 90[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02/0 : 45[901d0] -> 52[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 07 : 0[101c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 01/0 : 85[901d0] -> 90[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04/0 : 1[101d0] -> 6[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 04/0 : 85[901d0] -> 90[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 07 : 48[101c0] -> 51[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03/0 : 45[901d0] -> 52[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05/0 : 1[101d0] -> 6[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 00/0 : 1[101d0] -> 6[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 00/0 : 61[901d0] -> 66[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 03 : 50[201c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 00/0 : 85[901d0] -> 90[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 05/0 : 85[901d0] -> 90[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 00 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 07 : 24[101c0] -> 27[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 01/0 : 1[101d0] -> 6[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 01/0 : 61[901d0] -> 66[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 06/0 : 45[901d0] -> 52[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 07 : 84[901c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 01/0 : 85[901d0] -> 90[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 04/0 : 1[101d0] -> 6[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 04/0 : 61[901d0] -> 66[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 05/0 : 1[101d0] -> 6[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 07/0 : 45[901d0] -> 52[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 05/0 : 61[901d0] -> 66[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 04/0 : 85[901d0] -> 90[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 03 : 74[201c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 03 : 38[a01c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 07 : 86[a01c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 05/0 : 85[901d0] -> 90[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 00 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 06 : 2[201c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 03 : 60[901c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 03 : 26[201c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 02/0 : 81[101d0] -> 88[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 06 : 12[901c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 03/0 : 81[101d0] -> 88[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 06/0 : 81[101d0] -> 88[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 07/0 : 81[101d0] -> 88[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 01 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 02/0 : 9[101d0] -> 16[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 06 : 62[a01c0] -> 65[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 07 : 14[a01c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 03/0 : 9[101d0] -> 16[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 06/0 : 9[101d0] -> 16[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 02/0 : 89[101d0] -> 92[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 07/0 : 9[101d0] -> 16[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 03/0 : 89[101d0] -> 92[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 06 : 50[201c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 06/0 : 89[101d0] -> 92[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 01 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 06 : 74[201c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 07/0 : 89[101d0] -> 92[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 00/0 : 91[201d0] -> 0[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 01/0 : 91[201d0] -> 0[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 02 : 36[901c0] -> 39[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 07 : 2[201c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 04/0 : 91[201d0] -> 0[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 00/0 : 43[201d0] -> 48[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 05/0 : 91[201d0] -> 0[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 01/0 : 43[201d0] -> 48[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 06 : 60[901c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 04/0 : 43[201d0] -> 48[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 05/0 : 43[201d0] -> 48[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 06 : 26[201c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 02 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 00/0 : 73[101d0] -> 78[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 00/0 : 13[901d0] -> 18[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 02/0 : 93[901d0] -> 4[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 01/0 : 73[101d0] -> 78[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 07 : 12[901c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 01/0 : 13[901d0] -> 18[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 04/0 : 73[101d0] -> 78[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 03/0 : 93[901d0] -> 4[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 07 : 62[a01c0] -> 65[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 04/0 : 13[901d0] -> 18[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 05/0 : 73[101d0] -> 78[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 05/0 : 13[901d0] -> 18[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 06/0 : 93[901d0] -> 4[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 07 : 50[201c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 02/0 : 93[901d0] -> 4[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 03/0 : 93[901d0] -> 4[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 02 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 07 : 74[201c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 07/0 : 93[901d0] -> 4[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 07 : 60[901c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 06 : 38[a01c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 06/0 : 93[901d0] -> 4[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 07/0 : 93[901d0] -> 4[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 00/0 : 49[101d0] -> 54[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 07 : 26[201c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 01/0 : 49[101d0] -> 54[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 03 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 04/0 : 49[101d0] -> 54[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 05/0 : 49[101d0] -> 54[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 00/0 : 55[a01d0] -> 60[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 01/0 : 55[a01d0] -> 60[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 02/0 : 41[101d0] -> 44[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 04/0 : 55[a01d0] -> 60[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 00/0 : 25[101d0] -> 30[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 00/0 : 37[901d0] -> 42[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 05/0 : 55[a01d0] -> 60[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02/0 : 21[901d0] -> 28[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 03/0 : 41[101d0] -> 44[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 01/0 : 25[101d0] -> 30[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 04/0 : 25[101d0] -> 30[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 01/0 : 37[901d0] -> 42[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03/0 : 21[901d0] -> 28[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 05/0 : 25[101d0] -> 30[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06/0 : 41[101d0] -> 44[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04/0 : 37[901d0] -> 42[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 03 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 06/0 : 21[901d0] -> 28[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 03 : 36[901c0] -> 39[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07/0 : 41[101d0] -> 44[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 00/0 : 13[901d0] -> 18[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05/0 : 37[901d0] -> 42[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 07/0 : 21[901d0] -> 28[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 00 : 42[201c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 01/0 : 13[901d0] -> 18[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 02/0 : 45[901d0] -> 52[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04/0 : 13[901d0] -> 18[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 03/0 : 45[901d0] -> 52[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 02/0 : 57[101d0] -> 64[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 06/0 : 45[901d0] -> 52[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05/0 : 13[901d0] -> 18[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 03/0 : 57[101d0] -> 64[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 00 : 18[201c0] -> 23[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 07/0 : 45[901d0] -> 52[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 06/0 : 57[101d0] -> 64[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 07/0 : 57[101d0] -> 64[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 00/0 : 37[901d0] -> 42[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 00/0 : 49[101d0] -> 54[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 01/0 : 37[901d0] -> 42[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 04/0 : 37[901d0] -> 42[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 01/0 : 49[101d0] -> 54[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 05/0 : 37[901d0] -> 42[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04/0 : 49[101d0] -> 54[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 07 : 38[a01c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 04 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05/0 : 49[101d0] -> 54[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 00 : 54[a01c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 00 : 10[201c0] -> 9[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 00/0 : 25[101d0] -> 30[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 01 : 42[201c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 02/0 : 17[101d0] -> 20[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 02/0 : 33[101d0] -> 40[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 01/0 : 25[101d0] -> 30[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 03/0 : 17[101d0] -> 20[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 02/0 : 65[101d0] -> 68[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 03/0 : 33[101d0] -> 40[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 00/0 : 79[a01d0] -> 84[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 03/0 : 65[101d0] -> 68[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04/0 : 25[101d0] -> 30[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 06/0 : 33[101d0] -> 40[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 01/0 : 79[a01d0] -> 84[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06/0 : 17[101d0] -> 20[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 06/0 : 65[101d0] -> 68[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 00 : 46[a01c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 07/0 : 33[101d0] -> 40[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05/0 : 25[101d0] -> 30[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 04/0 : 79[a01d0] -> 84[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 07/0 : 65[101d0] -> 68[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07/0 : 17[101d0] -> 20[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 00 : 30[a01c0] -> 35[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 05/0 : 79[a01d0] -> 84[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 00/0 : 31[a01d0] -> 36[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 02/0 : 77[901d0] -> 80[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 01/0 : 31[a01d0] -> 36[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 01 : 10[201c0] -> 9[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 02/0 : 89[101d0] -> 92[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 04/0 : 31[a01d0] -> 36[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 03/0 : 89[101d0] -> 92[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 00/0 : 19[201d0] -> 24[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 05/0 : 31[a01d0] -> 36[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 06/0 : 89[101d0] -> 92[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 01/0 : 19[201d0] -> 24[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 06 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 05 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 06 : 36[901c0] -> 39[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 01 : 18[201c0] -> 23[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 00/0 : 73[101d0] -> 78[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 07/0 : 89[101d0] -> 92[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 01 : 54[a01c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 04/0 : 19[201d0] -> 24[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 02/0 : 69[901d0] -> 76[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 03/0 : 69[901d0] -> 76[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 05/0 : 19[201d0] -> 24[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 01/0 : 73[101d0] -> 78[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 06/0 : 69[901d0] -> 76[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 07/0 : 69[901d0] -> 76[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 02 : 42[201c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 00 : 58[201c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04/0 : 73[101d0] -> 78[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05/0 : 73[101d0] -> 78[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 01 : 46[a01c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 00 : 78[a01c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 02/0 : 21[901d0] -> 28[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 02 : 10[201c0] -> 9[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 03/0 : 21[901d0] -> 28[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 02/0 : 17[101d0] -> 20[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 06/0 : 21[901d0] -> 28[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 02/0 : 5[901d0] -> 8[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 07 : 6[a01c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 07/0 : 21[901d0] -> 28[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 06 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 03/0 : 5[901d0] -> 8[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 00/0 : 61[901d0] -> 66[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 06/0 : 5[901d0] -> 8[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 01/0 : 61[901d0] -> 66[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 00 : 94[a01c0] -> 93[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 07/0 : 5[901d0] -> 8[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 03 : 42[201c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04/0 : 61[901d0] -> 66[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 02 : 54[a01c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 02 : 46[a01c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05/0 : 61[901d0] -> 66[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 00/0 : 67[201d0] -> 72[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 03 : 10[201c0] -> 9[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 01 : 30[a01c0] -> 35[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 00 : 66[201c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 01 : 58[201c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 02/0 : 29[901d0] -> 32[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 01/0 : 67[201d0] -> 72[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 07 : 90[201c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 04/0 : 67[201d0] -> 72[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 02/0 : 65[101d0] -> 68[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 05/0 : 67[201d0] -> 72[101c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 02/0 : 53[901d0] -> 56[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 03/0 : 65[101d0] -> 68[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 03/0 : 53[901d0] -> 56[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 06/0 : 53[901d0] -> 56[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 02 : 18[201c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06/0 : 65[101d0] -> 68[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 07/0 : 53[901d0] -> 56[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 01 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04 : 42[201c0] -> 47[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 07 : 36[901c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 01 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07/0 : 65[101d0] -> 68[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 03 : 54[a01c0] -> 59[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 03 : 46[a01c0] -> 45[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 02 : 58[201c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 02 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05 : 42[201c0] -> 47[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04 : 54[a01c0] -> 59[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 04 : 46[a01c0] -> 45[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 03 : 58[201c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 01 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 02 : 30[a01c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 03 : 18[201c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 06 : 42[201c0] -> 47[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05 : 54[a01c0] -> 59[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 04 : 10[201c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 05 : 46[a01c0] -> 45[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 04 : 58[201c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 02 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 00 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 07 : 42[201c0] -> 47[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 06 : 54[a01c0] -> 59[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 06 : 46[a01c0] -> 45[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 00 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 05 : 58[201c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 02 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 01 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 03 : 30[a01c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04 : 18[201c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 03 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 07 : 46[a01c0] -> 45[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 07 : 54[a01c0] -> 59[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 02 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 02/0 : 41[101d0] -> 44[901c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 03 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 01 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 03 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 06 : 58[201c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 00 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 00 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00/0 : 67[201d0] -> 72[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 03 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 04 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00/0 : 79[a01d0] -> 84[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 01 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 01 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05 : 18[201c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 00 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 05 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 02 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04 : 30[a01c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 02 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 02 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 00 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00/0 : 7[a01d0] -> 12[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 07 : 58[201c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 06 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 00 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 03 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 03 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00/0 : 19[201d0] -> 24[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 03 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 00 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 04 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 07 : 75[201d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 04 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00/0 : 43[201d0] -> 48[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 04 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00/0 : 55[a01d0] -> 60[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 06 : 18[201c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 00 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 00/0 : 91[201d0] -> 0[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01/0 : 55[a01d0] -> 60[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 04/0 : 55[a01d0] -> 60[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 06 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 05 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 01/0 : 91[201d0] -> 0[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 01 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 05 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05 : 30[a01c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 04/0 : 91[201d0] -> 0[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 04 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 05/0 : 55[a01d0] -> 60[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00 : 60[901c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 05/0 : 91[201d0] -> 0[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 00 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 00 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 00 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 06 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 01 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 06 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 01 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 05 : 10[201c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 07 : 78[a01c0] -> 83[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 05 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 07 : 15[a01d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 07 : 27[201d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 01 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 07 : 18[201c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 06 : 30[a01c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 01 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 01 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 02 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 06 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 02 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 02 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 06 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 04 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 07 : 30[a01c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 02 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 07 : 66[201c0] -> 71[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 03 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 06 : 10[201c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 03 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 01 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 05 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01 : 60[901c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 01 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 04 : 60[901c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 02 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 05 : 60[901c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 03 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 03 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 04 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 05 : 0[101c0] -> 5[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 03 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 05 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 04 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 02 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 06 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 04 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 02 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 05 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 00 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 03 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 00 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 07 : 10[201c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00/0 : 31[a01d0] -> 36[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 07 : 51[201d0] -> 50[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01/0 : 31[a01d0] -> 36[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 04 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 04/0 : 31[a01d0] -> 36[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 04 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 06 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 05/0 : 31[a01d0] -> 36[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00 : 36[901c0] -> 41[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 01 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 04 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 01 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 07 : 82[201c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 05 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 04 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 07 : 63[a01d0] -> 62[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 00 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 05 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 04 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 00 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 05 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 00 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 06 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01 : 36[901c0] -> 41[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 06 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 05 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 05 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 01 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 01 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 00 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 01 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 07 : 3[201d0] -> 2[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 07 : 87[a01d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 03 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 01 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 04 : 36[901c0] -> 41[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 04 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 04 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 02 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 00 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 05 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 02 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 05 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 05 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 03 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 05 : 36[901c0] -> 41[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 01 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 04 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 02 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 03 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 05 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 03 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 04 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 06 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 06 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 04 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 07 : 47[a01d0] -> 46[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 04 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 05 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 06 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 06 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 05 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 07 : 11[201d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 06 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 05 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 07 : 34[201c0] -> 33[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 07 : 39[a01d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 07 : 22[a01c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 06 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 06 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 00 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 00 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 01 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 02 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 03 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 04 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 05 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 06 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 07 : 59[201d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 07 : 94[a01c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 00 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 01 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 00 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 00 : 35[201d0] -> 34[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 07 : 70[a01c0] -> 69[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 02 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 01 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 01 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 03 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 01 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 02 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 02 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 04 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 00 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 03 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 02 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 03 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 05 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 04 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 03 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 04 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 01 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 04 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 06 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 05 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 02 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 05 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 05 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 06 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 06 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 07 : 35[201d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 07 : 71[a01d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 03 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 04 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 05 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 06 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 07 : 23[a01d0] -> 22[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 06 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 07 : 83[201d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 07 : 95[a01d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 03/0 : 77[901d0] -> 80[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 06/0 : 77[901d0] -> 80[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01/0 : 67[201d0] -> 72[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 07/0 : 77[901d0] -> 80[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 04/0 : 67[201d0] -> 72[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 05/0 : 67[201d0] -> 72[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 03/0 : 17[101d0] -> 20[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 04 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01/0 : 7[a01d0] -> 12[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 06/0 : 17[101d0] -> 20[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 07/0 : 17[101d0] -> 20[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 04/0 : 7[a01d0] -> 12[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 03/0 : 29[901d0] -> 32[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 06/0 : 29[901d0] -> 32[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01/0 : 19[201d0] -> 24[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 05/0 : 7[a01d0] -> 12[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00 : 12[901c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 07/0 : 29[901d0] -> 32[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 05 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 04/0 : 19[201d0] -> 24[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 05/0 : 19[201d0] -> 24[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01/0 : 43[201d0] -> 48[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01 : 12[901c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 04/0 : 43[201d0] -> 48[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 05/0 : 43[201d0] -> 48[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00 : 48[101c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01 : 48[101c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01/0 : 79[a01d0] -> 84[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 04/0 : 79[a01d0] -> 84[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 05/0 : 79[a01d0] -> 84[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00 : 84[901c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 04 : 12[901c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 04 : 48[101c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 04 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 03/0 : 41[101d0] -> 44[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01 : 84[901c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 06/0 : 41[101d0] -> 44[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 05 : 48[101c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 07/0 : 41[101d0] -> 44[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 04 : 84[901c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 05 : 84[901c0] -> 89[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 05 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 05 : 12[901c0] -> 17[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 00 : 14[a01c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 01 : 14[a01c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 04 : 14[a01c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 05 : 14[a01c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 00 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 01 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 04 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 05 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 00 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 01 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 04 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 05 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 00 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 01 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 04 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 05 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 00 : 9[101d0] -> 8[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 01 : 9[101d0] -> 8[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 04 : 9[101d0] -> 8[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 05 : 9[101d0] -> 8[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 00 : 5[901d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 01 : 5[901d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 04 : 5[901d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 05 : 5[901d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02 : 52[901c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03 : 52[901c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 06 : 52[901c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 07 : 52[901c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 00 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 01 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 04 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 05 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 00 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 01 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 02 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 00 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 03 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 04 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 05 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 06 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02 : 16[101c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Channel 07 : 10[201c0] -> 11[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03 : 16[101c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 06 : 16[101c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 01 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 07 : 16[101c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 04 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 05 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 00 : 57[101d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 01 : 57[101d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 02 : 11[201d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 04 : 57[101d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 03 : 11[201d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 05 : 57[101d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 06 : 11[201d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Channel 07 : 11[201d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 00 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 00 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 01 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 02 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 01 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 03 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 04 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 05 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 02 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02 : 88[101c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 06 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 00 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Channel 07 : 46[a01c0] -> 47[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 01 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 03 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03 : 88[101c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 04 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 05 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 04 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 06 : 88[101c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 05 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 06 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 07 : 88[101c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 07 : 92[901c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 02 : 47[a01d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 03 : 47[a01d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 06 : 47[a01d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Channel 07 : 47[a01d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 02 : 4[901c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 03 : 4[901c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 00 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 06 : 4[901c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 07 : 4[901c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 01 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 00 : 16[101c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 02 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 02 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 01 : 16[101c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 03 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 03 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 04 : 16[101c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 06 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 05 : 16[101c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 00 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 07 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 04 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 01 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 05 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 04 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 06 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 00 : 89[101d0] -> 88[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 05 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 02 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 03 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 06 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 01 : 89[101d0] -> 88[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 00 : 93[901d0] -> 92[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Channel 07 : 82[201c0] -> 83[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 07 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 00 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 01 : 93[901d0] -> 92[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02 : 64[101c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 01 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 04 : 89[101d0] -> 88[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 04 : 93[901d0] -> 92[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03 : 64[101c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 02 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 06 : 64[101c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 05 : 93[901d0] -> 92[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 05 : 89[101d0] -> 88[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 03 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 07 : 64[101c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 04 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 02 : 83[201d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 00 : 88[101c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 05 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 03 : 83[201d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 01 : 88[101c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 06 : 83[201d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 04 : 88[101c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 00 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Channel 07 : 83[201d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07 : 56[101c0] -> 55[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 05 : 88[101c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 01 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 00 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 01 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 02 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 02 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 03 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 02 : 67[201d0] -> 66[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 04 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 03 : 67[201d0] -> 66[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 05 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 02 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 06 : 67[201d0] -> 66[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 02 : 49[101d0] -> 48[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 03 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 00 : 4[901c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 00 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 02 : 1[101d0] -> 0[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 03 : 49[101d0] -> 48[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 01 : 4[901c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 03 : 1[101d0] -> 0[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 00 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 04 : 4[901c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 01 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 06 : 49[101d0] -> 48[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 06 : 1[101d0] -> 0[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 04 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 02 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 05 : 4[901c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 07 : 49[101d0] -> 48[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 01 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 07 : 1[101d0] -> 0[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 05 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 00 : 64[101c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 03 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 02 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 07 : 67[201d0] -> 66[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 04 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 04 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 03 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 01 : 64[101c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07 : 68[901c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 03 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 05 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 06 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 04 : 64[101c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 05 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 06 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 05 : 64[101c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02 : 40[101c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 06 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 06 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03 : 40[101c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 06 : 40[101c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 07 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 07 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 02 : 19[201d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 07 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 07 : 40[101c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Channel 07 : 58[201c0] -> 59[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 03 : 19[201d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 00 : 52[901c0] -> 51[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 06 : 19[201d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 00 : 66[201c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 00 : 90[201c0] -> 91[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 01 : 52[901c0] -> 51[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 07 : 19[201d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 01 : 66[201c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 04 : 52[901c0] -> 51[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 05 : 52[901c0] -> 51[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 00 : 15[a01d0] -> 16[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 02 : 66[201c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 01 : 15[a01d0] -> 16[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 00 : 34[201c0] -> 35[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 03 : 66[201c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 04 : 15[a01d0] -> 16[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 01 : 34[201c0] -> 35[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Channel 05 : 15[a01d0] -> 16[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 02 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 02 : 34[201c0] -> 35[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04 : 66[201c0] -> 67[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 00 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 03 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 03 : 34[201c0] -> 35[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 00 : 18[201c0] -> 19[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 01 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 06 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 04 : 34[201c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 07 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 00 : 40[101c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 05 : 34[201c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 02 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 01 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 06 : 34[201c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 01 : 40[101c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02 : 76[901c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 04 : 40[101c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Channel 07 : 34[201c0] -> 35[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05 : 66[201c0] -> 67[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 00 : 77[901d0] -> 76[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 05 : 40[101c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03 : 76[901c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 01 : 77[901d0] -> 76[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 06 : 66[201c0] -> 67[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 06 : 76[901c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 01 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 04 : 77[901d0] -> 76[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 07 : 76[901c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 05 : 77[901d0] -> 76[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 07 : 66[201c0] -> 67[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 00 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 00 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 03 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 01 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 02 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 00 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 01 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 03 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 04 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 04 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 02 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 00 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 00 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 01 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 02 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 05 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 05 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 04 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 00 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 01 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07 : 8[101c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 06 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 05 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 02 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 00 : 67[201d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 07 : 14[a01c0] -> 15[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 04 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 03 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 01 : 67[201d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 03 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 05 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 01 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 02 : 43[201d0] -> 42[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 04 : 67[201d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 03 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 06 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 03 : 43[201d0] -> 42[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 06 : 43[201d0] -> 42[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 07 : 18[201c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 00 : 29[901d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 02 : 73[101d0] -> 72[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 05 : 67[201d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 04 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 07 : 43[201d0] -> 42[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 00 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 00 : 76[901c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 06 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 01 : 29[901d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 03 : 73[101d0] -> 72[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 04 : 29[901d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 01 : 76[901c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 00 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 06 : 73[101d0] -> 72[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 05 : 29[901d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 04 : 76[901c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 01 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 07 : 73[101d0] -> 72[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 01 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 05 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 04 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 05 : 76[901c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 01 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 00 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 06 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 07 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 00 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 02 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 00 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 02 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02 : 28[901c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 00 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 00 : 87[a01d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 01 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 01 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 03 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 07 : 88[101c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 02 : 59[201d0] -> 54[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 01 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 02 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 04 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 02 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 02 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 05 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 01 : 87[a01d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 04 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 02 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 05 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 03 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 03 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 03 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 03 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 00 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 04 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 06 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 05 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 04 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 05 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 07 : 64[101c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 05 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 00 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 03 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 03 : 59[201d0] -> 54[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 01 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 01 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 06 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 06 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 02 : 35[201d0] -> 30[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 01 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 02 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07 : 44[901c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 00 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 00 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07 : 20[901c0] -> 19[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 03 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07 : 80[101c0] -> 79[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 07 : 42[201c0] -> 43[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 04 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 05 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 02 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 01 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 00 : 63[a01d0] -> 64[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 06 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 07 : 9[101d0] -> 10[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 04 : 87[a01d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Channel 05 : 87[a01d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 02 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 06 : 59[201d0] -> 54[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 07 : 90[201c0] -> 91[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 00 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 00 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03 : 28[901c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 03 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 01 : 63[a01d0] -> 64[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 03 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 03 : 35[201d0] -> 30[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 01 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 01 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 06 : 28[901c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 02 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 07 : 28[901c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 01 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 04 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 04 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 04 : 63[a01d0] -> 64[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 06 : 35[201d0] -> 30[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 04 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 02 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 06 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 05 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 00 : 92[901c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 05 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 06 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 05 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Channel 07 : 59[201d0] -> 54[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 00 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 07 : 16[101c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Channel 07 : 35[201d0] -> 30[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Channel 05 : 63[a01d0] -> 64[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 03 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 07 : 30[a01c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 06 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 03 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 01 : 92[901c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 01 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 06 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 04 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07 : 32[101c0] -> 31[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 00 : 91[201d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 07 : 50[201c0] -> 51[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 04 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 00 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 00 : 57[101d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 02 : 92[901c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 05 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 04 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 07 : 54[a01c0] -> 55[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 00 : 40[101c0] -> 41[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 01 : 91[201d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 01 : 57[101d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 05 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 04 : 0[101c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 06 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 05 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 03 : 92[901c0] -> 93[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 00 : 56[101c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 00 : 51[201d0] -> 52[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 01 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 02 : 57[101d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 07 : 52[901c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 06 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 04 : 91[201d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 00 : 28[901c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 01 : 56[101c0] -> 57[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 01 : 51[201d0] -> 52[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 03 : 57[101d0] -> 58[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 02 : 25[101d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 01 : 28[901c0] -> 27[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 04 : 92[901c0] -> 93[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 00 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 05 : 0[101c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 03 : 25[101d0] -> 24[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 02 : 56[101c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 04 : 57[101d0] -> 58[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 04 : 28[901c0] -> 27[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 04 : 51[201d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 02 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 05 : 91[201d0] -> 92[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 06 : 25[101d0] -> 24[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 05 : 28[901c0] -> 27[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 03 : 56[101c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 07 : 25[101d0] -> 24[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 05 : 57[101d0] -> 58[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 00 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 05 : 92[901c0] -> 93[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 01 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 07 : 62[a01c0] -> 63[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 06 : 0[101c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 04 : 56[101c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 06 : 57[101d0] -> 58[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 00 : 55[a01d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 03 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 05 : 56[101c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 07 : 57[101d0] -> 58[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 06 : 92[901c0] -> 93[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 02 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Channel 05 : 51[201d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 01 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 01 : 55[a01d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 07 : 0[101c0] -> 1[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06 : 56[101c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 00 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 04 : 55[a01d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 04 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 01 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07 : 56[101c0] -> 57[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 02 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 07 : 92[901c0] -> 93[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 03 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 02 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 01 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 03 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 03 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 04 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 04 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 06 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 02 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 00 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 05 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 07 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 05 : 55[a01d0] -> 56[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 05 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 01 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 06 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 05 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 00 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 03 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 00 : 3[201d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 00 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 07 : 4[901c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 02 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 01 : 3[201d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 01 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 06 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 06 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 03 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 04 : 3[201d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 04 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 02 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 07 : 93[901d0] -> 94[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 04 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 04 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Channel 05 : 3[201d0] -> 4[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 05 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 03 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 05 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 06 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 05 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 04 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 01 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 00 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 06 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 00 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 01 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 06 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 05 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 02 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Channel 07 : 94[a01c0] -> 95[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 01 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02/0 : 16[101c0] -> 20[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03/0 : 16[101c0] -> 20[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 02 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 07 : 86[a01c0] -> 87[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 07 : 1[101d0] -> 2[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 07 : 40[101c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 03 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 02 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 00 : 43[201d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 00 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 04 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 02 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 06 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 03 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 01 : 43[201d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 00 : 39[a01d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 01 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 03 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 06 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 05 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 00 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 07 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 04 : 43[201d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 02 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 04 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 02 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 07 : 2[201c0] -> 3[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 05 : 43[201d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 03 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 03 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07 : 44[901c0] -> 45[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 05 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 01 : 39[a01d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 04 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 04 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 00 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 00 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 00 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 05 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 06 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 00 : 75[201d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 01 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 01 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 07 : 76[901c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 06 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 05 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 01 : 75[201d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 02 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 02 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 02 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 04 : 39[a01d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 01 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 00 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 07 : 45[901d0] -> 46[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 03 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 03 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 04 : 75[201d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 01 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 04 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 00 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 04 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 01 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 03 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 05 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 04 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 02 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 06 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 01 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 02 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 07 : 60[901c0] -> 61[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 05 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 06 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Channel 05 : 75[201d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 03 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 05 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 02 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 03 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 03 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 00 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 00 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 01 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 04 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 03 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 02 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07 : 68[901c0] -> 69[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 03 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 07 : 48[101c0] -> 49[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 01 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 04 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 04 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 05 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 04 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 04 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 05 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 06 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 05 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07 : 32[101c0] -> 33[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 06 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 02 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 07 : 61[901d0] -> 62[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 04 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 00 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00 : 36[901c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 00 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 06 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Channel 05 : 39[a01d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 01 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 02 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 03 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 03 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 04 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 00 : 28[901c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 07 : 81[101d0] -> 82[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 01 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 05 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 07 : 69[901d0] -> 70[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 00 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 06 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Channel 05 : 91[201d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 05 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 07 : 49[101d0] -> 50[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 00 : 31[a01d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 01 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00 : 72[101c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 01 : 28[901c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 02 : 0[101c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 01 : 31[a01d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 02 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 00/0 : 36[901c0] -> 42[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 02 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 01/0 : 36[901c0] -> 42[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 04 : 31[a01d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 06 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 03 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 02 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 05 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 05 : 31[a01d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 07 : 78[a01c0] -> 79[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 04 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01 : 36[901c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 03 : 0[101c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02 : 28[901c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 03 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 06 : 74[201c0] -> 75[201d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 00 : 19[201d0] -> 20[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 05 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 06 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01 : 72[101c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 00/0 : 60[901c0] -> 66[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 04 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 01/0 : 60[901c0] -> 66[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 04 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 02 : 95[a01d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 05 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 06 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 06 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 07 : 84[901c0] -> 85[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 01 : 19[201d0] -> 20[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 00 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 05 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03 : 28[901c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 06 : 0[101c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 07 : 12[901c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 02 : 36[901c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 07 : 33[101d0] -> 34[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 03 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 03 : 95[a01d0] -> 90[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00 : 24[101c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 02 : 60[901c0] -> 65[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27558 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 06/0 : 64[101c0] -> 57[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26989:27558 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 07/0 : 64[101c0] -> 57[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 04 : 19[201d0] -> 20[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 06 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 00 : 20[901c0] -> 21[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 07 : 0[101c0] -> 5[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 01 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 03 : 60[901c0] -> 65[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 03 : 36[901c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 00 : 85[901d0] -> 86[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 05 : 19[201d0] -> 20[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 01 : 20[901c0] -> 21[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 02 : 48[101c0] -> 53[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 01 : 85[901d0] -> 86[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Channel 07 : 70[a01c0] -> 71[a01d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Connected all rings
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02/0 : 64[101c0] -> 68[901c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 07 : 74[201c0] -> 75[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03/0 : 64[101c0] -> 68[901c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 02 : 72[101c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 02 : 85[901d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 02 : 20[901c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 06 : 60[901c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 06 : 95[a01d0] -> 90[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 02 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 04 : 36[901c0] -> 37[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 03 : 85[901d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 04 : 28[901c0] -> 29[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 00 : 79[a01d0] -> 80[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02/0 : 88[101c0] -> 92[901c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 04 : 85[901d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 07 : 60[901c0] -> 65[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03/0 : 88[101c0] -> 92[901c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 03 : 20[901c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 05 : 85[901d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 00 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 00 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 03 : 72[101c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 06 : 85[901d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 05 : 28[901c0] -> 29[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Channel 07 : 95[a01d0] -> 90[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 01 : 79[a01d0] -> 80[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 03 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 05 : 36[901c0] -> 37[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 07 : 85[901d0] -> 86[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 02 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 02 : 93[901d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 00 : 7[a01d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 00 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 04 : 20[901c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 01 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 01 : 7[a01d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 00/0 : 84[901c0] -> 90[201c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 01/0 : 84[901c0] -> 90[201c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 03 : 93[901d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 06 : 28[901c0] -> 29[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29515:30120 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 00/0 : 54[a01c0] -> 61[901d0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-54:29515:30120 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 04 : 79[a01d0] -> 80[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 01/0 : 54[a01c0] -> 61[901d0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 03 : 48[101c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Connected all rings | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 01 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 06 : 93[901d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 04 : 72[101c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 06 : 36[901c0] -> 37[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 04 : 7[a01d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 00 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 04 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 05 : 7[a01d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 04/0 : 67[201d0] -> 78[a01c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 05/0 : 67[201d0] -> 78[a01c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 03 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 01 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 00 : 27[201d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Channel 07 : 93[901d0] -> 92[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 01 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 05 : 20[901c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 02 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 02 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 00 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 05 : 79[a01d0] -> 80[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 07 : 28[901c0] -> 29[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 03 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 02 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 03 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 04 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 05 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 06 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 07 : 13[901d0] -> 14[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 04 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 06 : 48[101c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 04 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 01 : 27[201d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 07 : 36[901c0] -> 37[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 05 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 05 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 03 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 01 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07 : 8[101c0] -> 9[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 04 : 27[201d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 05 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 07 : 48[101c0] -> 53[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 06 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 05 : 72[101c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 02 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06 : 20[901c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 02 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 02 : 5[901d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 02 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 06 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Channel 05 : 27[201d0] -> 28[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29519:30121 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 02/0 : 56[101c0] -> 65[101d0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 00 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29519:30121 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 03/0 : 56[101c0] -> 65[101d0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 00/0 : 12[901c0] -> 18[201c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 01/0 : 12[901c0] -> 18[201c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 03 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07 : 20[901c0] -> 21[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 03 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 06 : 72[101c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 03 : 5[901d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 03 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 02 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 07 : 38[a01c0] -> 39[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 01 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 03 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 07 : 24[101c0] -> 25[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 04 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 06 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 06 : 5[901d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 06 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 04 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Channel 07 : 62[a01c0] -> 61[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 04 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 02 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 05 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Channel 07 : 2[201c0] -> 1[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 07 : 72[101c0] -> 73[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 06 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 00 : 1[101d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 05 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 01 : 1[101d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 04 : 1[101d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Channel 05 : 1[101d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 05 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 07 : 6[a01c0] -> 7[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 03 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 06 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07 : 80[101c0] -> 81[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 07 : 5[901d0] -> 0[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 04 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 00 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 06 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 07 : 21[901d0] -> 22[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 01 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 00 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 02 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 05 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 02 : 12[901c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 03 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 04 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 02 : 84[901c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 07 : 26[201c0] -> 27[201d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 05 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 01 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 00 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 06 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 02 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 06 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 03 : 12[901c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 02 : 71[a01d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 03 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 02 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 06 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 03 : 84[901c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 01 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Channel 07 : 50[201c0] -> 49[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 07 : 73[101d0] -> 74[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 06/0 : 52[901c0] -> 56[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 07/0 : 52[901c0] -> 56[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Channel 07 : 22[a01c0] -> 23[a01d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 03 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06/0 : 4[901c0] -> 8[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07/0 : 4[901c0] -> 8[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 03 : 71[a01d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04/0 : 48[101c0] -> 54[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 02 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04/0 : 24[101c0] -> 30[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05/0 : 48[101c0] -> 54[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05/0 : 24[101c0] -> 30[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 04 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 06 : 12[901c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 05 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 06 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 06 : 71[a01d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 07 : 12[901c0] -> 17[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 03 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 07 : 37[901d0] -> 38[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 06 : 84[901c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Channel 07 : 71[a01d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 04 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 06/0 : 4[901c0] -> 8[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 07/0 : 4[901c0] -> 8[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-45:30138:30698 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 06/0 : 16[101c0] -> 9[101d0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-45:30138:30698 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 07/0 : 16[101c0] -> 9[101d0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 07 : 84[901c0] -> 89[101d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 05 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 02 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 03 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 02/0 : 88[101c0] -> 92[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 06 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30082:30683 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 00/0 : 6[a01c0] -> 13[901d0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 03/0 : 88[101c0] -> 92[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-46:30082:30683 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 01/0 : 6[a01c0] -> 13[901d0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 02 : 14[a01c0] -> 13[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06/0 : 52[901c0] -> 56[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07/0 : 52[901c0] -> 56[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 07 : 25[101d0] -> 26[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 06 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06/0 : 28[901c0] -> 32[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07/0 : 28[901c0] -> 32[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 03 : 14[a01c0] -> 13[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 04/0 : 43[201d0] -> 66[201c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 05/0 : 43[201d0] -> 66[201c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Channel 07 : 86[a01c0] -> 85[901d0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 06/0 : 69[901d0] -> 80[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 07/0 : 69[901d0] -> 80[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 02 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-58:27724:28285 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 02/0 : 80[101c0] -> 89[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27724:28285 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 03/0 : 80[101c0] -> 89[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 03 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26987:27560 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 04/0 : 60[901c0] -> 55[a01d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26987:27560 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 05/0 : 60[901c0] -> 55[a01d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 06 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 07 : 72[101c0] -> 77[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 02/0 : 40[101c0] -> 44[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 06 : 14[a01c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 03/0 : 40[101c0] -> 44[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30086:30684 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 02/0 : 8[101c0] -> 17[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30086:30684 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 03/0 : 8[101c0] -> 17[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Channel 07 : 14[a01c0] -> 13[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 02 : 36[901c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 02 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00/0 : 60[901c0] -> 66[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01/0 : 60[901c0] -> 66[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 03 : 36[901c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 03 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 02 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 06 : 36[901c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 06 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 02 : 23[a01d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 07 : 24[101c0] -> 29[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 03 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02/0 : 40[101c0] -> 44[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03/0 : 40[101c0] -> 44[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27794:28451 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 06/0 : 88[101c0] -> 81[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-57:27794:28451 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 04/0 : 0[101c0] -> 6[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 05/0 : 0[101c0] -> 6[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 07/0 : 88[101c0] -> 81[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 06 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 03 : 23[a01d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Channel 07 : 74[201c0] -> 73[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 02 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 07 : 36[901c0] -> 41[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 06/0 : 76[901c0] -> 80[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 06 : 23[a01d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 07/0 : 76[901c0] -> 80[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 03 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 06/0 : 45[901d0] -> 68[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 07/0 : 45[901d0] -> 68[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 06 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Channel 07 : 23[a01d0] -> 18[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 02/0 : 64[101c0] -> 68[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 03/0 : 64[101c0] -> 68[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-52:39398:39970 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 00/0 : 24[101c0] -> 49[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39398:39970 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 01/0 : 24[101c0] -> 49[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04/0 : 72[101c0] -> 78[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05/0 : 72[101c0] -> 78[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Channel 07 : 26[201c0] -> 25[101d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 04/0 : 48[101c0] -> 54[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 05/0 : 48[101c0] -> 54[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-52:39402:39972 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 02/0 : 28[901c0] -> 53[901d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-52:39402:39972 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 03/0 : 28[901c0] -> 53[901d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 02 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26341 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 04/0 : 36[901c0] -> 31[a01d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25777:26341 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 05/0 : 36[901c0] -> 31[a01d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 03 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 06/0 : 28[901c0] -> 32[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 07/0 : 28[901c0] -> 32[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29216:29781 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 02/0 : 32[101c0] -> 41[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 06 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29781 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 03/0 : 32[101c0] -> 41[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-58:27720:28286 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 00/0 : 78[a01c0] -> 85[901d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27720:28286 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 01/0 : 78[a01c0] -> 85[901d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Channel 07 : 38[a01c0] -> 37[901d0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 04/0 : 19[201d0] -> 30[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 05/0 : 19[201d0] -> 30[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04/0 : 0[101c0] -> 6[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05/0 : 0[101c0] -> 6[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00/0 : 84[901c0] -> 90[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01/0 : 84[901c0] -> 90[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-57:27792:28452 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 04/0 : 84[901c0] -> 79[a01d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27792:28452 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 05/0 : 84[901c0] -> 79[a01d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30136:30699 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 04/0 : 12[901c0] -> 7[a01d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30136:30699 [7] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 05/0 : 12[901c0] -> 7[a01d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25779:26345 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 06/0 : 40[101c0] -> 33[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25779:26345 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 07/0 : 40[101c0] -> 33[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00/0 : 12[901c0] -> 18[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01/0 : 12[901c0] -> 18[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06/0 : 76[901c0] -> 80[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07/0 : 76[901c0] -> 80[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02/0 : 76[901c0] -> 88[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 02/0 : 8[101c0] -> 17[101d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03/0 : 76[901c0] -> 88[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 03/0 : 8[101c0] -> 17[101d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28172:29021 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 00/0 : 60[901c0] -> 73[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28172:29021 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 01/0 : 60[901c0] -> 73[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02/0 : 52[901c0] -> 76[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03/0 : 52[901c0] -> 76[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 06/0 : 44[901c0] -> 92[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 02/0 : 32[101c0] -> 41[101d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27920:28621 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 07/0 : 44[901c0] -> 92[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 03/0 : 32[101c0] -> 41[101d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 06/0 : 92[901c0] -> 44[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 07/0 : 92[901c0] -> 44[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 02/0 : 52[901c0] -> 4[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 03/0 : 52[901c0] -> 4[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 02/0 : 4[901c0] -> 52[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 03/0 : 4[901c0] -> 52[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 02/0 : 16[101c0] -> 20[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 03/0 : 16[101c0] -> 20[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 02/0 : 56[101c0] -> 65[101d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 03/0 : 56[101c0] -> 65[101d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 04/0 : 72[101c0] -> 78[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 05/0 : 72[101c0] -> 78[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28176:29020 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 02/0 : 64[101c0] -> 77[901d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28176:29020 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 03/0 : 64[101c0] -> 77[901d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30192:30740 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 00/0 : 12[901c0] -> 25[101d0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30192:30740 [1] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 01/0 : 12[901c0] -> 25[101d0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04/0 : 6[a01c0] -> 18[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05/0 : 6[a01c0] -> 18[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 04/0 : 24[101c0] -> 30[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 05/0 : 24[101c0] -> 30[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30196:30734 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 02/0 : 16[101c0] -> 29[901d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30196:30734 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 03/0 : 16[101c0] -> 29[901d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29212:29782 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 00/0 : 30[a01c0] -> 37[901d0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29212:29782 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 01/0 : 30[a01c0] -> 37[901d0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 02/0 : 41[101d0] -> 32[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 03/0 : 41[101d0] -> 32[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 06/0 : 21[901d0] -> 32[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00/0 : 60[901c0] -> 73[101d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01/0 : 60[901c0] -> 73[101d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 02/0 : 17[101d0] -> 8[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 03/0 : 17[101d0] -> 8[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 07/0 : 21[901d0] -> 32[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02/0 : 28[901c0] -> 40[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03/0 : 28[901c0] -> 40[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06/0 : 20[901c0] -> 44[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07/0 : 20[901c0] -> 44[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04/0 : 54[a01c0] -> 66[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05/0 : 54[a01c0] -> 66[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00/0 : 36[901c0] -> 42[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01/0 : 36[901c0] -> 42[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 02/0 : 65[101d0] -> 56[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 03/0 : 65[101d0] -> 56[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 00/0 : 54[a01c0] -> 61[901d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 01/0 : 54[a01c0] -> 61[901d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 00/0 : 30[a01c0] -> 37[901d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 01/0 : 30[a01c0] -> 37[901d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02/0 : 64[101c0] -> 77[901d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03/0 : 64[101c0] -> 77[901d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00/0 : 48[101c0] -> 72[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01/0 : 48[101c0] -> 72[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06/0 : 56[101c0] -> 68[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07/0 : 56[101c0] -> 68[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 00/0 : 6[a01c0] -> 13[901d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 00/0 : 48[101c0] -> 0[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 01/0 : 48[101c0] -> 0[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 48[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 48[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 01/0 : 6[a01c0] -> 13[901d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02/0 : 28[901c0] -> 40[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03/0 : 28[901c0] -> 40[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 04/0 : 42[201c0] -> 90[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27918:28693 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 05/0 : 42[201c0] -> 90[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 04/0 : 90[201c0] -> 42[201c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 05/0 : 90[201c0] -> 42[201c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00/0 : 72[101c0] -> 84[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01/0 : 72[101c0] -> 84[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00/0 : 12[901c0] -> 25[101d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01/0 : 12[901c0] -> 25[101d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06/0 : 8[101c0] -> 20[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07/0 : 8[101c0] -> 20[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00/0 : 72[101c0] -> 84[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02/0 : 76[901c0] -> 88[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01/0 : 72[101c0] -> 84[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03/0 : 76[901c0] -> 88[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02/0 : 16[101c0] -> 29[901d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03/0 : 16[101c0] -> 29[901d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06/0 : 21[901d0] -> 32[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07/0 : 21[901d0] -> 32[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 02/0 : 80[101c0] -> 89[101d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 00/0 : 78[a01c0] -> 85[901d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 03/0 : 80[101c0] -> 89[101d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 01/0 : 78[a01c0] -> 85[901d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00/0 : 24[101c0] -> 36[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01/0 : 24[101c0] -> 36[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06/0 : 8[101c0] -> 20[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07/0 : 8[101c0] -> 20[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06/0 : 56[101c0] -> 68[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07/0 : 56[101c0] -> 68[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00/0 : 73[101d0] -> 60[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01/0 : 73[101d0] -> 60[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 02/0 : 89[101d0] -> 80[101c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 03/0 : 89[101d0] -> 80[101c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 00/0 : 73[101d0] -> 60[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 01/0 : 73[101d0] -> 60[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 00/0 : 37[901d0] -> 30[a01c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 01/0 : 37[901d0] -> 30[a01c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00/0 : 24[101c0] -> 36[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01/0 : 24[101c0] -> 36[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04/0 : 19[201d0] -> 30[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05/0 : 19[201d0] -> 30[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04/0 : 54[a01c0] -> 66[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05/0 : 54[a01c0] -> 66[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04/0 : 18[201c0] -> 42[201c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05/0 : 18[201c0] -> 42[201c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 02/0 : 77[901d0] -> 64[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 03/0 : 77[901d0] -> 64[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02/0 : 77[901d0] -> 64[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03/0 : 77[901d0] -> 64[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 00/0 : 61[901d0] -> 54[a01c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 01/0 : 61[901d0] -> 54[a01c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 00 : 73[101d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 01 : 73[101d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 04 : 73[101d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Channel 05 : 73[101d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 00/0 : 13[901d0] -> 6[a01c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 01/0 : 13[901d0] -> 6[a01c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04/0 : 6[a01c0] -> 18[201c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05/0 : 6[a01c0] -> 18[201c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02/0 : 28[901c0] -> 53[901d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03/0 : 28[901c0] -> 53[901d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 02 : 77[901d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 03 : 77[901d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 06 : 77[901d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 07 : 77[901d0] -> 72[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02/0 : 40[101c0] -> 28[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03/0 : 40[101c0] -> 28[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00/0 : 25[101d0] -> 12[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 00/0 : 25[101d0] -> 12[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01/0 : 25[101d0] -> 12[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 01/0 : 25[101d0] -> 12[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02/0 : 52[901c0] -> 76[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03/0 : 52[901c0] -> 76[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-47:30191:30759 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 06/0 : 32[101c0] -> 21[901d0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-47:30191:30759 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 07/0 : 32[101c0] -> 21[901d0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04/0 : 67[201d0] -> 78[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06/0 : 69[901d0] -> 80[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05/0 : 67[201d0] -> 78[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07/0 : 69[901d0] -> 80[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00/0 : 48[101c0] -> 72[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01/0 : 48[101c0] -> 72[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00/0 : 84[901c0] -> 72[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00/0 : 36[901c0] -> 24[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 02/0 : 29[901d0] -> 16[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-47:30189:30757 [3] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29155:29701 [3] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01/0 : 84[901c0] -> 72[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01/0 : 36[901c0] -> 24[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 03/0 : 29[901d0] -> 16[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 00 : 25[101d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 04/0 : 78[a01c0] -> 67[201d0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02/0 : 88[101c0] -> 76[901c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 00/0 : 85[901d0] -> 78[a01c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 01 : 25[101d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29701 [3] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03/0 : 88[101c0] -> 76[901c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 04 : 25[101d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 05/0 : 78[a01c0] -> 67[201d0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 01/0 : 85[901d0] -> 78[a01c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Channel 05 : 25[101d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29702 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 06/0 : 80[101c0] -> 69[901d0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-55:29157:29702 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 07/0 : 80[101c0] -> 69[901d0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 02 : 29[901d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 03 : 29[901d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 06 : 29[901d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 07 : 29[901d0] -> 24[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00/0 : 24[101c0] -> 49[101d0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01/0 : 24[101c0] -> 49[101d0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 04/0 : 30[a01c0] -> 19[201d0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-47:30189:30757 [3] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 05/0 : 30[a01c0] -> 19[201d0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02/0 : 29[901d0] -> 16[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03/0 : 29[901d0] -> 16[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06/0 : 20[901c0] -> 8[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07/0 : 20[901c0] -> 8[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06/0 : 32[101c0] -> 21[901d0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06/0 : 20[901c0] -> 44[901c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07/0 : 32[101c0] -> 21[901d0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07/0 : 20[901c0] -> 44[901c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06/0 : 68[901c0] -> 56[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07/0 : 68[901c0] -> 56[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04/0 : 66[201c0] -> 54[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05/0 : 66[201c0] -> 54[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06/0 : 45[901d0] -> 68[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-55:29156:29706 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07/0 : 45[901d0] -> 68[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02/0 : 4[901c0] -> 52[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00/0 : 0[101c0] -> 48[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01/0 : 0[101c0] -> 48[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03/0 : 4[901c0] -> 52[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00/0 : 48[101c0] -> 0[101c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02/0 : 52[901c0] -> 4[901c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01/0 : 48[101c0] -> 0[101c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03/0 : 52[901c0] -> 4[901c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04/0 : 43[201d0] -> 66[201c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-55:29154:29703 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05/0 : 43[201d0] -> 66[201c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 02 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04/0 : 30[a01c0] -> 19[201d0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05/0 : 30[a01c0] -> 19[201d0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 03 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 06 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Channel 07 : 21[901d0] -> 20[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04/0 : 18[201c0] -> 42[201c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05/0 : 18[201c0] -> 42[201c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04/0 : 18[201c0] -> 6[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05/0 : 18[201c0] -> 6[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 00/0 : 66[201c0] -> 60[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-54:29514:30119 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 01/0 : 66[201c0] -> 60[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 00/0 : 18[201c0] -> 12[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 06/0 : 64[101c0] -> 57[101d0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-46:30081:30686 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 01/0 : 18[201c0] -> 12[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 07/0 : 64[101c0] -> 57[101d0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 02/0 : 53[901d0] -> 28[901c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 03/0 : 53[901d0] -> 28[901c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02/0 : 53[901d0] -> 28[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03/0 : 53[901d0] -> 28[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 00 : 19[201d0] -> 18[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 01 : 19[201d0] -> 18[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 04 : 19[201d0] -> 18[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Channel 05 : 19[201d0] -> 18[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 06/0 : 16[101c0] -> 9[101d0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 07/0 : 16[101c0] -> 9[101d0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02/0 : 76[901c0] -> 52[901c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03/0 : 76[901c0] -> 52[901c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 02 : 53[901d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 03 : 53[901d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 06 : 53[901d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 07 : 53[901d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 02 : 9[101d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 03 : 9[101d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 06 : 9[101d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Channel 07 : 9[101d0] -> 8[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 02 : 57[101d0] -> 56[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 03 : 57[101d0] -> 56[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 06 : 57[101d0] -> 56[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 02/0 : 41[101d0] -> 32[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-49:25778:26346 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 03/0 : 41[101d0] -> 32[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Channel 07 : 57[101d0] -> 56[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06/0 : 80[101c0] -> 69[901d0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07/0 : 80[101c0] -> 69[901d0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00/0 : 72[101c0] -> 48[101c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01/0 : 72[101c0] -> 48[101c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04/0 : 78[a01c0] -> 67[201d0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05/0 : 78[a01c0] -> 67[201d0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00/0 : 49[101d0] -> 24[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01/0 : 49[101d0] -> 24[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-51:30345:30900 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 06/0 : 68[901c0] -> 45[901d0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-51:30345:30900 [5] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 07/0 : 68[901c0] -> 45[901d0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 00/0 : 49[101d0] -> 24[101c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 01/0 : 49[101d0] -> 24[101c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 02 : 41[101d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 03 : 41[101d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 06 : 41[101d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 07 : 41[101d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30343:30905 [3] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 04/0 : 66[201c0] -> 43[201d0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-51:30343:30905 [3] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 05/0 : 66[201c0] -> 43[201d0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 00/0 : 37[901d0] -> 30[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-49:25776:26344 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 01/0 : 37[901d0] -> 30[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 02 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 03 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 06 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 00 : 67[201d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Channel 07 : 69[901d0] -> 68[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 01 : 67[201d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 04 : 67[201d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Channel 05 : 67[201d0] -> 66[201c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 00 : 49[101d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 01 : 49[101d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 04 : 49[101d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Channel 05 : 49[101d0] -> 48[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 02/0 : 68[901c0] -> 64[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-54:29518:30122 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Channel 03/0 : 68[901c0] -> 64[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 02/0 : 20[901c0] -> 16[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-46:30085:30688 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Channel 03/0 : 20[901c0] -> 16[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 02/0 : 40[101c0] -> 28[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 03/0 : 40[101c0] -> 28[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 00 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 01 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 04 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Channel 05 : 37[901d0] -> 36[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04/0 : 90[201c0] -> 42[201c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06/0 : 92[901c0] -> 44[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-51:30342:30903 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05/0 : 90[201c0] -> 42[201c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-51:30344:30902 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07/0 : 92[901c0] -> 44[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04/0 : 42[201c0] -> 90[201c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06/0 : 44[901c0] -> 92[901c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05/0 : 42[201c0] -> 90[201c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07/0 : 44[901c0] -> 92[901c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06/0 : 44[901c0] -> 20[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-47:30190:30756 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07/0 : 44[901c0] -> 20[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04/0 : 42[201c0] -> 18[201c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-47:30188:30760 [2] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05/0 : 42[201c0] -> 18[201c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 02/0 : 76[901c0] -> 52[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 03/0 : 76[901c0] -> 52[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06/0 : 68[901c0] -> 45[901d0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07/0 : 68[901c0] -> 45[901d0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 06/0 : 32[101c0] -> 28[901c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Channel 07/0 : 32[101c0] -> 28[901c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 06/0 : 8[101c0] -> 4[901c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-44:28069:28963 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Channel 07/0 : 8[101c0] -> 4[901c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04/0 : 66[201c0] -> 43[201d0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05/0 : 66[201c0] -> 43[201d0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 00/0 : 36[901c0] -> 24[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 01/0 : 36[901c0] -> 24[101c0] [receive] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 00/0 : 85[901d0] -> 78[a01c0] [receive] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 02/0 : 89[101d0] -> 80[101c0] [receive] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-57:27791:28450 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27793:28455 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 01/0 : 85[901d0] -> 78[a01c0] [receive] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 03/0 : 89[101d0] -> 80[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 02 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 03 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 06 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Channel 07 : 45[901d0] -> 44[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 04/0 : 6[a01c0] -> 0[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-44:28065:28962 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Channel 05/0 : 6[a01c0] -> 0[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 04/0 : 30[a01c0] -> 24[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Channel 05/0 : 30[a01c0] -> 24[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 00 : 43[201d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 01 : 43[201d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 04 : 43[201d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Channel 05 : 43[201d0] -> 42[201c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 02 : 89[101d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 03 : 89[101d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 06 : 89[101d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 00/0 : 72[101c0] -> 48[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 01/0 : 72[101c0] -> 48[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 07 : 89[101d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 00 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 01 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 04 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Channel 05 : 85[901d0] -> 84[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 02/0 : 88[101c0] -> 76[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 03/0 : 88[101c0] -> 76[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 06/0 : 56[101c0] -> 52[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-52:39401:39973 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Channel 07/0 : 56[101c0] -> 52[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 06/0 : 68[901c0] -> 56[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 07/0 : 68[901c0] -> 56[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 04/0 : 66[201c0] -> 54[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 05/0 : 66[201c0] -> 54[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 04/0 : 78[a01c0] -> 72[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 06/0 : 80[101c0] -> 76[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Channel 05/0 : 78[a01c0] -> 72[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Channel 07/0 : 80[101c0] -> 76[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 06/0 : 40[101c0] -> 33[101d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 07/0 : 40[101c0] -> 33[101d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 04/0 : 54[a01c0] -> 48[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-52:39397:39974 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Channel 05/0 : 54[a01c0] -> 48[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 00/0 : 84[901c0] -> 72[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 01/0 : 84[901c0] -> 72[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 06/0 : 32[101c0] -> 28[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-48:30195:30738 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Channel 07/0 : 32[101c0] -> 28[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 04/0 : 42[201c0] -> 18[201c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 06/0 : 44[901c0] -> 20[901c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 05/0 : 42[201c0] -> 18[201c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 07/0 : 44[901c0] -> 20[901c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 00/0 : 90[201c0] -> 84[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 02/0 : 92[901c0] -> 88[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Channel 03/0 : 92[901c0] -> 88[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Channel 01/0 : 90[201c0] -> 84[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 02 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 03 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 06 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Channel 07 : 33[101d0] -> 32[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 02/0 : 68[901c0] -> 64[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Channel 03/0 : 68[901c0] -> 64[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 02/0 : 65[101d0] -> 56[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26988:27563 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 03/0 : 65[101d0] -> 56[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 00/0 : 61[901d0] -> 54[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26986:27559 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 01/0 : 61[901d0] -> 54[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 00/0 : 42[201c0] -> 36[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29211:29780 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 01/0 : 42[201c0] -> 36[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 04/0 : 30[a01c0] -> 24[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-48:30191:30741 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Channel 05/0 : 30[a01c0] -> 24[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 00/0 : 66[201c0] -> 60[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Channel 01/0 : 66[201c0] -> 60[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 02 : 65[101d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 03 : 65[101d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 06 : 65[101d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 07 : 65[101d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 00 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 01 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 02/0 : 44[901c0] -> 40[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-50:29215:29783 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Channel 03/0 : 44[901c0] -> 40[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 04 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Channel 05 : 61[901d0] -> 60[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 04/0 : 18[201c0] -> 6[a01c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 06/0 : 20[901c0] -> 8[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 05/0 : 18[201c0] -> 6[a01c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 07/0 : 20[901c0] -> 8[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 02 : 29[901d0] -> 28[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 02/0 : 44[901c0] -> 40[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 00/0 : 42[201c0] -> 36[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Channel 01/0 : 42[201c0] -> 36[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Channel 03/0 : 44[901c0] -> 40[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 03 : 29[901d0] -> 28[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 06 : 29[901d0] -> 28[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Channel 07 : 29[901d0] -> 28[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 06/0 : 88[101c0] -> 81[101d0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 07/0 : 88[101c0] -> 81[101d0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 06/0 : 80[101c0] -> 76[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-56:28175:29019 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Channel 07/0 : 80[101c0] -> 76[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 02 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 03 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 06 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 04/0 : 54[a01c0] -> 48[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 06/0 : 56[101c0] -> 52[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Channel 05/0 : 54[a01c0] -> 48[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Channel 07/0 : 56[101c0] -> 52[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Channel 07 : 81[101d0] -> 80[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 04/0 : 78[a01c0] -> 72[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-56:28171:29023 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Channel 05/0 : 78[a01c0] -> 72[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 00/0 : 90[201c0] -> 84[901c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-58:27719:28283 [4] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 01/0 : 90[201c0] -> 84[901c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 04/0 : 60[901c0] -> 55[a01d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Channel 05/0 : 60[901c0] -> 55[a01d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 02 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 03 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 06 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Channel 07 : 65[101d0] -> 64[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 02/0 : 17[101d0] -> 8[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-45:30137:30701 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 03/0 : 17[101d0] -> 8[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-55:29156:29664 [4] NCCL INFO comm 0x7f1d98009010 rank 68 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE
gpu-st-p4d-24xlarge-55:29154:29667 [2] NCCL INFO comm 0x7fc438009010 rank 66 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 02 : 77[901d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 03 : 77[901d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 00/0 : 18[201c0] -> 12[901c0] [send] via NET/Socket/0
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 02/0 : 20[901c0] -> 16[101c0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Channel 01/0 : 18[201c0] -> 12[901c0] [send] via NET/Socket/1
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Channel 03/0 : 20[901c0] -> 16[101c0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 06 : 77[901d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-48:30196:30702 [5] NCCL INFO comm 0x7f1d10009010 rank 29 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE
gpu-st-p4d-24xlarge-48:30195:30703 [4] NCCL INFO comm 0x7f185c009010 rank 28 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE
gpu-st-p4d-24xlarge-48:30192:30700 [1] NCCL INFO comm 0x7fb6ec009010 rank 25 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE
gpu-st-p4d-24xlarge-48:30193:30701 [2] NCCL INFO comm 0x7fce2c009010 rank 26 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE
gpu-st-p4d-24xlarge-48:30194:30699 [3] NCCL INFO comm 0x7f08f0009010 rank 27 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Channel 07 : 77[901d0] -> 76[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-48:30191:30698 [0] NCCL INFO comm 0x7f5cc0009010 rank 24 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 00 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-55:29159:29666 [7] NCCL INFO comm 0x7f6c3c009010 rank 71 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 01 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 00/0 : 13[901d0] -> 6[a01c0] [receive] via NET/Socket/2
gpu-st-p4d-24xlarge-55:29158:29665 [6] NCCL INFO comm 0x7fca4c009010 rank 70 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE
gpu-st-p4d-24xlarge-45:30135:30697 [6] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 01/0 : 13[901d0] -> 6[a01c0] [receive] via NET/Socket/3
gpu-st-p4d-24xlarge-55:29157:29663 [5] NCCL INFO comm 0x7fcb24009010 rank 69 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 04 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-55:29155:29662 [3] NCCL INFO comm 0x7fedc8009010 rank 67 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 04/0 : 36[901c0] -> 31[a01d0] [send] via NET/Socket/2
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Channel 05/0 : 36[901c0] -> 31[a01d0] [send] via NET/Socket/3
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Channel 05 : 55[a01d0] -> 54[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 02 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 02 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 03 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 03 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 06 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 06 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Channel 07 : 41[101d0] -> 40[101c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Channel 07 : 53[901d0] -> 52[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 02/0 : 92[901c0] -> 88[101c0] [receive] via NET/Socket/0
gpu-st-p4d-24xlarge-58:27723:28289 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Channel 03/0 : 92[901c0] -> 88[101c0] [receive] via NET/Socket/1
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 02 : 17[101d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 03 : 17[101d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 06 : 17[101d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 07 : 17[101d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-51:30344:30851 [4] NCCL INFO comm 0x7f2ac4009010 rank 44 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE
gpu-st-p4d-24xlarge-51:30346:30852 [6] NCCL INFO comm 0x7f71d4009010 rank 46 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE
gpu-st-p4d-24xlarge-51:30342:30850 [2] NCCL INFO comm 0x7f8b0c009010 rank 42 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 00 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 01 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 04 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Channel 05 : 13[901d0] -> 12[901c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-51:30347:30853 [7] NCCL INFO comm 0x7f59c8009010 rank 47 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO Connected all trees
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
gpu-st-p4d-24xlarge-51:30343:30849 [3] NCCL INFO comm 0x7fc078009010 rank 43 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE
gpu-st-p4d-24xlarge-51:30345:30854 [5] NCCL INFO comm 0x7fc5dc009010 rank 45 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 00 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 01 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 04 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Channel 05 : 31[a01d0] -> 30[a01c0] via P2P/IPC/read
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 04/0 : 84[901c0] -> 79[a01d0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Channel 05/0 : 84[901c0] -> 79[a01d0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 02 : 89[101d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 03 : 89[101d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 06 : 89[101d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Channel 07 : 89[101d0] -> 88[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-54:29519:30067 [1] NCCL INFO comm 0x7f3130009010 rank 65 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-54:29516:30071 [6] NCCL INFO comm 0x7fe490009010 rank 62 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-54:29514:30069 [4] NCCL INFO comm 0x7f1024009010 rank 60 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-54:29515:30066 [5] NCCL INFO comm 0x7f939c009010 rank 61 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-54:29517:30068 [7] NCCL INFO comm 0x7f25c8009010 rank 63 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-54:29518:30070 [0] NCCL INFO comm 0x7fa29c009010 rank 64 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-56:28171:28973 [0] NCCL INFO comm 0x7f21bc009010 rank 72 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-56:28172:28968 [1] NCCL INFO comm 0x7f2e38009010 rank 73 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-56:28176:28971 [5] NCCL INFO comm 0x7f41ac009010 rank 77 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-56:28173:28967 [2] NCCL INFO comm 0x7f2910009010 rank 74 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-56:28174:28970 [3] NCCL INFO comm 0x7fa2fc009010 rank 75 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-56:28175:28972 [4] NCCL INFO comm 0x7f2810009010 rank 76 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 06/0 : 8[101c0] -> 4[901c0] [send] via NET/Socket/0 | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 04/0 : 6[a01c0] -> 0[101c0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Channel 07/0 : 8[101c0] -> 4[901c0] [send] via NET/Socket/1 | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Channel 05/0 : 6[a01c0] -> 0[101c0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-53:26991:27496 [3] NCCL INFO comm 0x7f1c08009010 rank 59 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-53:26988:27497 [0] NCCL INFO comm 0x7fbf94009010 rank 56 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-53:26989:27495 [1] NCCL INFO comm 0x7f6d40009010 rank 57 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-53:26990:27493 [2] NCCL INFO comm 0x7fb30c009010 rank 58 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-53:26986:27498 [6] NCCL INFO comm 0x7f47cc009010 rank 54 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-53:26987:27494 [7] NCCL INFO comm 0x7fa94c009010 rank 55 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 00 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 01 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 04 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Channel 05 : 79[a01d0] -> 78[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-52:39401:39921 [4] NCCL INFO comm 0x7f9220009010 rank 52 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-52:39402:39919 [5] NCCL INFO comm 0x7f9d04009010 rank 53 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-52:39398:39917 [1] NCCL INFO comm 0x7ff498009010 rank 49 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-52:39400:39922 [3] NCCL INFO comm 0x7f1134009010 rank 51 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-52:39397:39920 [0] NCCL INFO comm 0x7f7e5c009010 rank 48 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-52:39399:39918 [2] NCCL INFO comm 0x7f254c009010 rank 50 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 04/0 : 12[901c0] -> 7[a01d0] [send] via NET/Socket/2 | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Channel 05/0 : 12[901c0] -> 7[a01d0] [send] via NET/Socket/3 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-50:29213:29720 [6] NCCL INFO comm 0x7f2088009010 rank 38 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-50:29211:29719 [4] NCCL INFO comm 0x7f7a40009010 rank 36 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-50:29212:29721 [5] NCCL INFO comm 0x7fd770009010 rank 37 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-50:29216:29724 [1] NCCL INFO comm 0x7f3b88009010 rank 41 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-50:29214:29723 [7] NCCL INFO comm 0x7fb134009010 rank 39 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-50:29215:29722 [0] NCCL INFO comm 0x7fe408009010 rank 40 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 02 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 03 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 06 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Channel 07 : 17[101d0] -> 16[101c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-49:25776:26284 [6] NCCL INFO comm 0x7fd568009010 rank 30 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-49:25779:26285 [1] NCCL INFO comm 0x7f080c009010 rank 33 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-49:25781:26281 [3] NCCL INFO comm 0x7f0154009010 rank 35 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-49:25778:26280 [0] NCCL INFO comm 0x7f1324009010 rank 32 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-49:25780:26286 [2] NCCL INFO comm 0x7fb678009010 rank 34 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-49:25777:26283 [7] NCCL INFO comm 0x7f239c009010 rank 31 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 02 : 5[901d0] -> 4[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30193:30702 [7] NCCL INFO comm 0x7f2dd8009010 rank 23 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-47:30190:30703 [4] NCCL INFO comm 0x7f7bb4009010 rank 20 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-47:30192:30698 [6] NCCL INFO comm 0x7ff590009010 rank 22 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-47:30188:30701 [2] NCCL INFO comm 0x7f07b8009010 rank 18 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 03 : 5[901d0] -> 4[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-47:30189:30700 [3] NCCL INFO comm 0x7faaec009010 rank 19 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-47:30191:30699 [5] NCCL INFO comm 0x7f19a0009010 rank 21 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 06 : 5[901d0] -> 4[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-59:27919:28520 [3] NCCL INFO comm 0x7efcfc009010 rank 91 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-59:27918:28517 [2] NCCL INFO comm 0x7f3934009010 rank 90 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-59:27921:28521 [5] NCCL INFO comm 0x7fdd68009010 rank 93 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-59:27923:28518 [7] NCCL INFO comm 0x7f66d4009010 rank 95 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-59:27922:28516 [6] NCCL INFO comm 0x7ff598009010 rank 94 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-59:27920:28519 [4] NCCL INFO comm 0x7f6e6c009010 rank 92 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Channel 07 : 5[901d0] -> 4[901c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 00 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 01 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 04 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Channel 05 : 7[a01d0] -> 6[a01c0] via P2P/IPC/read | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO Connected all trees | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO threadThresholds 8/8/64 | 768/8/64 | 8/8/512 | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer | |
gpu-st-p4d-24xlarge-57:27791:28308 [6] NCCL INFO comm 0x7f6550009010 rank 78 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-57:27795:28304 [2] NCCL INFO comm 0x7f71bc009010 rank 82 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-57:27793:28309 [0] NCCL INFO comm 0x7f3cc0009010 rank 80 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-57:27794:28306 [1] NCCL INFO comm 0x7f8a80009010 rank 81 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-57:27796:28307 [3] NCCL INFO comm 0x7f729c009010 rank 83 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-57:27792:28305 [7] NCCL INFO comm 0x7fdb24009010 rank 79 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27723:28236 [0] NCCL INFO comm 0x7f866c009010 rank 88 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27721:28234 [6] NCCL INFO comm 0x7f41b0009010 rank 86 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27719:28237 [4] NCCL INFO comm 0x7f7a3c009010 rank 84 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27720:28232 [5] NCCL INFO comm 0x7ff810009010 rank 85 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27722:28233 [7] NCCL INFO comm 0x7fc6d8009010 rank 87 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-58:27724:28235 [1] NCCL INFO comm 0x7f0b24009010 rank 89 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28069:28907 [4] NCCL INFO comm 0x7f7aac009010 rank 4 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30083:30635 [6] NCCL INFO comm 0x7fd7d8009010 rank 14 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30081:30633 [4] NCCL INFO comm 0x7fae6c009010 rank 12 nranks 96 cudaDev 4 busId 901c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30085:30637 [0] NCCL INFO comm 0x7f674c009010 rank 16 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30135:30642 [6] NCCL INFO comm 0x7f1668009010 rank 6 nranks 96 cudaDev 6 busId a01c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30082:30636 [5] NCCL INFO comm 0x7f8718009010 rank 13 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30086:30632 [1] NCCL INFO comm 0x7f2de8009010 rank 17 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-46:30084:30634 [7] NCCL INFO comm 0x7f79f8009010 rank 15 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30136:30644 [7] NCCL INFO comm 0x7f9840009010 rank 7 nranks 96 cudaDev 7 busId a01d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30138:30645 [1] NCCL INFO comm 0x7fb2c4009010 rank 9 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO comm 0x7f483c009010 rank 0 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28067:28911 [2] NCCL INFO comm 0x7fc4e4009010 rank 2 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28065:28065 [0] NCCL INFO Launch mode Parallel | |
gpu-st-p4d-24xlarge-45:30145:30643 [3] NCCL INFO comm 0x7f3fd4009010 rank 11 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30137:30647 [0] NCCL INFO comm 0x7f4de8009010 rank 8 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-45:30139:30646 [2] NCCL INFO comm 0x7f22b4009010 rank 10 nranks 96 cudaDev 2 busId 201c0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28068:28905 [3] NCCL INFO comm 0x7fb374009010 rank 3 nranks 96 cudaDev 3 busId 201d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28070:28908 [5] NCCL INFO comm 0x7f4824009010 rank 5 nranks 96 cudaDev 5 busId 901d0 - Init COMPLETE | |
gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO comm 0x7f2d54009010 rank 1 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE | |
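The `Init COMPLETE` lines above carry one entry per rank (host, PID, CUDA device, PCI busId), so they can be parsed to confirm that all 96 ranks came up. This is a minimal illustrative sketch, not part of the original run; it uses two lines copied from the log and an assumed regex over NCCL's message format:

```python
import re

# Two sample "Init COMPLETE" lines copied from the log above.
lines = [
    "gpu-st-p4d-24xlarge-44:28065:28903 [0] NCCL INFO comm 0x7f483c009010 rank 0 nranks 96 cudaDev 0 busId 101c0 - Init COMPLETE",
    "gpu-st-p4d-24xlarge-44:28066:28906 [1] NCCL INFO comm 0x7f2d54009010 rank 1 nranks 96 cudaDev 1 busId 101d0 - Init COMPLETE",
]

# Assumed structure of NCCL's init-complete message: host:pid:tid ... rank R nranks N cudaDev D busId B
pattern = re.compile(
    r"^(?P<host>[\w-]+):(?P<pid>\d+):\d+ .*rank (?P<rank>\d+) nranks (?P<nranks>\d+) "
    r"cudaDev (?P<dev>\d+) busId (?P<bus>\w+) - Init COMPLETE"
)

ranks = {}
for line in lines:
    m = pattern.search(line)
    if m:
        ranks[int(m.group("rank"))] = (m.group("host"), int(m.group("dev")), m.group("bus"))

# On the full log, len(ranks) should equal the reported nranks (96 here);
# a shortfall points at ranks that never finished communicator init.
print(len(ranks), ranks[0])
```

Run against the complete file, missing keys in `ranks` identify stalled or crashed ranks without scrolling through thousands of lines.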
=========================================================================== | |
Layer (type:depth-idx) Param # | |
=========================================================================== | |
DistributedDataParallel -- | |
├─ConvNeXt: 1-1 -- | |
│ └─Sequential: 2-1 -- | |
│ │ └─Conv2d: 3-1 24,832 | |
│ │ └─LayerNorm2d: 3-2 512 | |
│ └─Sequential: 2-2 -- | |
│ │ └─ConvNeXtStage: 3-3 1,617,408 | |
│ │ └─ConvNeXtStage: 3-4 6,905,856 | |
│ │ └─ConvNeXtStage: 3-5 230,195,200 | |
│ │ └─ConvNeXtStage: 3-6 109,412,352 | |
│ └─Identity: 2-3 -- | |
│ └─Sequential: 2-4 -- | |
│ │ └─SelectAdaptivePool2d: 3-7 -- | |
│ │ └─LayerNorm2d: 3-8 4,096 | |
│ │ └─Flatten: 3-9 -- | |
│ │ └─Dropout: 3-10 -- | |
│ │ └─Linear: 3-11 104,499 | |
=========================================================================== | |
Total params: 348,264,755 | |
Trainable params: 348,264,755 | |
Non-trainable params: 0 | |
=========================================================================== | |
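The summary above reports Total params: 348,264,755 for the DDP-wrapped ConvNeXt. As a quick sanity check (purely illustrative, with the per-layer counts copied from the table), the leaf-layer `Param #` values do sum to that total:

```python
# Per-layer "Param #" values copied from the model summary above;
# the descriptive keys are labels added here, not torchinfo output.
layer_params = {
    "Conv2d (stem)": 24_832,
    "LayerNorm2d (stem)": 512,
    "ConvNeXtStage 1": 1_617_408,
    "ConvNeXtStage 2": 6_905_856,
    "ConvNeXtStage 3": 230_195_200,
    "ConvNeXtStage 4": 109_412_352,
    "LayerNorm2d (head)": 4_096,
    "Linear (head)": 104_499,
}

total = sum(layer_params.values())
print(f"{total:,}")  # 348,264,755 — matches "Total params" in the summary
```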
2022-08-26 22:21:04.354284: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.409046: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.409935: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.410291: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.410099: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.410347: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.411249: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.411253: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.411291: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.413350: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.413463: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.413488: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.415528: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.416317: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.416868: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.416868: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.416875: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.416899: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.417148: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.418477: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.418731: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.419576: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.419609: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.419827: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.420470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.420980: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.421020: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.422087: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.425960: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.428004: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.427978: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.428153: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.428316: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.429122: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.429246: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.433292: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.433392: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.436057: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.436868: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.437078: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.438253: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.440693: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.440718: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.441147: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.441777: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.442713: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.443545: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.443830: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.444067: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.444296: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.444482: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.445428: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.447355: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.449262: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.455116: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.455230: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.457389: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.457427: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.458257: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.458673: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.459110: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.459695: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.462970: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.465576: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.466839: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.475390: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.501794: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
2022-08-26 22:21:04.587484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA | |
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. | |
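The oneDNN banner above is emitted once per process, so a 79-rank job floods stdout with identical copies. As a minimal sketch (not part of this job's launch script), TensorFlow's C++ INFO-level lines can be suppressed by setting `TF_CPP_MIN_LOG_LEVEL` before TensorFlow is imported:

```python
import os

# Hide TensorFlow's C++ INFO lines (such as the oneDNN banner) in this process.
# This must run before `import tensorflow` for the setting to take effect.
# 0 = all messages, 1 = hide INFO, 2 = also hide WARNING, 3 = also hide ERROR
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
```

Exporting the variable in the job's environment (e.g. via the scheduler's launch command) silences the banner on every rank at once.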
Starting loop
[repeated by 95 more worker processes]
Epoch: 0, Step: 0, Loss: 19.943899154663086 , Acc: 0.0 | Time taken: 202.68939018249512
Epoch: 0, Step: 0, Loss: 19.987777709960938 , Acc: 0.0 | Time taken: 205.91121792793274
Epoch: 0, Step: 0, Loss: 19.817249298095703 , Acc: 0.0 | Time taken: 208.08953380584717
Epoch: 0, Step: 0, Loss: 19.493167877197266 , Acc: 0.0 | Time taken: 198.46081614494324
Epoch: 0, Step: 0, Loss: 18.375741958618164 , Acc: 0.0 | Time taken: 139.7204189300537
Epoch: 0, Step: 0, Loss: 17.676410675048828 , Acc: 0.0 | Time taken: 118.25044631958008
Epoch: 0, Step: 0, Loss: 11.584511756896973 , Acc: 0.0 | Time taken: 201.53303003311157
Epoch: 0, Step: 0, Loss: 11.903446197509766 , Acc: 0.0 | Time taken: 200.31899857521057
Epoch: 0, Step: 0, Loss: 18.943023681640625 , Acc: 0.0 | Time taken: 195.6868269443512
Epoch: 0, Step: 0, Loss: 14.19175910949707 , Acc: 0.0 | Time taken: 199.0504264831543
Epoch: 0, Step: 0, Loss: 11.817590713500977 , Acc: 0.0 | Time taken: 107.53136038780212
Epoch: 0, Step: 0, Loss: 19.778484344482422 , Acc: 0.0 | Time taken: 193.1314821243286
Epoch: 0, Step: 0, Loss: 19.515384674072266 , Acc: 0.0 | Time taken: 214.25619220733643
Epoch: 0, Step: 0, Loss: 19.515384674072266 , Acc: 0.0 | Time taken: 199.47413682937622
Epoch: 0, Step: 0, Loss: 19.73740577697754 , Acc: 0.0 | Time taken: 198.49446988105774
Epoch: 0, Step: 0, Loss: 19.836467742919922 , Acc: 0.0 | Time taken: 134.5910062789917
Epoch: 0, Step: 0, Loss: 19.968788146972656 , Acc: 0.0 | Time taken: 110.49465250968933
Epoch: 0, Step: 0, Loss: 11.780370712280273 , Acc: 0.0 | Time taken: 141.0984127521515
Epoch: 0, Step: 0, Loss: 20.118988037109375 , Acc: 0.0 | Time taken: 140.64971256256104
Epoch: 0, Step: 0, Loss: 11.584511756896973 , Acc: 0.0 | Time taken: 197.5575487613678
Epoch: 0, Step: 0, Loss: 19.57272720336914 , Acc: 0.0 | Time taken: 140.49980330467224
Epoch: 0, Step: 0, Loss: 12.053749084472656 , Acc: 0.0 | Time taken: 144.46794605255127
Epoch: 0, Step: 0, Loss: 11.775741577148438 , Acc: 0.0 | Time taken: 197.6045639514923
Epoch: 0, Step: 0, Loss: 19.984542846679688 , Acc: 0.0 | Time taken: 195.34999990463257
Epoch: 0, Step: 0, Loss: 19.984024047851562 , Acc: 0.0 | Time taken: 200.05970001220703
Epoch: 0, Step: 0, Loss: 14.514179229736328 , Acc: 0.0 | Time taken: 122.13053178787231
Epoch: 0, Step: 0, Loss: 20.206199645996094 , Acc: 0.0 | Time taken: 167.89329314231873
Epoch: 0, Step: 0, Loss: 19.871627807617188 , Acc: 0.0 | Time taken: 139.52446031570435
Epoch: 0, Step: 0, Loss: 19.639116287231445 , Acc: 0.0 | Time taken: 141.90000414848328
Epoch: 0, Step: 0, Loss: 19.481441497802734 , Acc: 0.0 | Time taken: 195.99468445777893
Epoch: 0, Step: 0, Loss: 15.276798248291016 , Acc: 0.0 | Time taken: 155.8594193458557
Epoch: 0, Step: 0, Loss: 19.57255744934082 , Acc: 0.0 | Time taken: 196.9389283657074
Epoch: 0, Step: 0, Loss: 19.737197875976562 , Acc: 0.171875 | Time taken: 198.97129273414612
Epoch: 0, Step: 0, Loss: 20.013158798217773 , Acc: 0.0 | Time taken: 139.22712898254395
Epoch: 0, Step: 0, Loss: 19.818248748779297 , Acc: 0.0 | Time taken: 176.25659894943237
Epoch: 0, Step: 0, Loss: 19.889266967773438 , Acc: 0.0 | Time taken: 202.01902174949646
Epoch: 0, Step: 0, Loss: 19.898035049438477 , Acc: 0.0 | Time taken: 259.3313765525818
Epoch: 0, Step: 0, Loss: 19.463672637939453 , Acc: 0.0 | Time taken: 136.95128917694092
Epoch: 0, Step: 0, Loss: 11.808197021484375 , Acc: 0.0 | Time taken: 192.862895488739
Epoch: 0, Step: 0, Loss: 16.529481887817383 , Acc: 0.0 | Time taken: 140.8465394973755
Epoch: 0, Step: 0, Loss: 19.714950561523438 , Acc: 0.15625 | Time taken: 142.33827781677246
Epoch: 0, Step: 0, Loss: 19.99258041381836 , Acc: 0.0 | Time taken: 141.23065185546875
Epoch: 0, Step: 0, Loss: 19.920970916748047 , Acc: 0.0 | Time taken: 141.70846104621887
Epoch: 0, Step: 0, Loss: 19.153942108154297 , Acc: 0.0 | Time taken: 203.262188911438
Epoch: 0, Step: 0, Loss: 20.039621353149414 , Acc: 0.0 | Time taken: 195.4386613368988
Epoch: 0, Step: 0, Loss: 19.778423309326172 , Acc: 0.0 | Time taken: 141.32375192642212
Epoch: 0, Step: 0, Loss: 17.223506927490234 , Acc: 0.0 | Time taken: 149.77710819244385
Epoch: 0, Step: 0, Loss: 20.210657119750977 , Acc: 0.0 | Time taken: 198.67266035079956
Epoch: 0, Step: 0, Loss: 19.5445556640625 , Acc: 0.0 | Time taken: 197.39984917640686
Epoch: 0, Step: 0, Loss: 19.591524124145508 , Acc: 0.0 | Time taken: 132.8765754699707
Epoch: 0, Step: 0, Loss: 15.062592506408691 , Acc: 0.0 | Time taken: 115.13482189178467
Epoch: 0, Step: 0, Loss: 17.355504989624023 , Acc: 0.0 | Time taken: 157.26107096672058
Epoch: 0, Step: 0, Loss: 19.912269592285156 , Acc: 0.0 | Time taken: 145.35011839866638
Epoch: 0, Step: 0, Loss: 18.666757583618164 , Acc: 0.0 | Time taken: 203.18580055236816
Epoch: 0, Step: 0, Loss: 19.93193817138672 , Acc: 0.0 | Time taken: 137.0443925857544
Epoch: 0, Step: 0, Loss: 11.852392196655273 , Acc: 0.0 | Time taken: 136.47687602043152
Epoch: 0, Step: 0, Loss: 11.808197021484375 , Acc: 0.0 | Time taken: 147.81792163848877
Epoch: 0, Step: 0, Loss: 19.73470115661621 , Acc: 0.0 | Time taken: 201.50432968139648
Epoch: 0, Step: 0, Loss: 19.5964412689209 , Acc: 0.0 | Time taken: 196.0880868434906
Epoch: 0, Step: 0, Loss: 19.7924861907959 , Acc: 0.0 | Time taken: 191.85613584518433
Epoch: 0, Step: 0, Loss: 15.382076263427734 , Acc: 0.0 | Time taken: 137.6862621307373
Epoch: 0, Step: 0, Loss: 15.616395950317383 , Acc: 0.0 | Time taken: 139.78927755355835
Epoch: 0, Step: 0, Loss: 19.659992218017578 , Acc: 0.0 | Time taken: 116.17348623275757
Epoch: 0, Step: 0, Loss: 15.188693046569824 , Acc: 0.0 | Time taken: 105.882239818573
Epoch: 0, Step: 0, Loss: 11.80894660949707 , Acc: 0.0 | Time taken: 198.17706179618835
Epoch: 0, Step: 0, Loss: 18.985166549682617 , Acc: 0.0 | Time taken: 138.62101411819458
Epoch: 0, Step: 0, Loss: 14.655655860900879 , Acc: 0.0 | Time taken: 143.02500200271606
Epoch: 0, Step: 0, Loss: 19.696178436279297 , Acc: 0.0 | Time taken: 138.1425290107727 | |
Epoch: 0, Step: 0, Loss: 19.69489860534668 , Acc: 0.0 | Time taken: 122.9480459690094 | |
Epoch: 0, Step: 0, Loss: 18.925922393798828 , Acc: 0.0 | Time taken: 145.2811713218689 | |
Epoch: 0, Step: 0, Loss: 19.63861656188965 , Acc: 0.0 | Time taken: 194.4568736553192 | |
Epoch: 0, Step: 0, Loss: 12.012990951538086 , Acc: 0.0 | Time taken: 140.72496008872986 | |
Epoch: 0, Step: 0, Loss: 15.163640975952148 , Acc: 0.0 | Time taken: 140.3770787715912 | |
Epoch: 0, Step: 0, Loss: 14.655655860900879 , Acc: 0.0 | Time taken: 162.99632287025452 | |
Epoch: 0, Step: 0, Loss: 19.677093505859375 , Acc: 0.0 | Time taken: 198.63446235656738 | |
Epoch: 0, Step: 0, Loss: 11.817590713500977 , Acc: 0.0 | Time taken: 201.51597785949707 | |
Epoch: 0, Step: 0, Loss: 19.68549346923828 , Acc: 0.0 | Time taken: 137.67197608947754 | |
Epoch: 0, Step: 0, Loss: 19.550180435180664 , Acc: 0.03125 | Time taken: 198.74911260604858 | |
Epoch: 0, Step: 0, Loss: 19.981128692626953 , Acc: 0.0 | Time taken: 198.22725701332092 | |
Epoch: 0, Step: 0, Loss: 11.78958797454834 , Acc: 0.0 | Time taken: 199.5324022769928 | |
Epoch: 0, Step: 0, Loss: 20.118988037109375 , Acc: 0.0 | Time taken: 170.49751162528992 | |
Epoch: 0, Step: 0, Loss: 15.868463516235352 , Acc: 0.0 | Time taken: 197.94081830978394 | |
Epoch: 0, Step: 0, Loss: 11.527729034423828 , Acc: 0.0 | Time taken: 137.81649565696716 | |
Epoch: 0, Step: 0, Loss: 19.56537437438965 , Acc: 0.0 | Time taken: 140.21940732002258 | |
Epoch: 0, Step: 0, Loss: 11.804129600524902 , Acc: 0.0 | Time taken: 199.34225797653198 | |
Epoch: 0, Step: 0, Loss: 19.72756576538086 , Acc: 0.0 | Time taken: 198.1713945865631 | |
Epoch: 0, Step: 0, Loss: 19.859020233154297 , Acc: 0.0 | Time taken: 196.63833713531494 | |
Epoch: 0, Step: 0, Loss: 19.952178955078125 , Acc: 0.0 | Time taken: 200.23561024665833 | |
Epoch: 0, Step: 0, Loss: 11.794713973999023 , Acc: 0.0 | Time taken: 199.07277727127075 | |
Epoch: 0, Step: 0, Loss: 19.912269592285156 , Acc: 0.0 | Time taken: 200.44544053077698 | |
Epoch: 0, Step: 0, Loss: 19.771989822387695 , Acc: 0.0 | Time taken: 139.4658329486847 | |
Epoch: 0, Step: 0, Loss: 15.667868614196777 , Acc: 0.0 | Time taken: 135.7098581790924 | |
Epoch: 0, Step: 0, Loss: 19.778484344482422 , Acc: 0.0 | Time taken: 150.74809956550598 | |
Epoch: 0, Step: 0, Loss: 15.883161544799805 , Acc: 0.0 | Time taken: 130.989337682724 | |
Epoch: 0, Step: 0, Loss: 19.87149429321289 , Acc: 0.0 | Time taken: 140.71923184394836 | |
Epoch: 0, Step: 0, Loss: 11.761435508728027 , Acc: 0.0 | Time taken: 195.58116483688354 | |
Epoch: 0, Step: 50, Loss: 20.038393020629883 , Acc: 0.0 | Time taken: 3.5011448860168457
Epoch: 0, Step: 50, Loss: 18.061710357666016 , Acc: 0.0 | Time taken: 3.50065016746521
Epoch: 0, Step: 50, Loss: 24.769943237304688 , Acc: 0.0 | Time taken: 3.240126132965088
Epoch: 0, Step: 50, Loss: 17.05221176147461 , Acc: 0.0 | Time taken: 3.1633899211883545
Epoch: 0, Step: 50, Loss: 15.810155868530273 , Acc: 0.0 | Time taken: 3.499434471130371
Epoch: 0, Step: 50, Loss: 13.472417831420898 , Acc: 0.921875 | Time taken: 3.346092700958252
Epoch: 0, Step: 50, Loss: 16.903121948242188 , Acc: 0.0 | Time taken: 3.4975948333740234
Epoch: 0, Step: 50, Loss: 28.167705535888672 , Acc: 0.0 | Time taken: 3.5026490688323975
Epoch: 0, Step: 50, Loss: 17.760799407958984 , Acc: 0.0 | Time taken: 3.4991610050201416
Epoch: 0, Step: 50, Loss: 11.970874786376953 , Acc: 0.0 | Time taken: 3.504546642303467
Epoch: 0, Step: 50, Loss: 16.397994995117188 , Acc: 0.0 | Time taken: 3.494872808456421
Epoch: 0, Step: 50, Loss: 14.337437629699707 , Acc: 0.1875 | Time taken: 3.499532699584961
Epoch: 0, Step: 50, Loss: 12.147560119628906 , Acc: 0.078125 | Time taken: 3.495561122894287
Epoch: 0, Step: 50, Loss: 27.81817626953125 , Acc: 0.0 | Time taken: 3.495687484741211
Epoch: 0, Step: 50, Loss: 14.962888717651367 , Acc: 0.0 | Time taken: 3.4939026832580566
Epoch: 0, Step: 50, Loss: 16.693016052246094 , Acc: 0.0 | Time taken: 3.4954442977905273
Epoch: 0, Step: 50, Loss: 16.223472595214844 , Acc: 0.0 | Time taken: 3.494844675064087
Epoch: 0, Step: 50, Loss: 16.743656158447266 , Acc: 0.0 | Time taken: 3.502980947494507
Epoch: 0, Step: 50, Loss: 8.54758071899414 , Acc: 0.546875 | Time taken: 3.4361419677734375
Epoch: 0, Step: 50, Loss: 21.364377975463867 , Acc: 0.0 | Time taken: 3.4961352348327637
Epoch: 0, Step: 50, Loss: 24.025676727294922 , Acc: 0.0 | Time taken: 3.502211809158325
Epoch: 0, Step: 50, Loss: 16.617528915405273 , Acc: 0.0 | Time taken: 3.503201723098755
Epoch: 0, Step: 50, Loss: 18.41788673400879 , Acc: 0.0 | Time taken: 3.1850054264068604
Epoch: 0, Step: 50, Loss: 14.301187515258789 , Acc: 0.265625 | Time taken: 3.496520757675171
Epoch: 0, Step: 50, Loss: 16.27230453491211 , Acc: 0.0 | Time taken: 3.503924608230591
Epoch: 0, Step: 50, Loss: 16.859397888183594 , Acc: 0.0 | Time taken: 3.4807605743408203
Epoch: 0, Step: 50, Loss: 14.04751205444336 , Acc: 0.59375 | Time taken: 3.503180980682373
Epoch: 0, Step: 50, Loss: 15.487218856811523 , Acc: 0.0 | Time taken: 3.4923107624053955
Epoch: 0, Step: 50, Loss: 23.8161678314209 , Acc: 0.0 | Time taken: 3.4811079502105713
Epoch: 0, Step: 50, Loss: 15.456480979919434 , Acc: 0.0 | Time taken: 3.3377275466918945
Epoch: 0, Step: 50, Loss: 22.720989227294922 , Acc: 0.0 | Time taken: 3.5009539127349854
Epoch: 0, Step: 50, Loss: 8.577889442443848 , Acc: 0.046875 | Time taken: 3.4999911785125732
Epoch: 0, Step: 50, Loss: 14.469091415405273 , Acc: 0.0 | Time taken: 3.5026357173919678
Epoch: 0, Step: 50, Loss: 19.889137268066406 , Acc: 0.0 | Time taken: 3.5004515647888184
Epoch: 0, Step: 50, Loss: 11.64035701751709 , Acc: 0.0 | Time taken: 3.2225239276885986
Epoch: 0, Step: 50, Loss: 28.77550506591797 , Acc: 0.0 | Time taken: 3.493682861328125
Epoch: 0, Step: 50, Loss: 22.211091995239258 , Acc: 0.0 | Time taken: 3.496549367904663
Epoch: 0, Step: 50, Loss: 13.35706615447998 , Acc: 0.828125 | Time taken: 3.4952545166015625
Epoch: 0, Step: 50, Loss: 18.60054588317871 , Acc: 0.0 | Time taken: 3.5019266605377197
Epoch: 0, Step: 50, Loss: 16.35378646850586 , Acc: 0.0 | Time taken: 3.4917452335357666
Epoch: 0, Step: 50, Loss: 15.054426193237305 , Acc: 0.0 | Time taken: 3.5027215480804443
Epoch: 0, Step: 50, Loss: 8.577889442443848 , Acc: 0.046875 | Time taken: 3.500481605529785
Epoch: 0, Step: 50, Loss: 15.349031448364258 , Acc: 0.0 | Time taken: 3.358882427215576
Epoch: 0, Step: 50, Loss: 14.513076782226562 , Acc: 0.5 | Time taken: 3.496181011199951
Epoch: 0, Step: 50, Loss: 8.390466690063477 , Acc: 0.0 | Time taken: 3.502443313598633
Epoch: 0, Step: 50, Loss: 23.744415283203125 , Acc: 0.0 | Time taken: 3.4960215091705322
Epoch: 0, Step: 50, Loss: 8.993657112121582 , Acc: 0.0 | Time taken: 3.503264904022217
Epoch: 0, Step: 50, Loss: 14.99350357055664 , Acc: 0.0 | Time taken: 3.502300500869751
Epoch: 0, Step: 50, Loss: 16.02570915222168 , Acc: 0.0 | Time taken: 3.510237455368042
Epoch: 0, Step: 50, Loss: 16.632579803466797 , Acc: 0.0 | Time taken: 3.503570079803467
Epoch: 0, Step: 50, Loss: 25.662242889404297 , Acc: 0.0 | Time taken: 3.523710012435913
Epoch: 0, Step: 50, Loss: 16.535140991210938 , Acc: 0.0 | Time taken: 3.5236728191375732
Epoch: 0, Step: 50, Loss: 28.171875 , Acc: 0.0 | Time taken: 3.523655414581299
Epoch: 0, Step: 50, Loss: 19.19924545288086 , Acc: 0.0 | Time taken: 3.293935537338257
Epoch: 0, Step: 50, Loss: 16.626239776611328 , Acc: 0.0 | Time taken: 3.5202085971832275
Epoch: 0, Step: 50, Loss: 16.700523376464844 , Acc: 0.0 | Time taken: 3.4052746295928955
Epoch: 0, Step: 50, Loss: 10.576862335205078 , Acc: 0.234375 | Time taken: 3.488424062728882
Epoch: 0, Step: 50, Loss: 15.670470237731934 , Acc: 0.0 | Time taken: 3.5229287147521973
Epoch: 0, Step: 50, Loss: 15.471056938171387 , Acc: 0.0 | Time taken: 3.058933973312378
Epoch: 0, Step: 50, Loss: 15.814111709594727 , Acc: 0.0 | Time taken: 3.525090217590332
Epoch: 0, Step: 50, Loss: 17.715770721435547 , Acc: 0.0 | Time taken: 3.523406982421875
Epoch: 0, Step: 50, Loss: 11.344772338867188 , Acc: 0.0 | Time taken: 3.509988307952881
Epoch: 0, Step: 50, Loss: 21.345888137817383 , Acc: 0.0 | Time taken: 3.524320125579834
Epoch: 0, Step: 50, Loss: 20.00176239013672 , Acc: 0.0 | Time taken: 3.5074329376220703
Epoch: 0, Step: 50, Loss: 22.211091995239258 , Acc: 0.0 | Time taken: 3.5218594074249268
Epoch: 0, Step: 50, Loss: 14.962888717651367 , Acc: 0.0 | Time taken: 3.523887872695923
Epoch: 0, Step: 50, Loss: 17.250328063964844 , Acc: 0.0 | Time taken: 3.523099184036255
Epoch: 0, Step: 50, Loss: 17.436492919921875 , Acc: 0.0 | Time taken: 3.523989200592041
Epoch: 0, Step: 50, Loss: 16.661800384521484 , Acc: 0.0 | Time taken: 3.5239012241363525
Epoch: 0, Step: 50, Loss: 16.880949020385742 , Acc: 0.0 | Time taken: 3.5219814777374268
Epoch: 0, Step: 50, Loss: 15.777007102966309 , Acc: 0.0 | Time taken: 3.519026517868042
Epoch: 0, Step: 50, Loss: 24.054317474365234 , Acc: 0.0 | Time taken: 3.295353651046753
Epoch: 0, Step: 50, Loss: 14.469091415405273 , Acc: 0.0 | Time taken: 3.521101713180542
Epoch: 0, Step: 50, Loss: 14.86700439453125 , Acc: 0.359375 | Time taken: 3.507798194885254
Epoch: 0, Step: 50, Loss: 15.369190216064453 , Acc: 0.0 | Time taken: 3.5268609523773193
Epoch: 0, Step: 50, Loss: 15.762784957885742 , Acc: 0.0 | Time taken: 3.5224668979644775
Epoch: 0, Step: 50, Loss: 9.163860321044922 , Acc: 0.0 | Time taken: 3.521397113800049
Epoch: 0, Step: 50, Loss: 15.609086990356445 , Acc: 0.0 | Time taken: 3.518476963043213
Epoch: 0, Step: 50, Loss: 8.54758071899414 , Acc: 0.546875 | Time taken: 3.2743489742279053
Epoch: 0, Step: 50, Loss: 17.760753631591797 , Acc: 0.0 | Time taken: 3.5225236415863037
Epoch: 0, Step: 50, Loss: 21.36757469177246 , Acc: 0.0 | Time taken: 3.5263290405273438
Epoch: 0, Step: 50, Loss: 24.625385284423828 , Acc: 0.0 | Time taken: 3.232883930206299
Epoch: 0, Step: 50, Loss: 19.349485397338867 , Acc: 0.0 | Time taken: 3.5224673748016357
Epoch: 0, Step: 50, Loss: 16.285114288330078 , Acc: 0.0 | Time taken: 3.5252325534820557
Epoch: 0, Step: 50, Loss: 15.935747146606445 , Acc: 0.0 | Time taken: 3.489647150039673
Epoch: 0, Step: 50, Loss: 13.042746543884277 , Acc: 0.15625 | Time taken: 3.3556835651397705
Epoch: 0, Step: 50, Loss: 22.479259490966797 , Acc: 0.0 | Time taken: 3.4429924488067627
Epoch: 0, Step: 50, Loss: 13.412002563476562 , Acc: 0.9375 | Time taken: 3.525402545928955
Epoch: 0, Step: 50, Loss: 8.283740997314453 , Acc: 1.0 | Time taken: 3.5308353900909424
Epoch: 0, Step: 50, Loss: 16.06515121459961 , Acc: 0.0 | Time taken: 3.5328478813171387
Epoch: 0, Step: 50, Loss: 15.246273040771484 , Acc: 0.0 | Time taken: 3.529102087020874
Epoch: 0, Step: 50, Loss: 8.567059516906738 , Acc: 0.125 | Time taken: 3.4655728340148926
Epoch: 0, Step: 50, Loss: 15.020666122436523 , Acc: 0.046875 | Time taken: 3.534762144088745
Epoch: 0, Step: 50, Loss: 10.266606330871582 , Acc: 0.0 | Time taken: 3.5175764560699463
Epoch: 0, Step: 50, Loss: 20.12688446044922 , Acc: 0.0 | Time taken: 3.7150871753692627
Epoch: 0, Step: 50, Loss: 15.483797073364258 , Acc: 0.0 | Time taken: 3.550638437271118
gpu-st-p4d-24xlarge-51:30345:30909 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<45466>
gpu-st-p4d-24xlarge-51:30345:30909 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-51:30345:30909 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-51:30345:30909 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-51:30345:30909 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-51:30345:30909 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-53:26988:27564 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<48964>
gpu-st-p4d-24xlarge-53:26988:27564 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-53:26988:27564 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-53:26988:27564 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-53:26988:27564 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-53:26988:27564 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-54:29518:30123 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<52200>
gpu-st-p4d-24xlarge-54:29518:30123 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-54:29518:30123 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-54:29518:30123 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-54:29518:30123 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-54:29518:30123 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29154 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29155 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29157 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29158 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29159 closing signal SIGTERM
gpu-st-p4d-24xlarge-54:29519:30125 [0] misc/socket.cc:503 NCCL WARN Net : Call to recv from 172.31.239.214<43187> failed : Connection reset by peer
gpu-st-p4d-24xlarge-54:29519:30125 [0] NCCL INFO misc/socket.cc:520 -> 2
gpu-st-p4d-24xlarge-54:29519:30125 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-54:29519:30125 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-54:29519:30125 [0] NCCL INFO transport/net.cc:870 -> 2
gpu-st-p4d-24xlarge-54:29519:30125 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-54:29519:30125 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-52:39401:39975 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<47846>
gpu-st-p4d-24xlarge-52:39401:39975 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-52:39401:39975 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-52:39401:39975 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-52:39401:39975 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-52:39401:39975 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-51:30345:30643 [0] NCCL INFO comm 0x7fc5dc009010 rank 45 nranks 96 cudaDev 5 busId 901d0 - Abort COMPLETE
gpu-st-p4d-24xlarge-53:26988:27286 [0] NCCL INFO comm 0x7fbf94009010 rank 56 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
gpu-st-p4d-24xlarge-56:28176:29026 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-229-154.ec2.internal<37810>
gpu-st-p4d-24xlarge-56:28176:29026 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-56:28176:29026 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-56:28176:29026 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-56:28176:29026 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-56:28176:29026 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-53:26989:27566 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-229-154.ec2.internal<40624>
gpu-st-p4d-24xlarge-53:26989:27566 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-53:26989:27566 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-53:26989:27566 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-53:26989:27566 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-53:26989:27566 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-54:29518:29815 [0] NCCL INFO comm 0x7fa29c009010 rank 64 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE
gpu-st-p4d-24xlarge-54:29514:30126 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<58940>
gpu-st-p4d-24xlarge-54:29514:30126 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-54:29514:30126 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-54:29514:30126 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-54:29514:30126 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-54:29514:30126 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-53:26986:27567 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<36242>
gpu-st-p4d-24xlarge-53:26986:27567 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-53:26986:27567 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-53:26986:27567 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-53:26986:27567 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-53:26986:27567 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-51:30343:30906 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<55604>
gpu-st-p4d-24xlarge-51:30343:30906 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-51:30343:30906 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-51:30343:30906 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-51:30343:30906 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-51:30343:30906 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-57:27793:28456 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<47998>
gpu-st-p4d-24xlarge-57:27793:28456 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-57:27793:28456 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-57:27793:28456 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-57:27793:28456 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-57:27793:28456 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-56:28175:29024 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<45056>
gpu-st-p4d-24xlarge-56:28175:29024 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-56:28175:29024 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-56:28175:29024 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-56:28175:29024 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-56:28175:29024 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
gpu-st-p4d-24xlarge-57:27791:28460 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<33796>
gpu-st-p4d-24xlarge-57:27791:28460 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-57:27791:28460 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-57:27791:28460 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-57:27791:28460 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-57:27791:28460 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-56:28171:29027 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-150.ec2.internal<56018>
gpu-st-p4d-24xlarge-56:28171:29027 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-56:28171:29027 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-56:28171:29027 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-56:28171:29027 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-56:28171:29027 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 2 (pid: 29156) of binary: /opt/conda/bin/python
gpu-st-p4d-24xlarge-54:29519:29807 [0] NCCL INFO comm 0x7f3130009010 rank 65 nranks 96 cudaDev 1 busId 101d0 - Abort COMPLETE
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
gpu-st-p4d-24xlarge-52:39402:39977 [0] misc/socket.cc:503 NCCL WARN Net : Call to recv from 172.31.227.208<54377> failed : Broken pipe
gpu-st-p4d-24xlarge-52:39402:39977 [0] NCCL INFO misc/socket.cc:520 -> 2
gpu-st-p4d-24xlarge-52:39402:39977 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-52:39402:39977 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-52:39402:39977 [0] NCCL INFO transport/net.cc:870 -> 2
gpu-st-p4d-24xlarge-52:39402:39977 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-52:39402:39977 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29514 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29515 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29516 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29517 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29519 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30342 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30343 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30344 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30346 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30347 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 26986 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 26987 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 26989 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 26990 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 26991 closing signal SIGTERM
gpu-st-p4d-24xlarge-58:27723:28290 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-227-130.ec2.internal<34164>
gpu-st-p4d-24xlarge-58:27723:28290 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-58:27723:28290 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-58:27723:28290 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-58:27723:28290 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-58:27723:28290 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-56:28176:28618 [0] NCCL INFO comm 0x7f41ac009010 rank 77 nranks 96 cudaDev 5 busId 901d0 - Abort COMPLETE
gpu-st-p4d-24xlarge-56:28175:28567 [0] NCCL INFO comm 0x7f2810009010 rank 76 nranks 96 cudaDev 4 busId 901c0 - Abort COMPLETE
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
gpu-st-p4d-24xlarge-44:28069:28967 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-230-245.ec2.internal<47806>
gpu-st-p4d-24xlarge-44:28069:28967 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-44:28069:28967 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-44:28069:28967 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-44:28069:28967 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-44:28069:28967 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-52:39401:39699 [0] NCCL INFO comm 0x7f9220009010 rank 52 nranks 96 cudaDev 4 busId 901c0 - Abort COMPLETE
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
gpu-st-p4d-24xlarge-56:28172:29025 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-229-154.ec2.internal<42724>
gpu-st-p4d-24xlarge-56:28172:29025 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-56:28172:29025 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-56:28172:29025 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-56:28172:29025 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-56:28172:29025 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-58:27724:28292 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-229-205.ec2.internal<40844>
gpu-st-p4d-24xlarge-58:27724:28292 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-58:27724:28292 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-58:27724:28292 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-58:27724:28292 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-58:27724:28292 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-57:27793:28095 [0] NCCL INFO comm 0x7f3cc0009010 rank 80 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE
gpu-st-p4d-24xlarge-50:29215:29785 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<51948>
gpu-st-p4d-24xlarge-50:29215:29785 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-50:29215:29785 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-47:30190:30763 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<42136>
gpu-st-p4d-24xlarge-47:30190:30763 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-47:30190:30763 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-47:30190:30763 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-47:30190:30763 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-50:29215:29785 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-50:29215:29785 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-50:29215:29785 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-47:30190:30763 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-59:27920:28696 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<54450>
gpu-st-p4d-24xlarge-59:27920:28696 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-59:27920:28696 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-59:27920:28696 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-59:27920:28696 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-59:27920:28696 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
gpu-st-p4d-24xlarge-52:39397:39978 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<44236> | |
gpu-st-p4d-24xlarge-52:39397:39978 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-52:39397:39978 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-52:39397:39978 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-52:39397:39978 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-52:39397:39978 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-50:29211:29788 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<58664> | |
gpu-st-p4d-24xlarge-50:29211:29788 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-50:29211:29788 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-47:30188:30762 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<39658> | |
gpu-st-p4d-24xlarge-47:30188:30762 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-47:30188:30762 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-47:30188:30762 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-50:29211:29788 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-50:29211:29788 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-50:29211:29788 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-47:30188:30762 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-47:30188:30762 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-59:27918:28695 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-233-244.ec2.internal<58630> | |
gpu-st-p4d-24xlarge-59:27918:28695 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-59:27918:28695 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-59:27918:28695 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-59:27918:28695 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-59:27918:28695 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 29518) of binary: /opt/conda/bin/python | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 26988) of binary: /opt/conda/bin/python | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 3 (pid: 30345) of binary: /opt/conda/bin/python | |
gpu-st-p4d-24xlarge-48:30195:30760 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-230-245.ec2.internal<49974> | |
gpu-st-p4d-24xlarge-48:30195:30760 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-48:30195:30760 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-48:30195:30760 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-48:30195:30760 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-48:30195:30760 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-52:39402:39702 [0] NCCL INFO comm 0x7f9d04009010 rank 53 nranks 96 cudaDev 5 busId 901d0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28171 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28172 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28173 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28174 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 39397 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 39398 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 39399 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 39400 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 39402 closing signal SIGTERM | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] misc/socket.cc:503 NCCL WARN Net : Call to recv from 172.31.226.133<56915> failed : Connection reset by peer | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] NCCL INFO misc/socket.cc:520 -> 2 | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] NCCL INFO transport/net.cc:870 -> 2 | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-50:29212:29786 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] misc/socket.cc:503 NCCL WARN Net : Call to recv from 172.31.234.3<54987> failed : Connection reset by peer | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] NCCL INFO misc/socket.cc:520 -> 2 | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] NCCL INFO transport/net.cc:870 -> 2 | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-50:29216:29787 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27791 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27792 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27794 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27795 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27796 closing signal SIGTERM | |
gpu-st-p4d-24xlarge-58:27723:28012 [0] NCCL INFO comm 0x7f866c009010 rank 88 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
gpu-st-p4d-24xlarge-58:27724:28025 [0] NCCL INFO comm 0x7f0b24009010 rank 89 nranks 96 cudaDev 1 busId 101d0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
gpu-st-p4d-24xlarge-58:27719:28293 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-227-130.ec2.internal<58756> | |
gpu-st-p4d-24xlarge-58:27719:28293 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-58:27719:28293 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-58:27719:28293 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-58:27719:28293 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-58:27719:28293 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-48:30191:30763 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-230-245.ec2.internal<51888> | |
gpu-st-p4d-24xlarge-48:30191:30763 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-48:30191:30763 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-48:30191:30763 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-48:30191:30763 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-48:30191:30763 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-44:28065:28969 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-230-245.ec2.internal<38130> | |
gpu-st-p4d-24xlarge-44:28065:28969 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-44:28065:28969 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-44:28065:28969 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-44:28065:28969 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-44:28065:28969 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 28175) of binary: /opt/conda/bin/python | |
gpu-st-p4d-24xlarge-59:27920:28206 [0] NCCL INFO comm 0x7f6e6c009010 rank 92 nranks 96 cudaDev 4 busId 901c0 - Abort COMPLETE | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 39401) of binary: /opt/conda/bin/python | |
gpu-st-p4d-24xlarge-45:30137:30702 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-78.ec2.internal<42102> | |
gpu-st-p4d-24xlarge-45:30137:30702 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-45:30137:30702 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-45:30137:30702 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-45:30137:30702 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-45:30137:30702 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-44:28069:28455 [0] NCCL INFO comm 0x7f7aac009010 rank 4 nranks 96 cudaDev 4 busId 901c0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
gpu-st-p4d-24xlarge-49:25778:26348 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-234-184.ec2.internal<39924> | |
gpu-st-p4d-24xlarge-49:25778:26348 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-49:25778:26348 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-49:25778:26348 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-49:25778:26348 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-49:25778:26348 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-48:30195:30488 [0] NCCL INFO comm 0x7f185c009010 rank 28 nranks 96 cudaDev 4 busId 901c0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
gpu-st-p4d-24xlarge-58:27720:28291 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-229-205.ec2.internal<33608> | |
gpu-st-p4d-24xlarge-58:27720:28291 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-58:27720:28291 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-58:27720:28291 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-58:27720:28291 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-58:27720:28291 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27719 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27720 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27721 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27722 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27724 closing signal SIGTERM | |
gpu-st-p4d-24xlarge-49:25779:26350 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-152.ec2.internal<50496> | |
gpu-st-p4d-24xlarge-49:25779:26350 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-49:25779:26350 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-49:25779:26350 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-49:25779:26350 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-49:25779:26350 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-50:29215:29514 [0] NCCL INFO comm 0x7fe408009010 rank 40 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 27793) of binary: /opt/conda/bin/python | |
gpu-st-p4d-24xlarge-50:29216:29510 [0] NCCL INFO comm 0x7f3b88009010 rank 41 nranks 96 cudaDev 1 busId 101d0 - Abort COMPLETE | |
gpu-st-p4d-24xlarge-46:30085:30689 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-226-226.ec2.internal<48056> | |
gpu-st-p4d-24xlarge-46:30085:30689 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-46:30085:30689 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-46:30085:30689 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-46:30085:30689 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-46:30085:30689 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-47:30190:30482 [0] NCCL INFO comm 0x7f7bb4009010 rank 20 nranks 96 cudaDev 4 busId 901c0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30191 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30192 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30193 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30194 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30196 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27918 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27919 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27921 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27922 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 27923 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28065 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28066 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28067 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28068 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 28070 closing signal SIGTERM | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] misc/socket.cc:503 NCCL WARN Net : Call to recv from 172.31.224.181<45511> failed : Connection reset by peer | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] NCCL INFO misc/socket.cc:520 -> 2 | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] NCCL INFO transport/net.cc:870 -> 2 | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-46:30086:30691 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30188 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30189 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30191 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30192 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30193 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29211 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29212 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29213 closing signal SIGTERM | |
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 29214 closing signal SIGTERM | |
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 27723) of binary: /opt/conda/bin/python | |
gpu-st-p4d-24xlarge-49:25779:26078 [0] NCCL INFO comm 0x7f080c009010 rank 33 nranks 96 cudaDev 1 busId 101d0 - Abort COMPLETE | |
gpu-st-p4d-24xlarge-49:25778:26081 [0] NCCL INFO comm 0x7f1324009010 rank 32 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE | |
gpu-st-p4d-24xlarge-45:30138:30705 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-226-192.ec2.internal<46866> | |
gpu-st-p4d-24xlarge-45:30138:30705 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-45:30138:30705 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-45:30138:30705 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-45:30138:30705 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-45:30138:30705 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-46:30086:30416 [0] NCCL INFO comm 0x7f2de8009010 rank 17 nranks 96 cudaDev 1 busId 101d0 - Abort COMPLETE | |
gpu-st-p4d-24xlarge-46:30085:30427 [0] NCCL INFO comm 0x7f674c009010 rank 16 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::runtime_errorstd::runtime_error' | |
' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
gpu-st-p4d-24xlarge-49:25776:26349 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-234-184.ec2.internal<47822> | |
gpu-st-p4d-24xlarge-49:25776:26349 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-49:25776:26349 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-49:25776:26349 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-49:25776:26349 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-49:25776:26349 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-45:30135:30704 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-78.ec2.internal<35068> | |
gpu-st-p4d-24xlarge-45:30135:30704 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-45:30135:30704 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-45:30135:30704 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-45:30135:30704 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-45:30135:30704 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
gpu-st-p4d-24xlarge-46:30081:30692 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-234-184.ec2.internal<50118> | |
gpu-st-p4d-24xlarge-46:30081:30692 [0] NCCL INFO transport/net_socket.cc:493 -> 2 | |
gpu-st-p4d-24xlarge-46:30081:30692 [0] NCCL INFO include/net.h:32 -> 2 | |
gpu-st-p4d-24xlarge-46:30081:30692 [0] NCCL INFO transport/net.cc:996 -> 2 | |
gpu-st-p4d-24xlarge-46:30081:30692 [0] NCCL INFO proxy.cc:494 -> 2 | |
gpu-st-p4d-24xlarge-46:30081:30692 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread] | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down. | |
terminate called after throwing an instance of 'std::runtime_error' | |
terminate called after throwing an instance of 'std::runtime_error' | |
what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12 | |
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer. | |
  what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 30195) of binary: /opt/conda/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 28069) of binary: /opt/conda/bin/python
gpu-st-p4d-24xlarge-45:30137:30431 [0] NCCL INFO comm 0x7f4de8009010 rank 8 nranks 96 cudaDev 0 busId 101c0 - Abort COMPLETE
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
gpu-st-p4d-24xlarge-49:25777:26351 [0] misc/socket.cc:523 NCCL WARN Net : Connection closed by remote peer ip-172-31-231-152.ec2.internal<44236>
gpu-st-p4d-24xlarge-49:25777:26351 [0] NCCL INFO transport/net_socket.cc:493 -> 2
gpu-st-p4d-24xlarge-49:25777:26351 [0] NCCL INFO include/net.h:32 -> 2
gpu-st-p4d-24xlarge-49:25777:26351 [0] NCCL INFO transport/net.cc:996 -> 2
gpu-st-p4d-24xlarge-49:25777:26351 [0] NCCL INFO proxy.cc:494 -> 2
gpu-st-p4d-24xlarge-49:25777:26351 [0] NCCL INFO proxy.cc:614 -> 2 [Proxy Thread]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 25776 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 25777 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 25779 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 25780 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 25781 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 27920) of binary: /opt/conda/bin/python
gpu-st-p4d-24xlarge-45:30138:30426 [0] NCCL INFO comm 0x7fb2c4009010 rank 9 nranks 96 cudaDev 1 busId 101d0 - Abort COMPLETE
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 29215) of binary: /opt/conda/bin/python
[E ProcessGroupNCCL.cpp:480] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what(): NCCL communicator encountered error set by ProcessGroupNCCL: NCCL error: unhandled system error, NCCL version 2.12.12
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 30190) of binary: /opt/conda/bin/python
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30135 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30136 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30138 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30139 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30145 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30081 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30082 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30083 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 30084 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 25778) of binary: /opt/conda/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 4 (pid: 30085) of binary: /opt/conda/bin/python
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 2 (pid: 30137) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:16
  host      : gpu-st-p4d-24xlarge-54.hpc-1click-prod450.pcluster
  rank      : 64 (local_rank: 4)
  exitcode  : -6 (pid: 29518)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 29518
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-08-26_22:41:31
  host      : gpu-st-p4d-24xlarge-56.hpc-1click-prod450.pcluster
  rank      : 77 (local_rank: 5)
  exitcode  : -6 (pid: 28176)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 28176
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:31
  host      : gpu-st-p4d-24xlarge-56.hpc-1click-prod450.pcluster
  rank      : 76 (local_rank: 4)
  exitcode  : -6 (pid: 28175)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 28175
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:37
  host      : gpu-st-p4d-24xlarge-57.hpc-1click-prod450.pcluster
  rank      : 80 (local_rank: 2)
  exitcode  : -6 (pid: 27793)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 27793
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:47
  host      : gpu-st-p4d-24xlarge-58.hpc-1click-prod450.pcluster
  rank      : 88 (local_rank: 4)
  exitcode  : -6 (pid: 27723)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 27723
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:51
  host      : gpu-st-p4d-24xlarge-59.hpc-1click-prod450.pcluster
  rank      : 92 (local_rank: 2)
  exitcode  : -6 (pid: 27920)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 27920
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:56
  host      : gpu-st-p4d-24xlarge-47.hpc-1click-prod450.pcluster
  rank      : 20 (local_rank: 2)
  exitcode  : -6 (pid: 30190)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 30190
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-08-26_22:41:56
  host      : gpu-st-p4d-24xlarge-50.hpc-1click-prod450.pcluster
  rank      : 41 (local_rank: 5)
  exitcode  : -6 (pid: 29216)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 29216
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:56
  host      : gpu-st-p4d-24xlarge-50.hpc-1click-prod450.pcluster
  rank      : 40 (local_rank: 4)
  exitcode  : -6 (pid: 29215)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 29215
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:16
  host      : gpu-st-p4d-24xlarge-53.hpc-1click-prod450.pcluster
  rank      : 56 (local_rank: 2)
  exitcode  : -6 (pid: 26988)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 26988
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:51
  host      : gpu-st-p4d-24xlarge-48.hpc-1click-prod450.pcluster
  rank      : 28 (local_rank: 4)
  exitcode  : -6 (pid: 30195)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 30195
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:52
  host      : gpu-st-p4d-24xlarge-44.hpc-1click-prod450.pcluster
  rank      : 4 (local_rank: 4)
  exitcode  : -6 (pid: 28069)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 28069
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:42:06
  host      : gpu-st-p4d-24xlarge-49.hpc-1click-prod450.pcluster
  rank      : 32 (local_rank: 2)
  exitcode  : -6 (pid: 25778)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 25778
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:42:11
  host      : gpu-st-p4d-24xlarge-45.hpc-1click-prod450.pcluster
  rank      : 8 (local_rank: 2)
  exitcode  : -6 (pid: 30137)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 30137
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-08-26_22:42:11
  host      : gpu-st-p4d-24xlarge-46.hpc-1click-prod450.pcluster
  rank      : 17 (local_rank: 5)
  exitcode  : -6 (pid: 30086)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 30086
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:42:11
  host      : gpu-st-p4d-24xlarge-46.hpc-1click-prod450.pcluster
  rank      : 16 (local_rank: 4)
  exitcode  : -6 (pid: 30085)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 30085
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:40:52
  host      : gpu-st-p4d-24xlarge-55.hpc-1click-prod450.pcluster
  rank      : 68 (local_rank: 2)
  exitcode  : -9 (pid: 29156)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 29156
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:16
  host      : gpu-st-p4d-24xlarge-51.hpc-1click-prod450.pcluster
  rank      : 45 (local_rank: 3)
  exitcode  : -6 (pid: 30345)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 30345
============================================================
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.0a0+08820cb', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/ddp_convnext.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-08-26_22:41:31
  host      : gpu-st-p4d-24xlarge-52.hpc-1click-prod450.pcluster
  rank      : 52 (local_rank: 4)
  exitcode  : -6 (pid: 39401)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 39401
============================================================
srun: error: gpu-st-p4d-24xlarge-50: task 6: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-58: task 14: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-47: task 3: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-59: task 15: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-56: task 12: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-53: task 9: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-54: task 10: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-44: task 0: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-49: task 5: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-52: task 8: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-45: task 1: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-48: task 4: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-57: task 13: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-46: task 2: Exited with exit code 1
slurmstepd: error: Detected 1142 oom-kill event(s) in StepId=4236.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: gpu-st-p4d-24xlarge-51: task 7: Exited with exit code 1
srun: error: gpu-st-p4d-24xlarge-55: task 11: Out Of Memory
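
Editor's note: the decisive lines are at the very end. slurmstepd reports 1142 oom-kill events and node 55 exits "Out Of Memory" with a SIGKILL (-9) at 22:40:52, before any other failure; the ncclSystemError / SIGABRT cascade on the remaining ranks is consistent with their socket peers on that node disappearing, as the "Connection closed by remote peer" warning shows. A minimal mitigation sketch for the Slurm job script, assuming a standard sbatch setup; the specific values below are illustrative assumptions, not taken from this log:

```shell
#!/bin/bash
# Hypothetical sbatch fragment -- tune values for your cluster.

#SBATCH --mem=0          # give the step all node memory instead of a small cgroup default
#SBATCH --ntasks-per-node=1

# Surface the NCCL warnings the ncclSystemError message points to:
export NCCL_DEBUG=INFO

# Host-RAM pressure from the input pipeline is a common OOM source;
# reducing DataLoader workers (read by a hypothetical env var in the
# training script) is one lever:
export NUM_DATALOADER_WORKERS=2

# 6 processes per node matches the 96 ranks over 16 nodes seen above.
srun torchrun --nnodes "$SLURM_NNODES" --nproc_per_node 6 scripts/ddp_convnext.py
```

Cross-checking `sacct -j 4236 --format=JobID,MaxRSS,State` against the per-node memory limit would confirm whether host RAM, rather than GPU memory, was exhausted.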