Created
December 15, 2024 19:19
I1215 08:32:37.869000 4070434 torch/_inductor/config.py:635] compile_threads set to 12 via env | |
using device: cuda:2 | |
using device: cuda:1 | |
using device: cuda:3 | |
using device: cuda:5 | |
using device: cuda:4 | |
using device: cuda:0 | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: No plugin found (librccl-net.so) | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: Using internal network plugin. | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO Kernel version: 6.12.0 | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO ROCr version 1.1 | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO NCCL_DMABUF_ENABLE set by environment to 1. | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO ROCr version 1.1 | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO ROCr version 1.1 | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO Could not open kernel conf file, will assume CONFIG_DMABUF_MOVE_NOTIFY and CONFIG_PCI_P2PDMA are enabled | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO NCCL_DMABUF_ENABLE set by environment to 1. | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO DMA_BUF Support Enabled | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NCCL_DMABUF_ENABLE set by environment to 1. | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO Could not open kernel conf file, will assume CONFIG_DMABUF_MOVE_NOTIFY and CONFIG_PCI_P2PDMA are enabled | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO ROCr version 1.1 | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO Could not open kernel conf file, will assume CONFIG_DMABUF_MOVE_NOTIFY and CONFIG_PCI_P2PDMA are enabled | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO DMA_BUF Support Enabled | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO DMA_BUF Support Enabled | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO ROCr version 1.1 | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO NCCL_DMABUF_ENABLE set by environment to 1. | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO NCCL_DMABUF_ENABLE set by environment to 1. | |
RCCL version : 2.21.5-Unknown | |
HIP version : 6.3.42131- | |
ROCm version : 6.2.2.0-9999-unknown | |
Hostname : tsukiakari-nixos | |
Librccl path : /nix/store/hc2saq7x6k17z58nx66aybbwvh4bbzlq-rccl-6.3.0/lib/librccl.so.1 | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO Could not open kernel conf file, will assume CONFIG_DMABUF_MOVE_NOTIFY and CONFIG_PCI_P2PDMA are enabled | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO Could not open kernel conf file, will assume CONFIG_DMABUF_MOVE_NOTIFY and CONFIG_PCI_P2PDMA are enabled | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO DMA_BUF Support Enabled | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO DMA_BUF Support Enabled | |
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO Comm config Blocking set to 0 | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO NET/Plugin: No plugin found (librccl-net.so) | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO NET/Plugin: No plugin found (librccl-net.so) | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO NET/Plugin: Using internal network plugin. | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO NET/Plugin: No plugin found (librccl-net.so) | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO NET/Plugin: Using internal network plugin. | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO NET/Plugin: Using internal network plugin. | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO Kernel version: 6.12.0 | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO NET/Plugin: No plugin found (librccl-net.so) | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO Kernel version: 6.12.0 | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO NET/Plugin: Using internal network plugin. | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO Kernel version: 6.12.0 | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO ROCr version 1.1 | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO Kernel version: 6.12.0 | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO NCCL_DMABUF_ENABLE set by environment to 1. | |
tsukiakari-nixos:4070435:4070435 [1] NCCL INFO Comm config Blocking set to 0 | |
tsukiakari-nixos:4070439:4070439 [5] NCCL INFO Comm config Blocking set to 0 | |
tsukiakari-nixos:4070437:4070437 [3] NCCL INFO Comm config Blocking set to 0 | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO Could not open kernel conf file, will assume CONFIG_DMABUF_MOVE_NOTIFY and CONFIG_PCI_P2PDMA are enabled | |
tsukiakari-nixos:4070436:4070436 [2] NCCL INFO Comm config Blocking set to 0 | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO DMA_BUF Support Enabled | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO NET/Plugin: No plugin found (librccl-net.so) | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO NET/Plugin: Using internal network plugin. | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO Kernel version: 6.12.0 | |
tsukiakari-nixos:4070438:4070438 [4] NCCL INFO Comm config Blocking set to 0 | |
tsukiakari-nixos:4070437:4070474 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [RO]; OOB eno1np0:10.5.5.236<0> | |
/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]' | |
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
================================================================= | |
==4070437==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82df5b038 at pc 0x7fff7c15a3b5 bp 0x7ff8312e28f0 sp 0x7ff8312e28e8 | |
READ of size 8 at 0x7ff82df5b038 thread T6 | |
tsukiakari-nixos:4070434:4070466 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [RO]; OOB eno1np0:10.5.5.236<0> | |
/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]' | |
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
================================================================= | |
==4070434==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82cc51038 at pc 0x7fff7c15a3b5 bp 0x7ff82db508f0 sp 0x7ff82db508e8 | |
READ of size 8 at 0x7ff82cc51038 thread T7 | |
tsukiakari-nixos:4070435:4070471 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [RO]; OOB eno1np0:10.5.5.236<0> | |
/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]' | |
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
================================================================= | |
tsukiakari-nixos:4070436:4070473 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [RO]; OOB eno1np0:10.5.5.236<0> | |
tsukiakari-nixos:4070439:4070472 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [RO]; OOB eno1np0:10.5.5.236<0> | |
/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]' | |
/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]' | |
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
================================================================= | |
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
================================================================= | |
==4070435==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82df5b038 at pc 0x7fff7c15a3b5 bp 0x7ff8312e28f0 sp 0x7ff8312e28e8 | |
READ of size 8 at 0x7ff82df5b038 thread T6 | |
==4070436==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82df5b038 at pc 0x7fff7c15a3b5 bp 0x7ff8312e28f0 sp 0x7ff8312e28e8 | |
READ of size 8 at 0x7ff82df5b038 thread T6 | |
==4070439==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82d8a8038 at pc 0x7fff7c15a3b5 bp 0x7ff82e7a78f0 sp 0x7ff82e7a78e8 | |
READ of size 8 at 0x7ff82d8a8038 thread T6 | |
tsukiakari-nixos:4070438:4070476 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [RO]; OOB eno1np0:10.5.5.236<0> | |
/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]' | |
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
================================================================= | |
==4070438==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82df5b038 at pc 0x7fff7c15a3b5 bp 0x7ff8312e28f0 sp 0x7ff8312e28e8 | |
READ of size 8 at 0x7ff82df5b038 thread T6 | |
#0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
#1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7 | |
#2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3 | |
#3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24 | |
#4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3 | |
#5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5 | |
#6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
#8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
#9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
Address 0x7ff82df5b038 is located in stack of thread T6 at offset 56 in frame | |
#0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568 | |
This frame has 3 object(s): | |
[32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable | |
[96, 351) 'strValue' (line 603) | |
[416, 672) 'buf' (line 613) | |
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork | |
(longjmp and C++ exceptions *are* supported) | |
Thread T6 created by T5 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7beead9d in groupLaunch(ncclAsyncJob*) /build/source/build/hipify/src/group.cc:314:7 | |
#2 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#3 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
Thread T5 created by T0 here: | |
#0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
#1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7 | |
#2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3 | |
#3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24 | |
#4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3 | |
#5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5 | |
#6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
#8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
#9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
Address 0x7ff82df5b038 is located in stack of thread T6 at offset 56 in frame | |
#0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568 | |
This frame has 3 object(s): | |
[32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable | |
[96, 351) 'strValue' (line 603) | |
[416, 672) 'buf' (line 613) | |
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork | |
(longjmp and C++ exceptions *are* supported) | |
Thread T6 created by T5 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7beead9d in groupLaunch(ncclAsyncJob*) /build/source/build/hipify/src/group.cc:314:7 | |
#2 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#3 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
Thread T5 created by T0 here: | |
#0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
#1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7 | |
#2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3 | |
#3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24 | |
#4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3 | |
#5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5 | |
#6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
#8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
#9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
Address 0x7ff82df5b038 is located in stack of thread T6 at offset 56 in frame | |
#0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568 | |
This frame has 3 object(s): | |
[32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable | |
[96, 351) 'strValue' (line 603) | |
[416, 672) 'buf' (line 613) | |
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork | |
(longjmp and C++ exceptions *are* supported) | |
Thread T6 created by T5 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7beead9d in groupLaunch(ncclAsyncJob*) /build/source/build/hipify/src/group.cc:314:7 | |
#2 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#3 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
Thread T5 created by T0 here: | |
#0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
#1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7 | |
#2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3 | |
#3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24 | |
#4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3 | |
#5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5 | |
#6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
#8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
#9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
Address 0x7ff82d8a8038 is located in stack of thread T6 at offset 56 in frame | |
#0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568 | |
This frame has 3 object(s): | |
[32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable | |
[96, 351) 'strValue' (line 603) | |
[416, 672) 'buf' (line 613) | |
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork | |
(longjmp and C++ exceptions *are* supported) | |
Thread T6 created by T5 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7beead9d in groupLaunch(ncclAsyncJob*) /build/source/build/hipify/src/group.cc:314:7 | |
#2 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#3 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
Thread T5 created by T0 here: | |
#0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
#1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7 | |
#2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3 | |
#3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24 | |
#4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3 | |
#5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5 | |
#6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
#8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
#9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
Address 0x7ff82df5b038 is located in stack of thread T6 at offset 56 in frame | |
#0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568 | |
This frame has 3 object(s): | |
[32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable | |
[96, 351) 'strValue' (line 603) | |
[416, 672) 'buf' (line 613) | |
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork | |
(longjmp and C++ exceptions *are* supported) | |
Thread T6 created by T5 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7beead9d in groupLaunch(ncclAsyncJob*) /build/source/build/hipify/src/group.cc:314:7 | |
#2 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#3 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
Thread T5 created by T0 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7bee9809 in ncclGroupEndInternal() /build/source/build/hipify/src/group.cc:434:7 | |
#2 0x7fff7bef9d1b in ncclCommInitRankConfig_impl(ncclComm**, int, ncclUniqueId, int, ncclConfig_v21700*) /build/source/build/hipify/src/init.cc:2452:3 | |
#3 0x7fff7c07b58c in ncclCommInitRankConfig /build/source/build/hipify/src/misc/api_trace.cc:544:12 | |
#4 0x7fffc2f25a24 in c10d::NCCLComm::create(int, int, ncclUniqueId, signed char, ncclConfig_v21700&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x2325a24) | |
#5 0x7fffc2ef5571 in c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, c10::Device&, c10d::OpType, int, bool) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f5571) | |
#6 0x7fffc2ef77f7 in c10d::ProcessGroupNCCL::eagerConnectSingleDevice(c10::Device) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f77f7) | |
#7 0x7fffee0bbd3d in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, c10d::Backend, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void (c10d::Backend::*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(c10d::Backend*, c10::Device), void, c10d::Backend*, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void&&, c10d::Backend (*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda1'(pybind11::detail::function_call&)::_FUN(pybind11::detail::function_call&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0xebbd3d) | |
#8 0x7fffee54d8ff in typeinfo for c10d::Backend::Options (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0x134d8ff) | |
#0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12 | |
#1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7 | |
#2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3 | |
#3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24 | |
#4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3 | |
#5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5 | |
#6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
#8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
#9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e) | |
Address 0x7ff82cc51038 is located in stack of thread T7 at offset 56 in frame | |
#0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568 | |
This frame has 3 object(s): | |
[32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable | |
[96, 351) 'strValue' (line 603) | |
SUMMARY: AddressSanitizer: stack-buffer-overflow /build/source/build/hipify/src/transport/net_ib.cc:592:12 in ncclIbGdrSupport() | |
[416, 672) 'buf' (line 613) | |
Shadow bytes around the buggy address: | |
0x7ff82df5ad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
=>0x7ff82df5b000: f1 f1 f1 f1 00 00 00[f2]f2 f2 f2 f2 f8 f8 f8 f8 | |
0x7ff82df5b080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2 | |
0x7ff82df5b180: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b280: f8 f8 f8 f8 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 | |
Shadow byte legend (one shadow byte represents 8 application bytes): | |
Addressable: 00 | |
Partially addressable: 01 02 03 04 05 06 07 | |
Heap left redzone: fa | |
Freed heap region: fd | |
Stack left redzone: f1 | |
Stack mid redzone: f2 | |
Stack right redzone: f3 | |
Stack after return: f5 | |
Stack use after scope: f8 | |
Global redzone: f9 | |
Global init order: f6 | |
Poisoned by user: f7 | |
Container overflow: fc | |
Array cookie: ac | |
Intra object redzone: bb | |
ASan internal: fe | |
Left alloca redzone: ca | |
Right alloca redzone: cb | |
==4070437==ABORTING | |
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork | |
(longjmp and C++ exceptions *are* supported) | |
Thread T7 created by T6 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7beead9d in groupLaunch(ncclAsyncJob*) /build/source/build/hipify/src/group.cc:314:7 | |
#2 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17 | |
#3 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4) | |
Thread T6 created by T0 here: | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7bee9809 in ncclGroupEndInternal() /build/source/build/hipify/src/group.cc:434:7 | |
#2 0x7fff7bef9d1b in ncclCommInitRankConfig_impl(ncclComm**, int, ncclUniqueId, int, ncclConfig_v21700*) /build/source/build/hipify/src/init.cc:2452:3 | |
#3 0x7fff7c07b58c in ncclCommInitRankConfig /build/source/build/hipify/src/misc/api_trace.cc:544:12 | |
#4 0x7fffc2f25a24 in c10d::NCCLComm::create(int, int, ncclUniqueId, signed char, ncclConfig_v21700&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x2325a24) | |
#5 0x7fffc2ef5571 in c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, c10::Device&, c10d::OpType, int, bool) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f5571) | |
#6 0x7fffc2ef77f7 in c10d::ProcessGroupNCCL::eagerConnectSingleDevice(c10::Device) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f77f7) | |
#7 0x7fffee0bbd3d in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, c10d::Backend, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void (c10d::Backend::*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(c10d::Backend*, c10::Device), void, c10d::Backend*, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void&&, c10d::Backend (*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda1'(pybind11::detail::function_call&)::_FUN(pybind11::detail::function_call&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0xebbd3d) | |
#8 0x7fffee54d8ff in typeinfo for c10d::Backend::Options (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0x134d8ff) | |
SUMMARY: AddressSanitizer: stack-buffer-overflow /build/source/build/hipify/src/transport/net_ib.cc:592:12 in ncclIbGdrSupport() | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7bee9809 in ncclGroupEndInternal() /build/source/build/hipify/src/group.cc:434:7 | |
#2 0x7fff7bef9d1b in ncclCommInitRankConfig_impl(ncclComm**, int, ncclUniqueId, int, ncclConfig_v21700*) /build/source/build/hipify/src/init.cc:2452:3 | |
#3 0x7fff7c07b58c in ncclCommInitRankConfig /build/source/build/hipify/src/misc/api_trace.cc:544:12 | |
#4 0x7fffc2f25a24 in c10d::NCCLComm::create(int, int, ncclUniqueId, signed char, ncclConfig_v21700&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x2325a24) | |
#5 0x7fffc2ef5571 in c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, c10::Device&, c10d::OpType, int, bool) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f5571) | |
#6 0x7fffc2ef77f7 in c10d::ProcessGroupNCCL::eagerConnectSingleDevice(c10::Device) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f77f7) | |
#7 0x7fffee0bbd3d in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, c10d::Backend, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void (c10d::Backend::*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(c10d::Backend*, c10::Device), void, c10d::Backend*, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void&&, c10d::Backend (*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda1'(pybind11::detail::function_call&)::_FUN(pybind11::detail::function_call&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0xebbd3d) | |
#8 0x7fffee54d8ff in typeinfo for c10d::Backend::Options (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0x134d8ff) | |
SUMMARY: AddressSanitizer: stack-buffer-overflow /build/source/build/hipify/src/transport/net_ib.cc:592:12 in ncclIbGdrSupport() | |
Shadow bytes around the buggy address: | |
0x7ff82d8a7d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82d8a7e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82d8a7e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82d8a7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82d8a7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
=>0x7ff82d8a8000: f1 f1 f1 f1 00 00 00[f2]f2 f2 f2 f2 f8 f8 f8 f8 | |
0x7ff82d8a8080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82d8a8100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2 | |
0x7ff82d8a8180: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82d8a8200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82d8a8280: f8 f8 f8 f8 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 | |
Shadow byte legend (one shadow byte represents 8 application bytes): | |
Addressable: 00 | |
Partially addressable: 01 02 03 04 05 06 07 | |
Heap left redzone: fa | |
Freed heap region: fd | |
Stack left redzone: f1 | |
Stack mid redzone: f2 | |
Stack right redzone: f3 | |
Stack after return: f5 | |
Stack use after scope: f8 | |
Global redzone: f9 | |
Global init order: f6 | |
Poisoned by user: f7 | |
Container overflow: fc | |
Array cookie: ac | |
Intra object redzone: bb | |
ASan internal: fe | |
Left alloca redzone: ca | |
Right alloca redzone: cb | |
==4070439==ABORTING | |
Shadow bytes around the buggy address: | |
0x7ff82df5ad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
=>0x7ff82df5b000: f1 f1 f1 f1 00 00 00[f2]f2 f2 f2 f2 f8 f8 f8 f8 | |
0x7ff82df5b080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2 | |
0x7ff82df5b180: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b280: f8 f8 f8 f8 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 | |
Shadow byte legend (one shadow byte represents 8 application bytes): | |
Addressable: 00 | |
Partially addressable: 01 02 03 04 05 06 07 | |
Heap left redzone: fa | |
Freed heap region: fd | |
Stack left redzone: f1 | |
Stack mid redzone: f2 | |
Stack right redzone: f3 | |
Stack after return: f5 | |
Stack use after scope: f8 | |
Global redzone: f9 | |
Global init order: f6 | |
Poisoned by user: f7 | |
Container overflow: fc | |
Array cookie: ac | |
Intra object redzone: bb | |
ASan internal: fe | |
Left alloca redzone: ca | |
Right alloca redzone: cb | |
==4070435==ABORTING | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7bee9809 in ncclGroupEndInternal() /build/source/build/hipify/src/group.cc:434:7 | |
#2 0x7fff7bef9d1b in ncclCommInitRankConfig_impl(ncclComm**, int, ncclUniqueId, int, ncclConfig_v21700*) /build/source/build/hipify/src/init.cc:2452:3 | |
#3 0x7fff7c07b58c in ncclCommInitRankConfig /build/source/build/hipify/src/misc/api_trace.cc:544:12 | |
#4 0x7fffc2f25a24 in c10d::NCCLComm::create(int, int, ncclUniqueId, signed char, ncclConfig_v21700&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x2325a24) | |
#5 0x7fffc2ef5571 in c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, c10::Device&, c10d::OpType, int, bool) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f5571) | |
#6 0x7fffc2ef77f7 in c10d::ProcessGroupNCCL::eagerConnectSingleDevice(c10::Device) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f77f7) | |
#7 0x7fffee0bbd3d in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, c10d::Backend, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void (c10d::Backend::*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(c10d::Backend*, c10::Device), void, c10d::Backend*, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void&&, c10d::Backend (*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda1'(pybind11::detail::function_call&)::_FUN(pybind11::detail::function_call&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0xebbd3d) | |
#8 0x7fffee54d8ff in typeinfo for c10d::Backend::Options (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0x134d8ff) | |
SUMMARY: AddressSanitizer: stack-buffer-overflow /build/source/build/hipify/src/transport/net_ib.cc:592:12 in ncclIbGdrSupport() | |
Shadow bytes around the buggy address: | |
0x7ff82df5ad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
=>0x7ff82df5b000: f1 f1 f1 f1 00 00 00[f2]f2 f2 f2 f2 f8 f8 f8 f8 | |
0x7ff82df5b080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2 | |
0x7ff82df5b180: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b280: f8 f8 f8 f8 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 | |
Shadow byte legend (one shadow byte represents 8 application bytes): | |
Addressable: 00 | |
Partially addressable: 01 02 03 04 05 06 07 | |
Heap left redzone: fa | |
Freed heap region: fd | |
Stack left redzone: f1 | |
Stack mid redzone: f2 | |
Stack right redzone: f3 | |
Stack after return: f5 | |
Stack use after scope: f8 | |
Global redzone: f9 | |
Global init order: f6 | |
Poisoned by user: f7 | |
Container overflow: fc | |
Array cookie: ac | |
Intra object redzone: bb | |
ASan internal: fe | |
Left alloca redzone: ca | |
Right alloca redzone: cb | |
==4070436==ABORTING | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7bee9809 in ncclGroupEndInternal() /build/source/build/hipify/src/group.cc:434:7 | |
#2 0x7fff7bef9d1b in ncclCommInitRankConfig_impl(ncclComm**, int, ncclUniqueId, int, ncclConfig_v21700*) /build/source/build/hipify/src/init.cc:2452:3 | |
#3 0x7fff7c07b58c in ncclCommInitRankConfig /build/source/build/hipify/src/misc/api_trace.cc:544:12 | |
#4 0x7fffc2f25a24 in c10d::NCCLComm::create(int, int, ncclUniqueId, signed char, ncclConfig_v21700&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x2325a24) | |
#5 0x7fffc2ef5571 in c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, c10::Device&, c10d::OpType, int, bool) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f5571) | |
#6 0x7fffc2ef77f7 in c10d::ProcessGroupNCCL::eagerConnectSingleDevice(c10::Device) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f77f7) | |
#7 0x7fffee0bbd3d in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, c10d::Backend, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void (c10d::Backend::*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(c10d::Backend*, c10::Device), void, c10d::Backend*, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void&&, c10d::Backend (*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda1'(pybind11::detail::function_call&)::_FUN(pybind11::detail::function_call&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0xebbd3d) | |
#8 0x7fffee54d8ff in typeinfo for c10d::Backend::Options (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0x134d8ff) | |
SUMMARY: AddressSanitizer: stack-buffer-overflow /build/source/build/hipify/src/transport/net_ib.cc:592:12 in ncclIbGdrSupport() | |
Shadow bytes around the buggy address: | |
0x7ff82df5ad80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5ae80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82df5af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
=>0x7ff82df5b000: f1 f1 f1 f1 00 00 00[f2]f2 f2 f2 f2 f8 f8 f8 f8 | |
0x7ff82df5b080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2 | |
0x7ff82df5b180: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82df5b280: f8 f8 f8 f8 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 | |
Shadow byte legend (one shadow byte represents 8 application bytes): | |
Addressable: 00 | |
Partially addressable: 01 02 03 04 05 06 07 | |
Heap left redzone: fa | |
Freed heap region: fd | |
Stack left redzone: f1 | |
Stack mid redzone: f2 | |
Stack right redzone: f3 | |
Stack after return: f5 | |
Stack use after scope: f8 | |
Global redzone: f9 | |
Global init order: f6 | |
Poisoned by user: f7 | |
Container overflow: fc | |
Array cookie: ac | |
Intra object redzone: bb | |
ASan internal: fe | |
Left alloca redzone: ca | |
Right alloca redzone: cb | |
==4070438==ABORTING | |
#0 0x7ffff755c38d in pthread_create (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x15c38d) | |
#1 0x7fff7bee9809 in ncclGroupEndInternal() /build/source/build/hipify/src/group.cc:434:7 | |
#2 0x7fff7bef9d1b in ncclCommInitRankConfig_impl(ncclComm**, int, ncclUniqueId, int, ncclConfig_v21700*) /build/source/build/hipify/src/init.cc:2452:3 | |
#3 0x7fff7c07b58c in ncclCommInitRankConfig /build/source/build/hipify/src/misc/api_trace.cc:544:12 | |
#4 0x7fffc2f25a24 in c10d::NCCLComm::create(int, int, ncclUniqueId, signed char, ncclConfig_v21700&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x2325a24) | |
#5 0x7fffc2ef5571 in c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, c10::Device&, c10d::OpType, int, bool) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f5571) | |
#6 0x7fffc2ef77f7 in c10d::ProcessGroupNCCL::eagerConnectSingleDevice(c10::Device) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_hip.so+0x22f77f7) | |
#7 0x7fffee0bbd3d in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, c10d::Backend, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void (c10d::Backend::*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda'(c10d::Backend*, c10::Device), void, c10d::Backend*, c10::Device, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release>>(void&&, c10d::Backend (*)(c10::Device), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::'lambda1'(pybind11::detail::function_call&)::_FUN(pybind11::detail::function_call&) (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0xebbd3d) | |
#8 0x7fffee54d8ff in typeinfo for c10d::Backend::Options (/nix/store/xflh908bzilmxqbh24r3hy1djgj6hmjb-python3.12-torch-2.6.0a-nightly-20241203/lib/python3.12/site-packages/torch/lib/libtorch_python.so+0x134d8ff) | |
SUMMARY: AddressSanitizer: stack-buffer-overflow /build/source/build/hipify/src/transport/net_ib.cc:592:12 in ncclIbGdrSupport() | |
Shadow bytes around the buggy address: | |
0x7ff82cc50d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82cc50e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82cc50e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82cc50f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
0x7ff82cc50f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | |
=>0x7ff82cc51000: f1 f1 f1 f1 00 00 00[f2]f2 f2 f2 f2 f8 f8 f8 f8 | |
0x7ff82cc51080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82cc51100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 f2 f2 | |
0x7ff82cc51180: f2 f2 f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82cc51200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 | |
0x7ff82cc51280: f8 f8 f8 f8 f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 | |
Shadow byte legend (one shadow byte represents 8 application bytes): | |
Addressable: 00 | |
Partially addressable: 01 02 03 04 05 06 07 | |
Heap left redzone: fa | |
Freed heap region: fd | |
Stack left redzone: f1 | |
Stack mid redzone: f2 | |
Stack right redzone: f3 | |
Stack after return: f5 | |
Stack use after scope: f8 | |
Global redzone: f9 | |
Global init order: f6 | |
Poisoned by user: f7 | |
Container overflow: fc | |
Array cookie: ac | |
Intra object redzone: bb | |
ASan internal: fe | |
Left alloca redzone: ca | |
Right alloca redzone: cb | |
==4070434==ABORTING |