Created August 8, 2025 12:47
(partial) EasyBuild log for failed build of /dev/shm/easybuild-tmp/eb-a35wfmyj/files_pr23605/c/CUTLASS/CUTLASS-4.1.0-foss-2024a-CUDA-12.6.0.eb (PR(s) #23605)
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/tensor_fill.h:1845:96: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
1845 | BlockFillRandomUniform<Element>( | |
| ^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomGaussianFunc<Element>::operator()() const [mit Element = cutlass::integer_subbyte<2, true>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:571:55: erfordert durch »void cutlass::reference::host::BlockFillRandomGaussian(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<2, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1491:35: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<2, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:937:72: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:203:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
203 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:206:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
206 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:220:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
220 | result = Element(rnd); | |
| ~^~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [mit Element = cutlass::integer_subbyte<2, true>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1123:55: erfordert durch »void cutlass::reference::host::BlockFillRandomUniform(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<2, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1501:34: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<2, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:937:72: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
642 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
645 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
654 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomGaussianFunc<Element>::operator()() const [mit Element = cutlass::integer_subbyte<4, true>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:571:55: erfordert durch »void cutlass::reference::host::BlockFillRandomGaussian(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<4, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1491:35: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<4, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:945:72: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:203:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
203 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:206:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
206 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:220:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
220 | result = Element(rnd); | |
| ~^~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [mit Element = cutlass::integer_subbyte<4, true>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1123:55: erfordert durch »void cutlass::reference::host::BlockFillRandomUniform(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<4, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1501:34: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<4, true>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:945:72: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
642 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
645 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = true]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
654 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomGaussianFunc<Element>::operator()() const [mit Element = cutlass::integer_subbyte<1, false>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:571:55: erfordert durch »void cutlass::reference::host::BlockFillRandomGaussian(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<1, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1491:35: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<1, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:985:73: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:203:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 1; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
203 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:206:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 1; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
206 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:220:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 1; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
220 | result = Element(rnd); | |
| ~^~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [mit Element = cutlass::integer_subbyte<1, false>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1123:55: erfordert durch »void cutlass::reference::host::BlockFillRandomUniform(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<1, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1501:34: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<1, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:985:73: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 1; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
642 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 1; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
645 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 1; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
654 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomGaussianFunc<Element>::operator()() const [mit Element = cutlass::integer_subbyte<2, false>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:571:55: erfordert durch »void cutlass::reference::host::BlockFillRandomGaussian(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<2, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1491:35: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<2, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:993:73: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:203:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
203 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:206:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
206 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:220:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
220 | result = Element(rnd); | |
| ~^~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [mit Element = cutlass::integer_subbyte<2, false>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1123:55: erfordert durch »void cutlass::reference::host::BlockFillRandomUniform(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<2, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1501:34: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<2, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:993:73: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
642 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
645 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 2; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
654 | result = static_cast<Element>(Real(rnd)); | |
| ^~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In Instanziierung von »Element cutlass::reference::host::detail::RandomGaussianFunc<Element>::operator()() const [mit Element = cutlass::integer_subbyte<4, false>]«: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:571:55: erfordert durch »void cutlass::reference::host::BlockFillRandomGaussian(Element*, size_t, uint64_t, double, double, int, double) [mit Element = cutlass::integer_subbyte<4, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1491:35: erfordert durch »void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [mit Element = cutlass::integer_subbyte<4, false>; size_t = long unsigned int; uint64_t = long unsigned int]« | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:1001:73: von hier erfordert | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:203:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
203 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:206:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
206 | result = static_cast<Element>(rnd); | |
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:220:11: Warnung: »cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [mit T = double; Enable = void; int Bits = 4; bool Signed = false]« ist veraltet: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
220 | result = Element(rnd); | |
| ~^~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: Anmerkung: hier deklariert | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In instantiation of ‘Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [with Element = cutlass::integer_subbyte<4, false>]’:
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1123:55: required from ‘void cutlass::reference::host::BlockFillRandomUniform(Element*, size_t, uint64_t, double, double, int, double) [with Element = cutlass::integer_subbyte<4, false>; size_t = long unsigned int; uint64_t = long unsigned int]’
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:1501:34: required from ‘void cutlass::reference::host::BlockFillRandom(Element*, size_t, uint64_t, cutlass::Distribution) [with Element = cutlass::integer_subbyte<4, false>; size_t = long unsigned int; uint64_t = long unsigned int]’
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/profiler/src/device_allocation.cu:1001:73: required from here
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = false]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
642 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = false]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
645 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = false]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
654 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
[2775/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm75_rf.dir/fused_two_convs_s8_sm75_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm75_rf.dir/fused_two_convs_s8_sm75_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_convs_s8_sm75_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm75_rf.dir/fused_two_convs_s8_sm75_rf.cu.o
[2776/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm75_shmem.dir/fused_two_convs_s8_sm75_shmem.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm75_shmem.dir/fused_two_convs_s8_sm75_shmem.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_convs_s8_sm75_shmem.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm75_shmem.dir/fused_two_convs_s8_sm75_shmem.cu.o
[2777/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm80_rf.dir/fused_two_convs_s8_sm80_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm80_rf.dir/fused_two_convs_s8_sm80_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_convs_s8_sm80_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm80_rf.dir/fused_two_convs_s8_sm80_rf.cu.o
[2778/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm80_shmem.dir/fused_two_convs_s8_sm80_shmem.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm80_shmem.dir/fused_two_convs_s8_sm80_shmem.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_convs_s8_sm80_shmem.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_convs_s8_sm80_shmem.dir/fused_two_convs_s8_sm80_shmem.cu.o
[2779/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm75_rf.dir/fused_two_gemms_f16_sm75_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm75_rf.dir/fused_two_gemms_f16_sm75_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm75_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm75_rf.dir/fused_two_gemms_f16_sm75_rf.cu.o
[2780/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm75_shmem.dir/fused_two_gemms_f16_sm75_shmem.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm75_shmem.dir/fused_two_gemms_f16_sm75_shmem.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm75_shmem.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm75_shmem.dir/fused_two_gemms_f16_sm75_shmem.cu.o
[2781/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_grouped_f16_sm80_rf.dir/fused_two_gemms_grouped_f16_sm80_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_grouped_f16_sm80_rf.dir/fused_two_gemms_grouped_f16_sm80_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_grouped_f16_sm80_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_grouped_f16_sm80_rf.dir/fused_two_gemms_grouped_f16_sm80_rf.cu.o
[2782/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm80_rf.dir/fused_two_gemms_f16_sm80_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm80_rf.dir/fused_two_gemms_f16_sm80_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm80_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm80_rf.dir/fused_two_gemms_f16_sm80_rf.cu.o
[2783/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIC -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f6_f6_f32.cu.o -MF tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f6_f6_f32.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src/reference/gemm_f6_f6_f32.cu -o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f6_f6_f32.cu.o
[2784/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm80_shmem.dir/fused_two_gemms_f16_sm80_shmem.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm80_shmem.dir/fused_two_gemms_f16_sm80_shmem.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_f16_sm80_shmem.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_f16_sm80_shmem.dir/fused_two_gemms_f16_sm80_shmem.cu.o
[2785/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm75_rf.dir/fused_two_gemms_s8_sm75_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm75_rf.dir/fused_two_gemms_s8_sm75_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_s8_sm75_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm75_rf.dir/fused_two_gemms_s8_sm75_rf.cu.o
[2786/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm75_shmem.dir/fused_two_gemms_s8_sm75_shmem.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm75_shmem.dir/fused_two_gemms_s8_sm75_shmem.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_s8_sm75_shmem.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm75_shmem.dir/fused_two_gemms_s8_sm75_shmem.cu.o
[2787/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIC -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e4m3out.cu.o -MF tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e4m3out.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src/reference/gemm_e5m2a_e4m3out.cu -o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e4m3out.cu.o
[2788/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/14_ampere_tf32_tensorop_gemm/CMakeFiles/14_ampere_tf32_tensorop_gemm.dir/ampere_tf32_tensorop_gemm.cu.o -MF examples/14_ampere_tf32_tensorop_gemm/CMakeFiles/14_ampere_tf32_tensorop_gemm.dir/ampere_tf32_tensorop_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/14_ampere_tf32_tensorop_gemm/ampere_tf32_tensorop_gemm.cu -o 
examples/14_ampere_tf32_tensorop_gemm/CMakeFiles/14_ampere_tf32_tensorop_gemm.dir/ampere_tf32_tensorop_gemm.cu.o | |
[2789/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm80_rf.dir/fused_two_gemms_s8_sm80_rf.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm80_rf.dir/fused_two_gemms_s8_sm80_rf.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_s8_sm80_rf.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm80_rf.dir/fused_two_gemms_s8_sm80_rf.cu.o | |
[2790/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm.dir/ampere_sparse_tensorop_gemm.cu.o -MF examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm.dir/ampere_sparse_tensorop_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/15_ampere_sparse_tensorop_gemm/ampere_sparse_tensorop_gemm.cu -o 
examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm.dir/ampere_sparse_tensorop_gemm.cu.o | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In instantiation of 'Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [with Element = cutlass::integer_subbyte<4, true>]':
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:854:26: required from 'void cutlass::reference::host::detail::TensorFillRandomUniformFunc<Element, Layout>::operator()(const cutlass::Coord<Layout::kRank>&) [with Element = cutlass::integer_subbyte<4, true>; Layout = cutlass::layout::RowMajor]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_foreach.h:80:5: required from 'cutlass::reference::host::detail::TensorForEachHelper<Func, Rank, 0>::TensorForEachHelper(Func&, const cutlass::Coord<Rank>&, cutlass::Coord<Rank>&) [with Func = cutlass::reference::host::detail::TensorFillRandomUniformFunc<cutlass::integer_subbyte<4, true>, cutlass::layout::RowMajor>; int Rank = 2]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_foreach.h:60:1: required from 'cutlass::reference::host::detail::TensorForEachHelper<Func, Rank, RankRemaining>::TensorForEachHelper(Func&, const cutlass::Coord<Rank>&, cutlass::Coord<Rank>&) [with Func = cutlass::reference::host::detail::TensorFillRandomUniformFunc<cutlass::integer_subbyte<4, true>, cutlass::layout::RowMajor>; int Rank = 2; int RankRemaining = 1]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_foreach.h:95:9: required from 'void cutlass::reference::host::TensorForEach(cutlass::Coord<Rank>, Func&) [with Func = detail::TensorFillRandomUniformFunc<cutlass::integer_subbyte<4, true>, cutlass::layout::RowMajor>; int Rank = 2]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:977:14: required from 'void cutlass::reference::host::TensorFillRandomUniform(cutlass::TensorView<Element, Layout>, uint64_t, double, double, int, double, bool) [with Element = cutlass::integer_subbyte<4, true>; Layout = cutlass::layout::RowMajor; uint64_t = long unsigned int]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/15_ampere_sparse_tensorop_gemm/ampere_sparse_tensorop_gemm.cu:165:50: required from here
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: warning: 'cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]' is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
642 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: warning: 'cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]' is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
645 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: warning: 'cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]' is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
654 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
[2791/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIC -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e5m2out.cu.o -MF tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e5m2out.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src/reference/gemm_e4m3a_e5m2out.cu -o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e5m2out.cu.o | |
[2792/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm80_shmem.dir/fused_two_gemms_s8_sm80_shmem.cu.o -MF examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm80_shmem.dir/fused_two_gemms_s8_sm80_shmem.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/13_two_tensor_op_fusion/fused_two_gemms_s8_sm80_shmem.cu -o examples/13_two_tensor_op_fusion/CMakeFiles/13_fused_two_gemms_s8_sm80_shmem.dir/fused_two_gemms_s8_sm80_shmem.cu.o | |
[2793/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm_universal.dir/ampere_sparse_tensorop_gemm_universal.cu.o -MF examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm_universal.dir/ampere_sparse_tensorop_gemm_universal.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/15_ampere_sparse_tensorop_gemm/ampere_sparse_tensorop_gemm_universal.cu -o examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm_universal.dir/ampere_sparse_tensorop_gemm_universal.cu.o | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h: In instantiation of 'Element cutlass::reference::host::detail::RandomUniformFunc<Element>::operator()() [with Element = cutlass::integer_subbyte<4, true>]':
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:854:26: required from 'void cutlass::reference::host::detail::TensorFillRandomUniformFunc<Element, Layout>::operator()(const cutlass::Coord<Layout::kRank>&) [with Element = cutlass::integer_subbyte<4, true>; Layout = cutlass::layout::RowMajor]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_foreach.h:80:5: required from 'cutlass::reference::host::detail::TensorForEachHelper<Func, Rank, 0>::TensorForEachHelper(Func&, const cutlass::Coord<Rank>&, cutlass::Coord<Rank>&) [with Func = cutlass::reference::host::detail::TensorFillRandomUniformFunc<cutlass::integer_subbyte<4, true>, cutlass::layout::RowMajor>; int Rank = 2]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_foreach.h:60:1: required from 'cutlass::reference::host::detail::TensorForEachHelper<Func, Rank, RankRemaining>::TensorForEachHelper(Func&, const cutlass::Coord<Rank>&, cutlass::Coord<Rank>&) [with Func = cutlass::reference::host::detail::TensorFillRandomUniformFunc<cutlass::integer_subbyte<4, true>, cutlass::layout::RowMajor>; int Rank = 2; int RankRemaining = 1]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_foreach.h:95:9: required from 'void cutlass::reference::host::TensorForEach(cutlass::Coord<Rank>, Func&) [with Func = detail::TensorFillRandomUniformFunc<cutlass::integer_subbyte<4, true>, cutlass::layout::RowMajor>; int Rank = 2]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:977:14: required from 'void cutlass::reference::host::TensorFillRandomUniform(cutlass::TensorView<Element, Layout>, uint64_t, double, double, int, double, bool) [with Element = cutlass::integer_subbyte<4, true>; Layout = cutlass::layout::RowMajor; uint64_t = long unsigned int]'
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/15_ampere_sparse_tensorop_gemm/ampere_sparse_tensorop_gemm_universal.cu:165:50: required from here
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:642:33: warning: 'cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]' is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
642 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:645:33: warning: 'cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]' is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
645 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/host/tensor_fill.h:654:33: warning: 'cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]' is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations]
654 | result = static_cast<Element>(Real(rnd));
| ^~~~~~~~~
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here
81 | integer_subbyte(T value)
| ^ ~~~~~~~~~~~~~
[2794/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIC -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e5m2out.cu.o -MF tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e5m2out.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src/reference/gemm_e5m2a_e5m2out.cu -o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e5m2out.cu.o | |
[2795/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/17_fprop_per_channel_bias/CMakeFiles/17_fprop_per_channel_bias.dir/fprop_per_channel_bias.cu.o -MF examples/17_fprop_per_channel_bias/CMakeFiles/17_fprop_per_channel_bias.dir/fprop_per_channel_bias.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/17_fprop_per_channel_bias/fprop_per_channel_bias.cu -o 
examples/17_fprop_per_channel_bias/CMakeFiles/17_fprop_per_channel_bias.dir/fprop_per_channel_bias.cu.o | |
[2796/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/19_tensorop_canonical/CMakeFiles/19_tensorop_canonical.dir/tensorop_canonical.cu.o -MF examples/19_tensorop_canonical/CMakeFiles/19_tensorop_canonical.dir/tensorop_canonical.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/19_tensorop_canonical/tensorop_canonical.cu -o examples/19_tensorop_canonical/CMakeFiles/19_tensorop_canonical.dir/tensorop_canonical.cu.o | |
[2797/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/16_ampere_tensorop_conv2dfprop/CMakeFiles/16_ampere_tensorop_conv2dfprop.dir/ampere_tensorop_conv2dfprop.cu.o -MF examples/16_ampere_tensorop_conv2dfprop/CMakeFiles/16_ampere_tensorop_conv2dfprop.dir/ampere_tensorop_conv2dfprop.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/16_ampere_tensorop_conv2dfprop/ampere_tensorop_conv2dfprop.cu -o 
examples/16_ampere_tensorop_conv2dfprop/CMakeFiles/16_ampere_tensorop_conv2dfprop.dir/ampere_tensorop_conv2dfprop.cu.o | |
[2798/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/20_simt_canonical/CMakeFiles/20_simt_canonical.dir/simt_canonical.cu.o -MF examples/20_simt_canonical/CMakeFiles/20_simt_canonical.dir/simt_canonical.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/20_simt_canonical/simt_canonical.cu -o examples/20_simt_canonical/CMakeFiles/20_simt_canonical.dir/simt_canonical.cu.o | |
[2799/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/18_ampere_fp64_tensorop_affine2_gemm/CMakeFiles/18_ampere_fp64_tensorop_affine2_gemm.dir/ampere_fp64_tensorop_affine2_gemm.cu.o -MF examples/18_ampere_fp64_tensorop_affine2_gemm/CMakeFiles/18_ampere_fp64_tensorop_affine2_gemm.dir/ampere_fp64_tensorop_affine2_gemm.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/18_ampere_fp64_tensorop_affine2_gemm/ampere_fp64_tensorop_affine2_gemm.cu -o examples/18_ampere_fp64_tensorop_affine2_gemm/CMakeFiles/18_ampere_fp64_tensorop_affine2_gemm.dir/ampere_fp64_tensorop_affine2_gemm.cu.o | |
[2800/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/21_quaternion_gemm/CMakeFiles/21_quaternion_gemm.dir/quaternion_gemm.cu.o -MF examples/21_quaternion_gemm/CMakeFiles/21_quaternion_gemm.dir/quaternion_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/21_quaternion_gemm/quaternion_gemm.cu -o examples/21_quaternion_gemm/CMakeFiles/21_quaternion_gemm.dir/quaternion_gemm.cu.o | |
[2801/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm_with_visitor.dir/ampere_sparse_tensorop_gemm_with_visitor.cu.o -MF examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm_with_visitor.dir/ampere_sparse_tensorop_gemm_with_visitor.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/15_ampere_sparse_tensorop_gemm/ampere_sparse_tensorop_gemm_with_visitor.cu -o examples/15_ampere_sparse_tensorop_gemm/CMakeFiles/15_ampere_sparse_tensorop_gemm_with_visitor.dir/ampere_sparse_tensorop_gemm_with_visitor.cu.o | |
[2802/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/22_quaternion_conv/CMakeFiles/22_quaternion_conv.dir/quaternion_conv.cu.o -MF examples/22_quaternion_conv/CMakeFiles/22_quaternion_conv.dir/quaternion_conv.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/22_quaternion_conv/quaternion_conv.cu -o examples/22_quaternion_conv/CMakeFiles/22_quaternion_conv.dir/quaternion_conv.cu.o | |
[2803/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/23_ampere_gemm_operand_reduction_fusion/CMakeFiles/23_ampere_gemm_operand_reduction_fusion.dir/ampere_gemm_operand_reduction_fusion.cu.o -MF examples/23_ampere_gemm_operand_reduction_fusion/CMakeFiles/23_ampere_gemm_operand_reduction_fusion.dir/ampere_gemm_operand_reduction_fusion.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/23_ampere_gemm_operand_reduction_fusion/ampere_gemm_operand_reduction_fusion.cu -o examples/23_ampere_gemm_operand_reduction_fusion/CMakeFiles/23_ampere_gemm_operand_reduction_fusion.dir/ampere_gemm_operand_reduction_fusion.cu.o | |
[2804/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/25_ampere_fprop_mainloop_fusion/CMakeFiles/25_ampere_fprop_mainloop_fusion.dir/ampere_fprop_mainloop_fusion.cu.o -MF examples/25_ampere_fprop_mainloop_fusion/CMakeFiles/25_ampere_fprop_mainloop_fusion.dir/ampere_fprop_mainloop_fusion.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/25_ampere_fprop_mainloop_fusion/ampere_fprop_mainloop_fusion.cu -o 
examples/25_ampere_fprop_mainloop_fusion/CMakeFiles/25_ampere_fprop_mainloop_fusion.dir/ampere_fprop_mainloop_fusion.cu.o | |
[2805/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/25_ampere_fprop_mainloop_fusion/CMakeFiles/25_ampere_3d_fprop_mainloop_fusion.dir/ampere_3d_fprop_mainloop_fusion.cu.o -MF examples/25_ampere_fprop_mainloop_fusion/CMakeFiles/25_ampere_3d_fprop_mainloop_fusion.dir/ampere_3d_fprop_mainloop_fusion.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/25_ampere_fprop_mainloop_fusion/ampere_3d_fprop_mainloop_fusion.cu -o 
examples/25_ampere_fprop_mainloop_fusion/CMakeFiles/25_ampere_3d_fprop_mainloop_fusion.dir/ampere_3d_fprop_mainloop_fusion.cu.o | |
[2806/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/31_basic_syrk/CMakeFiles/31_basic_syrk.dir/basic_syrk.cu.o -MF examples/31_basic_syrk/CMakeFiles/31_basic_syrk.dir/basic_syrk.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/31_basic_syrk/basic_syrk.cu -o examples/31_basic_syrk/CMakeFiles/31_basic_syrk.dir/basic_syrk.cu.o | |
[2807/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/26_ampere_wgrad_mainloop_fusion/CMakeFiles/26_ampere_wgrad_mainloop_fusion.dir/ampere_wgrad_mainloop_fusion.cu.o -MF examples/26_ampere_wgrad_mainloop_fusion/CMakeFiles/26_ampere_wgrad_mainloop_fusion.dir/ampere_wgrad_mainloop_fusion.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/26_ampere_wgrad_mainloop_fusion/ampere_wgrad_mainloop_fusion.cu -o 
examples/26_ampere_wgrad_mainloop_fusion/CMakeFiles/26_ampere_wgrad_mainloop_fusion.dir/ampere_wgrad_mainloop_fusion.cu.o | |
[2808/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/27_ampere_3xtf32_fast_accurate_tensorop_gemm/CMakeFiles/27_ampere_3xtf32_fast_accurate_tensorop_gemm.dir/27_ampere_3xtf32_fast_accurate_tensorop_gemm.cu.o -MF examples/27_ampere_3xtf32_fast_accurate_tensorop_gemm/CMakeFiles/27_ampere_3xtf32_fast_accurate_tensorop_gemm.dir/27_ampere_3xtf32_fast_accurate_tensorop_gemm.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/27_ampere_3xtf32_fast_accurate_tensorop_gemm/27_ampere_3xtf32_fast_accurate_tensorop_gemm.cu -o examples/27_ampere_3xtf32_fast_accurate_tensorop_gemm/CMakeFiles/27_ampere_3xtf32_fast_accurate_tensorop_gemm.dir/27_ampere_3xtf32_fast_accurate_tensorop_gemm.cu.o | |
[2809/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/30_wgrad_split_k/CMakeFiles/30_wgrad_split_k.dir/30_wgrad_split_k.cu.o -MF examples/30_wgrad_split_k/CMakeFiles/30_wgrad_split_k.dir/30_wgrad_split_k.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/30_wgrad_split_k/30_wgrad_split_k.cu -o examples/30_wgrad_split_k/CMakeFiles/30_wgrad_split_k.dir/30_wgrad_split_k.cu.o | |
[2810/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/32_basic_trmm/CMakeFiles/32_basic_trmm.dir/basic_trmm.cu.o -MF examples/32_basic_trmm/CMakeFiles/32_basic_trmm.dir/basic_trmm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/32_basic_trmm/basic_trmm.cu -o examples/32_basic_trmm/CMakeFiles/32_basic_trmm.dir/basic_trmm.cu.o | |
[2811/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/28_ampere_3xtf32_fast_accurate_tensorop_fprop/CMakeFiles/28_ampere_3xtf32_fast_accurate_tensorop_fprop.dir/ampere_3xtf32_fast_accurate_tensorop_fprop.cu.o -MF examples/28_ampere_3xtf32_fast_accurate_tensorop_fprop/CMakeFiles/28_ampere_3xtf32_fast_accurate_tensorop_fprop.dir/ampere_3xtf32_fast_accurate_tensorop_fprop.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/28_ampere_3xtf32_fast_accurate_tensorop_fprop/ampere_3xtf32_fast_accurate_tensorop_fprop.cu -o examples/28_ampere_3xtf32_fast_accurate_tensorop_fprop/CMakeFiles/28_ampere_3xtf32_fast_accurate_tensorop_fprop.dir/ampere_3xtf32_fast_accurate_tensorop_fprop.cu.o | |
[2812/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/24_gemm_grouped/CMakeFiles/24_gemm_grouped.dir/gemm_grouped.cu.o -MF examples/24_gemm_grouped/CMakeFiles/24_gemm_grouped.dir/gemm_grouped.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/24_gemm_grouped/gemm_grouped.cu -o examples/24_gemm_grouped/CMakeFiles/24_gemm_grouped.dir/gemm_grouped.cu.o | |
[2813/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/34_transposed_conv2d/CMakeFiles/34_transposed_conv2d.dir/34_transposed_conv2d.cu.o -MF examples/34_transposed_conv2d/CMakeFiles/34_transposed_conv2d.dir/34_transposed_conv2d.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/34_transposed_conv2d/34_transposed_conv2d.cu -o examples/34_transposed_conv2d/CMakeFiles/34_transposed_conv2d.dir/34_transposed_conv2d.cu.o | |
[2814/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm/CMakeFiles/29_3xtf32_complex_gemm.dir/29_3xtf32_complex_gemm.cu.o -MF examples/29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm/CMakeFiles/29_3xtf32_complex_gemm.dir/29_3xtf32_complex_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm/29_3xtf32_complex_gemm.cu 
-o examples/29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm/CMakeFiles/29_3xtf32_complex_gemm.dir/29_3xtf32_complex_gemm.cu.o | |
[2815/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/33_ampere_3xtf32_tensorop_symm/CMakeFiles/33_ampere_3xtf32_tensorop_symm.dir/ampere_3xtf32_tensorop_symm.cu.o -MF examples/33_ampere_3xtf32_tensorop_symm/CMakeFiles/33_ampere_3xtf32_tensorop_symm.dir/ampere_3xtf32_tensorop_symm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/33_ampere_3xtf32_tensorop_symm/ampere_3xtf32_tensorop_symm.cu -o 
examples/33_ampere_3xtf32_tensorop_symm/CMakeFiles/33_ampere_3xtf32_tensorop_symm.dir/ampere_3xtf32_tensorop_symm.cu.o
[2816/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/35_gemm_softmax/CMakeFiles/35_gemm_softmax.dir/gemm_softmax.cu.o -MF examples/35_gemm_softmax/CMakeFiles/35_gemm_softmax.dir/gemm_softmax.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/35_gemm_softmax/gemm_softmax.cu -o examples/35_gemm_softmax/CMakeFiles/35_gemm_softmax.dir/gemm_softmax.cu.o | |
[2817/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/36_gather_scatter_fusion/CMakeFiles/36_gather_scatter_fusion.dir/gather_scatter_fusion.cu.o -MF examples/36_gather_scatter_fusion/CMakeFiles/36_gather_scatter_fusion.dir/gather_scatter_fusion.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/36_gather_scatter_fusion/gather_scatter_fusion.cu -o 
examples/36_gather_scatter_fusion/CMakeFiles/36_gather_scatter_fusion.dir/gather_scatter_fusion.cu.o
[2818/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/37_gemm_layernorm_gemm_fusion/CMakeFiles/37_gemm_layernorm_gemm_fusion.dir/gemm_layernorm.cu.o -MF examples/37_gemm_layernorm_gemm_fusion/CMakeFiles/37_gemm_layernorm_gemm_fusion.dir/gemm_layernorm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/37_gemm_layernorm_gemm_fusion/gemm_layernorm.cu -o 
examples/37_gemm_layernorm_gemm_fusion/CMakeFiles/37_gemm_layernorm_gemm_fusion.dir/gemm_layernorm.cu.o
[2819/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/38_syr2k_grouped/CMakeFiles/38_syr2k_grouped.dir/syr2k_grouped.cu.o -MF examples/38_syr2k_grouped/CMakeFiles/38_syr2k_grouped.dir/syr2k_grouped.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/38_syr2k_grouped/syr2k_grouped.cu -o examples/38_syr2k_grouped/CMakeFiles/38_syr2k_grouped.dir/syr2k_grouped.cu.o | |
[2820/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/CMakeFiles/cute_tutorial_tiled_copy.dir/tiled_copy.cu.o -MF examples/cute/tutorial/CMakeFiles/cute_tutorial_tiled_copy.dir/tiled_copy.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/tiled_copy.cu -o examples/cute/tutorial/CMakeFiles/cute_tutorial_tiled_copy.dir/tiled_copy.cu.o | |
[2821/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/42_ampere_tensorop_group_conv/CMakeFiles/42_ampere_tensorop_group_conv.dir/ampere_tensorop_group_conv.cu.o -MF examples/42_ampere_tensorop_group_conv/CMakeFiles/42_ampere_tensorop_group_conv.dir/ampere_tensorop_group_conv.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/42_ampere_tensorop_group_conv/ampere_tensorop_group_conv.cu -o 
examples/42_ampere_tensorop_group_conv/CMakeFiles/42_ampere_tensorop_group_conv.dir/ampere_tensorop_group_conv.cu.o
[2822/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/CMakeFiles/cute_tutorial_tiled_copy_if.dir/tiled_copy_if.cu.o -MF examples/cute/tutorial/CMakeFiles/cute_tutorial_tiled_copy_if.dir/tiled_copy_if.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/tiled_copy_if.cu -o examples/cute/tutorial/CMakeFiles/cute_tutorial_tiled_copy_if.dir/tiled_copy_if.cu.o | |
[2823/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_1.dir/sgemm_1.cu.o -MF examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_1.dir/sgemm_1.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/sgemm_1.cu -o examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_1.dir/sgemm_1.cu.o | |
[2824/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/43_ell_block_sparse_gemm/CMakeFiles/43_ell_block_sparse_gemm.dir/ell_block_sparse_gemm.cu.o -MF examples/43_ell_block_sparse_gemm/CMakeFiles/43_ell_block_sparse_gemm.dir/ell_block_sparse_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/43_ell_block_sparse_gemm/ell_block_sparse_gemm.cu -o 
examples/43_ell_block_sparse_gemm/CMakeFiles/43_ell_block_sparse_gemm.dir/ell_block_sparse_gemm.cu.o
[2825/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_sm70.dir/sgemm_sm70.cu.o -MF examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_sm70.dir/sgemm_sm70.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/sgemm_sm70.cu -o examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_sm70.dir/sgemm_sm70.cu.o | |
[2826/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_2.dir/sgemm_2.cu.o -MF examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_2.dir/sgemm_2.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/sgemm_2.cu -o examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_2.dir/sgemm_2.cu.o | |
[2827/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_fixed_seqlen.dir/fused_multihead_attention_fixed_seqlen.cu.o -MF examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_fixed_seqlen.dir/fused_multihead_attention_fixed_seqlen.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu -o examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_fixed_seqlen.dir/fused_multihead_attention_fixed_seqlen.cu.o
[2828/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIC -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e4m3out.cu.o -MF tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e4m3out.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src/reference/gemm_e4m3a_e4m3out.cu -o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e4m3out.cu.o
[2829/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/46_depthwise_simt_conv2dfprop/CMakeFiles/46_depthwise_simt_conv2dfprop.dir/depthwise_simt_conv2dfprop.cu.o -MF examples/46_depthwise_simt_conv2dfprop/CMakeFiles/46_depthwise_simt_conv2dfprop.dir/depthwise_simt_conv2dfprop.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/46_depthwise_simt_conv2dfprop/depthwise_simt_conv2dfprop.cu -o 
examples/46_depthwise_simt_conv2dfprop/CMakeFiles/46_depthwise_simt_conv2dfprop.dir/depthwise_simt_conv2dfprop.cu.o
[2830/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/hopper/CMakeFiles/cute_tutorial_wgmma_tma_sm90.dir/wgmma_tma_sm90.cu.o -MF examples/cute/tutorial/hopper/CMakeFiles/cute_tutorial_wgmma_tma_sm90.dir/wgmma_tma_sm90.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/hopper/wgmma_tma_sm90.cu -o examples/cute/tutorial/hopper/CMakeFiles/cute_tutorial_wgmma_tma_sm90.dir/wgmma_tma_sm90.cu.o | |
[2831/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/hopper/CMakeFiles/cute_tutorial_wgmma_sm90.dir/wgmma_sm90.cu.o -MF examples/cute/tutorial/hopper/CMakeFiles/cute_tutorial_wgmma_sm90.dir/wgmma_sm90.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/hopper/wgmma_sm90.cu -o examples/cute/tutorial/hopper/CMakeFiles/cute_tutorial_wgmma_sm90.dir/wgmma_sm90.cu.o | |
[2832/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/47_ampere_gemm_universal_streamk/CMakeFiles/47_ampere_gemm_universal_streamk.dir/ampere_gemm_universal_streamk.cu.o -MF examples/47_ampere_gemm_universal_streamk/CMakeFiles/47_ampere_gemm_universal_streamk.dir/ampere_gemm_universal_streamk.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/47_ampere_gemm_universal_streamk/ampere_gemm_universal_streamk.cu -o 
examples/47_ampere_gemm_universal_streamk/CMakeFiles/47_ampere_gemm_universal_streamk.dir/ampere_gemm_universal_streamk.cu.o
[2833/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_variable_seqlen.dir/fused_multihead_attention_variable_seqlen.cu.o -MF examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_variable_seqlen.dir/fused_multihead_attention_variable_seqlen.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/41_fused_multi_head_attention/fused_multihead_attention_variable_seqlen.cu -o examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_variable_seqlen.dir/fused_multihead_attention_variable_seqlen.cu.o
[2834/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_sm80.dir/sgemm_sm80.cu.o -MF examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_sm80.dir/sgemm_sm80.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/cute/tutorial/sgemm_sm80.cu -o examples/cute/tutorial/CMakeFiles/cute_tutorial_sgemm_sm80.dir/sgemm_sm80.cu.o | |
[2835/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/45_dual_gemm/CMakeFiles/45_dual_gemm.dir/dual_gemm.cu.o -MF examples/45_dual_gemm/CMakeFiles/45_dual_gemm.dir/dual_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/45_dual_gemm/dual_gemm.cu -o examples/45_dual_gemm/CMakeFiles/45_dual_gemm.dir/dual_gemm.cu.o | |
[2836/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/48_hopper_warp_specialized_gemm/CMakeFiles/48_hopper_warp_specialized_gemm.dir/48_hopper_warp_specialized_gemm.cu.o -MF examples/48_hopper_warp_specialized_gemm/CMakeFiles/48_hopper_warp_specialized_gemm.dir/48_hopper_warp_specialized_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/48_hopper_warp_specialized_gemm/48_hopper_warp_specialized_gemm.cu -o 
examples/48_hopper_warp_specialized_gemm/CMakeFiles/48_hopper_warp_specialized_gemm.dir/48_hopper_warp_specialized_gemm.cu.o
[2837/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/50_hopper_gemm_with_epilogue_swizzle/CMakeFiles/50_hopper_gemm_with_epilogue_swizzle.dir/50_hopper_gemm_with_epilogue_swizzle.cu.o -MF examples/50_hopper_gemm_with_epilogue_swizzle/CMakeFiles/50_hopper_gemm_with_epilogue_swizzle.dir/50_hopper_gemm_with_epilogue_swizzle.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/50_hopper_gemm_with_epilogue_swizzle/50_hopper_gemm_with_epilogue_swizzle.cu -o examples/50_hopper_gemm_with_epilogue_swizzle/CMakeFiles/50_hopper_gemm_with_epilogue_swizzle.dir/50_hopper_gemm_with_epilogue_swizzle.cu.o
[2838/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/47_ampere_gemm_universal_streamk/CMakeFiles/47_ampere_gemm_universal_streamk_broadcast.dir/ampere_gemm_universal_streamk_broadcast.cu.o -MF examples/47_ampere_gemm_universal_streamk/CMakeFiles/47_ampere_gemm_universal_streamk_broadcast.dir/ampere_gemm_universal_streamk_broadcast.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/47_ampere_gemm_universal_streamk/ampere_gemm_universal_streamk_broadcast.cu -o examples/47_ampere_gemm_universal_streamk/CMakeFiles/47_ampere_gemm_universal_streamk_broadcast.dir/ampere_gemm_universal_streamk_broadcast.cu.o
[2839/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIC -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp_mixed_input.cu.o -MF tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp_mixed_input.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/library/src/reference/gemm_fp_mixed_input.cu -o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp_mixed_input.cu.o
[2840/3834] : && /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -fPIC -O3 -march=native -fno-math-errno -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/cuDNN/9.5.0.50-CUDA-12.6.0/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/cuDNN/9.5.0.50-CUDA-12.6.0/lib -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/Python/3.12.3-GCCcore-13.3.0/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/Python/3.12.3-GCCcore-13.3.0/lib -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/FFTW/3.3.10-GCC-13.3.0/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/FFTW/3.3.10-GCC-13.3.0/lib -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/ScaLAPACK/2.2.0-gompi-2024a-fb/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/ScaLAPACK/2.2.0-gompi-2024a-fb/lib -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/FlexiBLAS/3.4.4-GCC-13.3.0/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/FlexiBLAS/3.4.4-GCC-13.3.0/lib -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/lib64 -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/lib -shared -Wl,-soname,libcutlass.so -o tools/library/libcutlass.so tools/library/CMakeFiles/cutlass_library_objs.dir/src/handle.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/manifest.cpp.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/operation_table.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/singleton.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/util.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_int4.cu.o 
tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/block_scaled_gemm_fp4a_vs16.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/block_scaled_gemm_fp4a_vs32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/block_scaled_gemm_mixed8bitsa.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f4_f4_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f4_f6_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f4_f8_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f6_f4_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f6_f6_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f6_f8_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f8_f4_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_f8_f6_f32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/blockwise_gemm_fp8_fp16out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/blockwise_gemm_fp8_fp32out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/blockwise_gemm_fp8_bf16out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_s8_s8_s32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_u8_u8_s32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_int8_interleaved_32.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_int8_interleaved_64.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e4m3out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e4m3out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e4m3a_e5m2out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_e5m2a_e5m2out.cu.o 
tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp8in_fp16out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp8in_bf16out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp8in_fp32out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp32out.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp_other.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_fp_mixed_input.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/gemm_int_mixed_input.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/initialize_reference_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reduction/reduction_device.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reduction/init_reduction_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/conv2d.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/src/reference/conv3d.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/initialize_all.cpp.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/gemm/all_gemm_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/conv2d/all_conv2d_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/conv3d/all_conv3d_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/rank_k/all_rank_k_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/rank_2k/all_rank_2k_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/trmm/all_trmm_operations.cu.o tools/library/CMakeFiles/cutlass_library_objs.dir/generated/symm/all_symm_operations.cu.o -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib64/stubs -L/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/targets/x86_64-linux/lib/stubs 
-Wl,-rpath,/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib64/stubs:/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/tools/library: -Wl,-rpath,'' -Wl,-rpath,'/../lib64' -Wl,-rpath,'/../lib' -Wl,-rpath,'/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib64' -Wl,-rpath,'/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib' -ldl tools/library/libcutlass_gemm_sm50_cgemm.so tools/library/libcutlass_gemm_sm50_dgemm.so tools/library/libcutlass_gemm_sm50_sgemm.so tools/library/libcutlass_gemm_sm60_hgemm.so tools/library/libcutlass_gemm_sm61_igemm_s8.so tools/library/libcutlass_gemm_sm61_s8_igemm_s8.so tools/library/libcutlass_gemm_sm70_f16_s884gemm_f16.so tools/library/libcutlass_gemm_sm70_f16_s884gemm_planar_complex_array_f16.so tools/library/libcutlass_gemm_sm70_f16_s884gemm_planar_complex_f16.so tools/library/libcutlass_gemm_sm70_h884gemm.so tools/library/libcutlass_gemm_sm70_h884gemm_planar_complex.so tools/library/libcutlass_gemm_sm70_h884gemm_planar_complex_array.so tools/library/libcutlass_gemm_sm70_s884gemm_f16.so tools/library/libcutlass_gemm_sm70_s884gemm_planar_complex_array_f16.so tools/library/libcutlass_gemm_sm70_s884gemm_planar_complex_f16.so tools/library/libcutlass_gemm_sm75_f16_s1688gemm_f16.so tools/library/libcutlass_gemm_sm75_f16_s1688gemm_planar_complex_array_f16.so tools/library/libcutlass_gemm_sm75_f16_s1688gemm_planar_complex_f16.so tools/library/libcutlass_gemm_sm75_h1688gemm.so tools/library/libcutlass_gemm_sm75_h1688gemm_planar_complex.so tools/library/libcutlass_gemm_sm75_h1688gemm_planar_complex_array.so tools/library/libcutlass_gemm_sm75_i88128xorgemm_b1.so tools/library/libcutlass_gemm_sm75_i8816gemm_s8.so tools/library/libcutlass_gemm_sm75_i8816gemm_u8.so tools/library/libcutlass_gemm_sm75_i8832gemm_s4.so tools/library/libcutlass_gemm_sm75_i8832gemm_u4.so tools/library/libcutlass_gemm_sm75_s1688gemm_f16.so 
tools/library/libcutlass_gemm_sm75_s1688gemm_planar_complex_array_f16.so tools/library/libcutlass_gemm_sm75_s1688gemm_planar_complex_f16.so tools/library/libcutlass_gemm_sm75_s4_i8832gemm_s4.so tools/library/libcutlass_gemm_sm75_s8_i8816gemm_s8.so tools/library/libcutlass_gemm_sm75_u4_i8832gemm_u4.so tools/library/libcutlass_gemm_sm75_u8_i8816gemm_u8.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_bf16.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_bf16_s8.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_bf16_u8.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_planar_complex_array_bf16.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_planar_complex_bf16.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_s8_bf16.so tools/library/libcutlass_gemm_sm80_bf16_s16816gemm_u8_bf16.so tools/library/libcutlass_gemm_sm80_bf16_s16832spgemm_bf16.so tools/library/libcutlass_gemm_sm80_c1688gemm.so tools/library/libcutlass_gemm_sm80_c1688tf32gemm.so tools/library/libcutlass_gemm_sm80_cgemm.so tools/library/libcutlass_gemm_sm80_d884gemm.so tools/library/libcutlass_gemm_sm80_dgemm.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_f16.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_f16_s8.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_f16_u8.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_planar_complex_array_f16.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_planar_complex_f16.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_s8_f16.so tools/library/libcutlass_gemm_sm80_f16_s16816gemm_u8_f16.so tools/library/libcutlass_gemm_sm80_f16_s16832spgemm_f16.so tools/library/libcutlass_gemm_sm80_gz884gemm.so tools/library/libcutlass_gemm_sm80_h16816gemm.so tools/library/libcutlass_gemm_sm80_h16816gemm_f16_s8.so tools/library/libcutlass_gemm_sm80_h16816gemm_f16_u8.so tools/library/libcutlass_gemm_sm80_h16816gemm_grouped.so tools/library/libcutlass_gemm_sm80_h16816gemm_planar_complex.so 
tools/library/libcutlass_gemm_sm80_h16816gemm_planar_complex_array.so tools/library/libcutlass_gemm_sm80_h16816gemm_s8_f16.so tools/library/libcutlass_gemm_sm80_h16816gemm_u8_f16.so tools/library/libcutlass_gemm_sm80_h16832spgemm.so tools/library/libcutlass_gemm_sm80_i168128spgemm_s4.so tools/library/libcutlass_gemm_sm80_i168256andgemm_b1.so tools/library/libcutlass_gemm_sm80_i168256xorgemm_b1.so tools/library/libcutlass_gemm_sm80_i16832gemm_s4_s8.so tools/library/libcutlass_gemm_sm80_i16832gemm_s8.so tools/library/libcutlass_gemm_sm80_i16832gemm_s8_s4.so tools/library/libcutlass_gemm_sm80_i16832gemm_u8.so tools/library/libcutlass_gemm_sm80_i16864gemm_s4.so tools/library/libcutlass_gemm_sm80_i16864gemm_u4.so tools/library/libcutlass_gemm_sm80_i16864spgemm_s8.so tools/library/libcutlass_gemm_sm80_s16816gemm_bf16.so tools/library/libcutlass_gemm_sm80_s16816gemm_bf16_s8.so tools/library/libcutlass_gemm_sm80_s16816gemm_bf16_u8.so tools/library/libcutlass_gemm_sm80_s16816gemm_f16.so tools/library/libcutlass_gemm_sm80_s16816gemm_f16_s8.so tools/library/libcutlass_gemm_sm80_s16816gemm_f16_u8.so tools/library/libcutlass_gemm_sm80_s16816gemm_grouped_bf16.so tools/library/libcutlass_gemm_sm80_s16816gemm_grouped_f16.so tools/library/libcutlass_gemm_sm80_s16816gemm_planar_complex_array_bf16.so tools/library/libcutlass_gemm_sm80_s16816gemm_planar_complex_array_f16.so tools/library/libcutlass_gemm_sm80_s16816gemm_planar_complex_bf16.so tools/library/libcutlass_gemm_sm80_s16816gemm_planar_complex_f16.so tools/library/libcutlass_gemm_sm80_s16816gemm_s8_bf16.so tools/library/libcutlass_gemm_sm80_s16816gemm_s8_f16.so tools/library/libcutlass_gemm_sm80_s16816gemm_u8_bf16.so tools/library/libcutlass_gemm_sm80_s16816gemm_u8_f16.so tools/library/libcutlass_gemm_sm80_s16816tf32spgemm.so tools/library/libcutlass_gemm_sm80_s16832spgemm_bf16.so tools/library/libcutlass_gemm_sm80_s16832spgemm_f16.so tools/library/libcutlass_gemm_sm80_s1688bf16gemm.so 
tools/library/libcutlass_gemm_sm80_s1688f16gemm.so tools/library/libcutlass_gemm_sm80_s1688gemm.so tools/library/libcutlass_gemm_sm80_s1688gemm_tf32.so tools/library/libcutlass_gemm_sm80_s1688tf32gemm.so tools/library/libcutlass_gemm_sm80_s4_i168128spgemm_s4.so tools/library/libcutlass_gemm_sm80_s4_i16864gemm_s4.so tools/library/libcutlass_gemm_sm80_s8_i16832gemm_s4_s8.so tools/library/libcutlass_gemm_sm80_s8_i16832gemm_s8.so tools/library/libcutlass_gemm_sm80_s8_i16832gemm_s8_s4.so tools/library/libcutlass_gemm_sm80_s8_i16864spgemm_s8.so tools/library/libcutlass_gemm_sm80_sgemm.so tools/library/libcutlass_gemm_sm80_tf32_s1688gemm_tf32.so tools/library/libcutlass_gemm_sm80_u4_i16864gemm_u4.so tools/library/libcutlass_gemm_sm80_u8_i16832gemm_u8.so tools/library/libcutlass_gemm_sm80_z884gemm.so tools/library/libcutlass_conv2d_sm50_cf32_cdgrad_optimized_cf32.so tools/library/libcutlass_conv2d_sm50_cf32_cfprop_optimized_cf32.so tools/library/libcutlass_conv2d_sm50_cf32_cwgrad_optimized_cf32.so tools/library/libcutlass_conv2d_sm50_sdgrad_optimized.so tools/library/libcutlass_conv2d_sm50_sfprop_optimized.so tools/library/libcutlass_conv2d_sm50_swgrad_optimized.so tools/library/libcutlass_conv2d_sm60_hfprop_optimized.so tools/library/libcutlass_conv2d_sm70_f16_s884dgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm70_f16_s884fprop_optimized_f16.so tools/library/libcutlass_conv2d_sm70_f16_s884wgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm70_h884dgrad_optimized.so tools/library/libcutlass_conv2d_sm70_h884fprop_optimized.so tools/library/libcutlass_conv2d_sm70_h884wgrad_optimized.so tools/library/libcutlass_conv2d_sm70_s884dgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm70_s884fprop_optimized_f16.so tools/library/libcutlass_conv2d_sm70_s884wgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm75_cf32_cdgrad_optimized_cf32.so tools/library/libcutlass_conv2d_sm75_cf32_cfprop_optimized_cf32.so 
tools/library/libcutlass_conv2d_sm75_cf32_cwgrad_optimized_cf32.so tools/library/libcutlass_conv2d_sm75_f16_s1688dgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm75_f16_s1688fprop_few_channels_f16.so tools/library/libcutlass_conv2d_sm75_f16_s1688fprop_fixed_channels_f16.so tools/library/libcutlass_conv2d_sm75_f16_s1688fprop_optimized_f16.so tools/library/libcutlass_conv2d_sm75_f16_s1688wgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm75_h1688dgrad_optimized.so tools/library/libcutlass_conv2d_sm75_h1688fprop_few_channels.so tools/library/libcutlass_conv2d_sm75_h1688fprop_fixed_channels.so tools/library/libcutlass_conv2d_sm75_h1688fprop_optimized.so tools/library/libcutlass_conv2d_sm75_h1688wgrad_optimized.so tools/library/libcutlass_conv2d_sm75_i8816fprop_optimized_s8.so tools/library/libcutlass_conv2d_sm75_i8816fprop_optimized_u8.so tools/library/libcutlass_conv2d_sm75_i8832fprop_optimized_s4.so tools/library/libcutlass_conv2d_sm75_i8832fprop_optimized_u4.so tools/library/libcutlass_conv2d_sm75_s1688dgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm75_s1688fprop_few_channels_f16.so tools/library/libcutlass_conv2d_sm75_s1688fprop_fixed_channels_f16.so tools/library/libcutlass_conv2d_sm75_s1688fprop_optimized_f16.so tools/library/libcutlass_conv2d_sm75_s1688wgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm75_s4_i8832fprop_optimized_s4.so tools/library/libcutlass_conv2d_sm75_s8_i8816fprop_few_channels_s8.so tools/library/libcutlass_conv2d_sm75_s8_i8816fprop_fixed_channels_s8.so tools/library/libcutlass_conv2d_sm75_s8_i8816fprop_optimized_s8.so tools/library/libcutlass_conv2d_sm75_u4_i8832fprop_optimized_u4.so tools/library/libcutlass_conv2d_sm75_u8_i8816fprop_few_channels_u8.so tools/library/libcutlass_conv2d_sm75_u8_i8816fprop_fixed_channels_u8.so tools/library/libcutlass_conv2d_sm75_u8_i8816fprop_optimized_u8.so tools/library/libcutlass_conv2d_sm80_bf16_s16816dgrad_optimized_bf16.so 
tools/library/libcutlass_conv2d_sm80_bf16_s16816fprop_fixed_channels_bf16.so tools/library/libcutlass_conv2d_sm80_bf16_s16816fprop_optimized_bf16.so tools/library/libcutlass_conv2d_sm80_bf16_s16816wgrad_optimized_bf16.so tools/library/libcutlass_conv2d_sm80_f16_s16816dgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm80_f16_s16816fprop_fixed_channels_f16.so tools/library/libcutlass_conv2d_sm80_f16_s16816fprop_optimized_f16.so tools/library/libcutlass_conv2d_sm80_f16_s16816wgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm80_h16816dgrad_optimized.so tools/library/libcutlass_conv2d_sm80_h16816fprop_fixed_channels.so tools/library/libcutlass_conv2d_sm80_h16816fprop_optimized.so tools/library/libcutlass_conv2d_sm80_h16816wgrad_optimized.so tools/library/libcutlass_conv2d_sm80_i16832fprop_optimized_s8.so tools/library/libcutlass_conv2d_sm80_i16832fprop_optimized_u8.so tools/library/libcutlass_conv2d_sm80_i16864fprop_optimized_s4.so tools/library/libcutlass_conv2d_sm80_i16864fprop_optimized_u4.so tools/library/libcutlass_conv2d_sm80_s16816dgrad_optimized_bf16.so tools/library/libcutlass_conv2d_sm80_s16816dgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm80_s16816fprop_fixed_channels_bf16.so tools/library/libcutlass_conv2d_sm80_s16816fprop_fixed_channels_f16.so tools/library/libcutlass_conv2d_sm80_s16816fprop_optimized_bf16.so tools/library/libcutlass_conv2d_sm80_s16816fprop_optimized_f16.so tools/library/libcutlass_conv2d_sm80_s16816wgrad_optimized_bf16.so tools/library/libcutlass_conv2d_sm80_s16816wgrad_optimized_f16.so tools/library/libcutlass_conv2d_sm80_s1688bf16dgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688bf16fprop_optimized.so tools/library/libcutlass_conv2d_sm80_s1688bf16wgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688dgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688dgrad_optimized_tf32.so tools/library/libcutlass_conv2d_sm80_s1688f16dgrad_optimized.so 
tools/library/libcutlass_conv2d_sm80_s1688f16fprop_optimized.so tools/library/libcutlass_conv2d_sm80_s1688f16wgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688fprop_optimized.so tools/library/libcutlass_conv2d_sm80_s1688fprop_optimized_tf32.so tools/library/libcutlass_conv2d_sm80_s1688tf32dgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688tf32fprop_optimized.so tools/library/libcutlass_conv2d_sm80_s1688tf32wgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688wgrad_optimized.so tools/library/libcutlass_conv2d_sm80_s1688wgrad_optimized_tf32.so tools/library/libcutlass_conv2d_sm80_s4_i16864fprop_optimized_s4.so tools/library/libcutlass_conv2d_sm80_s8_i16832fprop_few_channels_s8.so tools/library/libcutlass_conv2d_sm80_s8_i16832fprop_fixed_channels_s8.so tools/library/libcutlass_conv2d_sm80_s8_i16832fprop_optimized_s8.so tools/library/libcutlass_conv2d_sm80_sdgrad_optimized.so tools/library/libcutlass_conv2d_sm80_sfprop_optimized.so tools/library/libcutlass_conv2d_sm80_swgrad_optimized.so tools/library/libcutlass_conv2d_sm80_tf32_s1688dgrad_optimized_tf32.so tools/library/libcutlass_conv2d_sm80_tf32_s1688fprop_optimized_tf32.so tools/library/libcutlass_conv2d_sm80_tf32_s1688wgrad_optimized_tf32.so tools/library/libcutlass_conv2d_sm80_u4_i16864fprop_optimized_u4.so tools/library/libcutlass_conv2d_sm80_u8_i16832fprop_few_channels_u8.so tools/library/libcutlass_conv2d_sm80_u8_i16832fprop_fixed_channels_u8.so tools/library/libcutlass_conv2d_sm80_u8_i16832fprop_optimized_u8.so tools/library/libcutlass_conv3d_sm80_bf16_s16816dgrad3d_analytic_bf16.so tools/library/libcutlass_conv3d_sm80_bf16_s16816dgrad3d_optimized_bf16.so tools/library/libcutlass_conv3d_sm80_bf16_s16816fprop3d_optimized_bf16.so tools/library/libcutlass_conv3d_sm80_bf16_s16816wgrad3d_optimized_bf16.so tools/library/libcutlass_conv3d_sm80_f16_s16816dgrad3d_analytic_f16.so tools/library/libcutlass_conv3d_sm80_f16_s16816dgrad3d_optimized_f16.so 
tools/library/libcutlass_conv3d_sm80_f16_s16816fprop3d_optimized_f16.so tools/library/libcutlass_conv3d_sm80_f16_s16816wgrad3d_optimized_f16.so tools/library/libcutlass_conv3d_sm80_h16816dgrad3d_analytic.so tools/library/libcutlass_conv3d_sm80_h16816dgrad3d_optimized.so tools/library/libcutlass_conv3d_sm80_h16816fprop3d_optimized.so tools/library/libcutlass_conv3d_sm80_h16816wgrad3d_optimized.so tools/library/libcutlass_conv3d_sm80_s16816dgrad3d_analytic_bf16.so tools/library/libcutlass_conv3d_sm80_s16816dgrad3d_analytic_f16.so tools/library/libcutlass_conv3d_sm80_s16816dgrad3d_optimized_bf16.so tools/library/libcutlass_conv3d_sm80_s16816dgrad3d_optimized_f16.so tools/library/libcutlass_conv3d_sm80_s16816fprop3d_optimized_bf16.so tools/library/libcutlass_conv3d_sm80_s16816fprop3d_optimized_f16.so tools/library/libcutlass_conv3d_sm80_s16816wgrad3d_optimized_bf16.so tools/library/libcutlass_conv3d_sm80_s16816wgrad3d_optimized_f16.so tools/library/libcutlass_rank_k_sm80_c1688herk.so tools/library/libcutlass_rank_k_sm80_c1688syrk.so tools/library/libcutlass_rank_k_sm80_c1688tf32herk.so tools/library/libcutlass_rank_k_sm80_c1688tf32syrk.so tools/library/libcutlass_rank_k_sm80_d884syrk.so tools/library/libcutlass_rank_k_sm80_gz884herk.so tools/library/libcutlass_rank_k_sm80_gz884syrk.so tools/library/libcutlass_rank_k_sm80_s1688syrk.so tools/library/libcutlass_rank_k_sm80_s1688tf32syrk.so tools/library/libcutlass_rank_k_sm80_z884herk.so tools/library/libcutlass_rank_k_sm80_z884syrk.so tools/library/libcutlass_rank_2k_sm80_c1688her2k.so tools/library/libcutlass_rank_2k_sm80_c1688syr2k.so tools/library/libcutlass_rank_2k_sm80_c1688tf32her2k.so tools/library/libcutlass_rank_2k_sm80_c1688tf32syr2k.so tools/library/libcutlass_rank_2k_sm80_d884syr2k.so tools/library/libcutlass_rank_2k_sm80_gz884her2k.so tools/library/libcutlass_rank_2k_sm80_gz884syr2k.so tools/library/libcutlass_rank_2k_sm80_s1688syr2k.so tools/library/libcutlass_rank_2k_sm80_s1688tf32syr2k.so 
tools/library/libcutlass_rank_2k_sm80_z884her2k.so tools/library/libcutlass_rank_2k_sm80_z884syr2k.so tools/library/libcutlass_trmm_sm80_c1688tf32trmm.so tools/library/libcutlass_trmm_sm80_c1688trmm.so tools/library/libcutlass_trmm_sm80_d884trmm.so tools/library/libcutlass_trmm_sm80_gz884trmm.so tools/library/libcutlass_trmm_sm80_s1688tf32trmm.so tools/library/libcutlass_trmm_sm80_s1688trmm.so tools/library/libcutlass_trmm_sm80_z884trmm.so tools/library/libcutlass_symm_sm80_c1688hemm.so tools/library/libcutlass_symm_sm80_c1688symm.so tools/library/libcutlass_symm_sm80_c1688tf32hemm.so tools/library/libcutlass_symm_sm80_c1688tf32symm.so tools/library/libcutlass_symm_sm80_d884symm.so tools/library/libcutlass_symm_sm80_gz884hemm.so tools/library/libcutlass_symm_sm80_gz884symm.so tools/library/libcutlass_symm_sm80_s1688symm.so tools/library/libcutlass_symm_sm80_s1688tf32symm.so tools/library/libcutlass_symm_sm80_z884hemm.so tools/library/libcutlass_symm_sm80_z884symm.so -Wl,-rpath,'' -Wl,-rpath,'/../lib64' -Wl,-rpath,'/../lib' -Wl,-rpath,'/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib64' -Wl,-rpath,'/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib' -ldl /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib/libcublas.so /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib/libcublasLt.so /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/lib64/stubs/libcuda.so -lcudadevrt -lcudart && : | |
[2841/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/54_hopper_fp8_warp_specialized_gemm/CMakeFiles/54_hopper_fp8_warp_specialized_gemm.dir/54_hopper_fp8_warp_specialized_gemm.cu.o -MF examples/54_hopper_fp8_warp_specialized_gemm/CMakeFiles/54_hopper_fp8_warp_specialized_gemm.dir/54_hopper_fp8_warp_specialized_gemm.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/54_hopper_fp8_warp_specialized_gemm/54_hopper_fp8_warp_specialized_gemm.cu -o examples/54_hopper_fp8_warp_specialized_gemm/CMakeFiles/54_hopper_fp8_warp_specialized_gemm.dir/54_hopper_fp8_warp_specialized_gemm.cu.o
[2842/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/51_hopper_gett/CMakeFiles/51_hopper_gett.dir/51_hopper_gett.cu.o -MF examples/51_hopper_gett/CMakeFiles/51_hopper_gett.dir/51_hopper_gett.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/51_hopper_gett/51_hopper_gett.cu -o examples/51_hopper_gett/CMakeFiles/51_hopper_gett.dir/51_hopper_gett.cu.o | |
[2843/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/49_hopper_gemm_with_collective_builder/CMakeFiles/49_collective_builder.dir/49_collective_builder.cu.o -MF examples/49_hopper_gemm_with_collective_builder/CMakeFiles/49_collective_builder.dir/49_collective_builder.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/49_hopper_gemm_with_collective_builder/49_collective_builder.cu -o 
examples/49_hopper_gemm_with_collective_builder/CMakeFiles/49_collective_builder.dir/49_collective_builder.cu.o
[2844/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_backward.dir/fused_multi_head_attention_backward.cu.o -MF examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_backward.dir/fused_multi_head_attention_backward.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/41_fused_multi_head_attention/fused_multi_head_attention_backward.cu 
-o examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_backward.dir/fused_multi_head_attention_backward.cu.o
[2845/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_mixed_dtype_gemm.dir/55_hopper_mixed_dtype_gemm.cu.o -MF examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_mixed_dtype_gemm.dir/55_hopper_mixed_dtype_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_mixed_dtype_gemm.cu -o 
examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_mixed_dtype_gemm.dir/55_hopper_mixed_dtype_gemm.cu.o
[2846/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/56_hopper_ptr_array_batched_gemm/CMakeFiles/56_hopper_ptr_array_batched_gemm.dir/56_hopper_ptr_array_batched_gemm.cu.o -MF examples/56_hopper_ptr_array_batched_gemm/CMakeFiles/56_hopper_ptr_array_batched_gemm.dir/56_hopper_ptr_array_batched_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/56_hopper_ptr_array_batched_gemm/56_hopper_ptr_array_batched_gemm.cu -o 
examples/56_hopper_ptr_array_batched_gemm/CMakeFiles/56_hopper_ptr_array_batched_gemm.dir/56_hopper_ptr_array_batched_gemm.cu.o
[2847/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o -MF examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/65_distributed_gemm/65_distributed_gemm.cu -o examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o | |
FAILED: examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o
/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o -MF examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/65_distributed_gemm/65_distributed_gemm.cu -o examples/65_distributed_gemm/CMakeFiles/65_distributed_gemm.dir/65_distributed_gemm.cu.o | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common/dist_gemm_helpers.h(63): error: identifier "__nanosleep" is undefined
      __nanosleep(40);
      ^
1 error detected in the compilation of "/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/65_distributed_gemm/65_distributed_gemm.cu".
[2848/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/57_hopper_grouped_gemm/CMakeFiles/57_hopper_grouped_gemm.dir/57_hopper_grouped_gemm.cu.o -MF examples/57_hopper_grouped_gemm/CMakeFiles/57_hopper_grouped_gemm.dir/57_hopper_grouped_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/57_hopper_grouped_gemm/57_hopper_grouped_gemm.cu -o 
examples/57_hopper_grouped_gemm/CMakeFiles/57_hopper_grouped_gemm.dir/57_hopper_grouped_gemm.cu.o
[2849/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/52_hopper_gather_scatter_fusion/CMakeFiles/52_hopper_gather_scatter_fusion.dir/52_hopper_gather_scatter_fusion.cu.o -MF examples/52_hopper_gather_scatter_fusion/CMakeFiles/52_hopper_gather_scatter_fusion.dir/52_hopper_gather_scatter_fusion.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/52_hopper_gather_scatter_fusion/52_hopper_gather_scatter_fusion.cu -o 
examples/52_hopper_gather_scatter_fusion/CMakeFiles/52_hopper_gather_scatter_fusion.dir/52_hopper_gather_scatter_fusion.cu.o
[2850/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/61_hopper_gemm_with_topk_and_softmax/CMakeFiles/61_hopper_gemm_with_topk_and_softmax.dir/61_hopper_gemm_with_topk_and_softmax.cu.o -MF examples/61_hopper_gemm_with_topk_and_softmax/CMakeFiles/61_hopper_gemm_with_topk_and_softmax.dir/61_hopper_gemm_with_topk_and_softmax.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/61_hopper_gemm_with_topk_and_softmax/61_hopper_gemm_with_topk_and_softmax.cu -o examples/61_hopper_gemm_with_topk_and_softmax/CMakeFiles/61_hopper_gemm_with_topk_and_softmax.dir/61_hopper_gemm_with_topk_and_softmax.cu.o
[2851/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/63_hopper_gemm_with_weight_prefetch/. -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/63_hopper_gemm_with_weight_prefetch/CMakeFiles/63_hopper_gemm_with_weight_prefetch.dir/63_hopper_gemm_with_weight_prefetch.cu.o -MF examples/63_hopper_gemm_with_weight_prefetch/CMakeFiles/63_hopper_gemm_with_weight_prefetch.dir/63_hopper_gemm_with_weight_prefetch.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/63_hopper_gemm_with_weight_prefetch/63_hopper_gemm_with_weight_prefetch.cu -o examples/63_hopper_gemm_with_weight_prefetch/CMakeFiles/63_hopper_gemm_with_weight_prefetch.dir/63_hopper_gemm_with_weight_prefetch.cu.o | |
[2852/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/53_hopper_gemm_permute/CMakeFiles/53_hopper_gemm_permute.dir/53_hopper_gemm_permute.cu.o -MF examples/53_hopper_gemm_permute/CMakeFiles/53_hopper_gemm_permute.dir/53_hopper_gemm_permute.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/53_hopper_gemm_permute/53_hopper_gemm_permute.cu -o 
examples/53_hopper_gemm_permute/CMakeFiles/53_hopper_gemm_permute.dir/53_hopper_gemm_permute.cu.o | |
[2853/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/64_ada_fp8_gemm_grouped/CMakeFiles/64_ada_fp8_gemm_grouped.dir/ada_fp8_gemm_grouped.cu.o -MF examples/64_ada_fp8_gemm_grouped/CMakeFiles/64_ada_fp8_gemm_grouped.dir/ada_fp8_gemm_grouped.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/64_ada_fp8_gemm_grouped/ada_fp8_gemm_grouped.cu -o 
examples/64_ada_fp8_gemm_grouped/CMakeFiles/64_ada_fp8_gemm_grouped.dir/ada_fp8_gemm_grouped.cu.o | |
[2854/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/CMakeFiles/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.dir/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.cu.o -MF examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/CMakeFiles/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.dir/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.cu -o examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/CMakeFiles/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.dir/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling.cu.o | |
[2855/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_int4_fp8_gemm.dir/55_hopper_int4_fp8_gemm.cu.o -MF examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_int4_fp8_gemm.dir/55_hopper_int4_fp8_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_fp8_gemm.cu -o 
examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_int4_fp8_gemm.dir/55_hopper_int4_fp8_gemm.cu.o | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/tensor_fill.h(544): warning #1444-D: function "cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with Bits=4, Signed=true, T=double, Enable=void]" was declared deprecated ("Implicit conversion is deprecated; please use explicit construction instead") | |
if (params.exclude_zero >=0 && result == Element(0.0)) { | |
^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h(78): note #3287-D: because of a "deprecated" attribute | |
[[deprecated("Implicit conversion is deprecated; please use explicit construction instead")]] | |
^ | |
detected during: | |
instantiation of "Element cutlass::reference::device::detail::RandomUniformFunc<Element>::operator()() [with Element=ElementB]" at line 149 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/kernel/tensor_foreach.h | |
instantiation of "void cutlass::reference::device::kernel::BlockForEach<Element,Func>(Element *, size_t, Func::Params) [with Element=ElementB, Func=cutlass::reference::device::detail::RandomUniformFunc<ElementB>]" at line 121 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/tensor_foreach.h | |
instantiation of "cutlass::reference::device::BlockForEach<Element, Func>::BlockForEach(Element *, size_t, Func::Params, int, int, cudaStream_t) [with Element=ElementB, Func=cutlass::reference::device::detail::RandomUniformFunc<ElementB>]" at line 820 | |
instantiation of "void cutlass::reference::device::BlockFillRandomUniform(Element *, size_t, uint64_t, cutlass::RealType<Element>::Type, cutlass::RealType<Element>::Type, int, double, cudaStream_t) [with Element=ElementB]" at line 206 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp | |
instantiation of "__nv_bool initialize_tensor(cutlass::DeviceAllocation<Element> &, uint64_t) [with Element=ElementB]" at line 350 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_fp8_gemm.cu | |
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp: In instantiation of ‘bool initialize_tensor(cutlass::DeviceAllocation<T>&, uint64_t) [with Element = cutlass::integer_subbyte<4, true>; uint64_t = long unsigned int]’: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_fp8_gemm.cu:350:18: required from here | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp:205:80: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
205 | cutlass::reference::device::BlockFillRandomUniform( | |
| ^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp:205:100: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
205 | cutlass::reference::device::BlockFillRandomUniform( | |
| ^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
[2856/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_int4_bf16_gemm.dir/55_hopper_int4_bf16_gemm.cu.o -MF examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_int4_bf16_gemm.dir/55_hopper_int4_bf16_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu -o 
examples/55_hopper_mixed_dtype_gemm/CMakeFiles/55_hopper_int4_bf16_gemm.dir/55_hopper_int4_bf16_gemm.cu.o | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/tensor_fill.h(544): warning #1444-D: function "cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with Bits=4, Signed=true, T=double, Enable=void]" was declared deprecated ("Implicit conversion is deprecated; please use explicit construction instead") | |
if (params.exclude_zero >=0 && result == Element(0.0)) { | |
^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h(78): note #3287-D: because of a "deprecated" attribute | |
[[deprecated("Implicit conversion is deprecated; please use explicit construction instead")]] | |
^ | |
detected during: | |
instantiation of "Element cutlass::reference::device::detail::RandomUniformFunc<Element>::operator()() [with Element=ElementB]" at line 149 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/kernel/tensor_foreach.h | |
instantiation of "void cutlass::reference::device::kernel::BlockForEach<Element,Func>(Element *, size_t, Func::Params) [with Element=ElementB, Func=cutlass::reference::device::detail::RandomUniformFunc<ElementB>]" at line 121 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include/cutlass/util/reference/device/tensor_foreach.h | |
instantiation of "cutlass::reference::device::BlockForEach<Element, Func>::BlockForEach(Element *, size_t, Func::Params, int, int, cudaStream_t) [with Element=ElementB, Func=cutlass::reference::device::detail::RandomUniformFunc<ElementB>]" at line 820 | |
instantiation of "void cutlass::reference::device::BlockFillRandomUniform(Element *, size_t, uint64_t, cutlass::RealType<Element>::Type, cutlass::RealType<Element>::Type, int, double, cudaStream_t) [with Element=ElementB]" at line 206 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp | |
instantiation of "__nv_bool initialize_tensor(cutlass::DeviceAllocation<Element> &, uint64_t) [with Element=ElementB]" at line 432 of /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu | |
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp: In instantiation of ‘bool initialize_tensor(cutlass::DeviceAllocation<T>&, uint64_t) [with Element = cutlass::integer_subbyte<4, true>; uint64_t = long unsigned int]’: | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu:432:18: required from here | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp:205:80: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
205 | cutlass::reference::device::BlockFillRandomUniform( | |
| ^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/55_hopper_mixed_dtype_gemm/mixed_dtype_utils.hpp:205:100: warning: ‘cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(T) [with T = double; Enable = void; int Bits = 4; bool Signed = true]’ is deprecated: Implicit conversion is deprecated; please use explicit construction instead [-Wdeprecated-declarations] | |
205 | cutlass::reference::device::BlockFillRandomUniform( | |
| ^ | |
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include/cutlass/integer_subbyte.h:81:1: note: declared here | |
81 | integer_subbyte(T value) | |
| ^ ~~~~~~~~~~~~~ | |
[2857/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/CMakeFiles/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.dir/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu.o -MF examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/CMakeFiles/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.dir/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu.o.d -x cu -c 
/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu -o examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/CMakeFiles/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.dir/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu.o | |
[2858/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling/CMakeFiles/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.dir/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.cu.o -MF 
examples/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling/CMakeFiles/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.dir/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.cu -o examples/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling/CMakeFiles/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.dir/68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling.cu.o | |
[2859/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/62_hopper_sparse_gemm/CMakeFiles/62_hopper_sparse_gemm.dir/62_hopper_sparse_gemm.cu.o -MF examples/62_hopper_sparse_gemm/CMakeFiles/62_hopper_sparse_gemm.dir/62_hopper_sparse_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/62_hopper_sparse_gemm/62_hopper_sparse_gemm.cu -o examples/62_hopper_sparse_gemm/CMakeFiles/62_hopper_sparse_gemm.dir/62_hopper_sparse_gemm.cu.o | 
|
[2860/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/59_ampere_gather_scatter_conv/CMakeFiles/59_ampere_gather_scatter_conv.dir/ampere_gather_scatter_conv.cu.o -MF examples/59_ampere_gather_scatter_conv/CMakeFiles/59_ampere_gather_scatter_conv.dir/ampere_gather_scatter_conv.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/59_ampere_gather_scatter_conv/ampere_gather_scatter_conv.cu -o 
examples/59_ampere_gather_scatter_conv/CMakeFiles/59_ampere_gather_scatter_conv.dir/ampere_gather_scatter_conv.cu.o | |
[2861/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/58_ada_fp8_gemm/CMakeFiles/58_ada_fp8_gemm.dir/ada_fp8_gemm.cu.o -MF examples/58_ada_fp8_gemm/CMakeFiles/58_ada_fp8_gemm.dir/ada_fp8_gemm.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/58_ada_fp8_gemm/ada_fp8_gemm.cu -o examples/58_ada_fp8_gemm/CMakeFiles/58_ada_fp8_gemm.dir/ada_fp8_gemm.cu.o | |
[2862/3834] /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/GCCcore/13.3.0/bin/g++ -DCUTLASS_ENABLE_CUBLAS=1 -DCUTLASS_ENABLE_CUDNN=1 -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/common -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/easybuild_obj/include -I/dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/tools/util/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include -isystem /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/CUDA/12.6.0/include/cccl -DCUTLASS_VERSIONS_GENERATED -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_80,code=[compute_80]" "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_70,code=[compute_70]" "--generate-code=arch=compute_61,code=[sm_61]" "--generate-code=arch=compute_61,code=[compute_61]" -Xcompiler=-fPIE -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_ENABLE_GDC_FOR_SM100=1 --expt-relaxed-constexpr -ftemplate-backtrace-limit=0 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -Xcompiler=-Wconversion -Xcompiler=-fno-strict-aliasing -MD -MT examples/39_gemm_permute/CMakeFiles/39_gemm_permute.dir/gemm_permute.cu.o -MF examples/39_gemm_permute/CMakeFiles/39_gemm_permute.dir/gemm_permute.cu.o.d -x cu -c /dev/shm/s3248973-EasyBuild18826184/CUTLASS/4.1.0/foss-2024a-CUDA-12.6.0/cutlass-4.1.0/examples/39_gemm_permute/gemm_permute.cu -o examples/39_gemm_permute/CMakeFiles/39_gemm_permute.dir/gemm_permute.cu.o | |
ninja: build stopped: subcommand failed. | |
== 2025-08-08 14:45:48,559 build_log.py:226 ERROR EasyBuild encountered an error (at easybuild/base/exceptions.py:126 in __init__): shell command 'ninja ...' failed with exit code 1 in build step for CUTLASS-4.1.0-foss-2024a-CUDA-12.6.0.eb (at easybuild/framework/easyblock.py:4859 in run_all_steps) | |
== 2025-08-08 14:45:48,559 build_log.py:322 INFO ... (took 31 mins 57 secs) | |
== 2025-08-08 14:45:48,559 filetools.py:2135 INFO Removing lock /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/.locks/_data_horse_ws_s3248973-EasyBuild_easybuild-rapids_software_CUTLASS_4.1.0-foss-2024a-CUDA-12.6.0.lock... | |
== 2025-08-08 14:45:48,563 filetools.py:403 INFO Path /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/.locks/_data_horse_ws_s3248973-EasyBuild_easybuild-rapids_software_CUTLASS_4.1.0-foss-2024a-CUDA-12.6.0.lock successfully removed. | |
== 2025-08-08 14:45:48,563 filetools.py:2139 INFO Lock removed: /data/horse/ws/s3248973-EasyBuild/easybuild-rapids/software/.locks/_data_horse_ws_s3248973-EasyBuild_easybuild-rapids_software_CUTLASS_4.1.0-foss-2024a-CUDA-12.6.0.lock | |
== 2025-08-08 14:45:48,563 easyblock.py:387 INFO Closing log for application name CUTLASS version 4.1.0 |