  • Save VRehnberg/32a390300513608f40de26f1ae404c1e to your computer and use it in GitHub Desktop.
(partial) EasyBuild log for failed build of /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-c_4tmg_1/files_pr21438/d/DeepSpeed/DeepSpeed-0.14.5-foss-2023a-CUDA-12.1.1.eb (PR(s) #21438) (easyblock PR(s) #3450)
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:1075: warning: T* at::Tensor::data() const [with T = c10::Half] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:1177: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:1203: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:163: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:189: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:339: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:390: warning: T* at::Tensor::data() const [with T = c10::Half] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:422: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:443: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:468: warning: T* at::Tensor::data() const [with T = c10::Half] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:592: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:618: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:379:648: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
379 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu: In lambda function:
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1074: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1104: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1126: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1148: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1251: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1278: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:162: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:189: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:338: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:368: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:390: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:412: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:537: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:564: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:595: warning: T* at::Tensor::data() const [with T = double] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu: In lambda function:
csrc/lamb/fused_lamb_cuda_kernel.cu:428:893: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:922: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:943: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:964: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1066: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:1092: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:159: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:185: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:331: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:360: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:381: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:402: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:526: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:552: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
csrc/lamb/fused_lamb_cuda_kernel.cu:428:582: warning: T* at::Tensor::data() const [with T = float] is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
428 | AT_DISPATCH_FLOATING_TYPES(
| ^
/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
247 | T * data() const {
| ^ ~~
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/lamb
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/lamb/fused_lamb_cuda.o build/temp.linux-x86_64-cpython-311/csrc/lamb/fused_lamb_cuda_kernel.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/lamb/fused_lamb_op.cpython-311-x86_64-linux-gnu.so
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/lion
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/lion/cpu_lion.o build/temp.linux-x86_64-cpython-311/csrc/lion/cpu_lion_impl.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/lion/cpu_lion_op.cpython-311-x86_64-linux-gnu.so -lcurand -L/apps/Common/software/CUDA/12.1.1/lib64
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/adam
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam_impl.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/adam/cpu_adam_op.cpython-311-x86_64-linux-gnu.so -lcurand -L/apps/Common/software/CUDA/12.1.1/lib64
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/lion/fused_lion_frontend.o build/temp.linux-x86_64-cpython-311/csrc/lion/multi_tensor_lion.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/lion/fused_lion_op.cpython-311-x86_64-linux-gnu.so
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/adam/fused_adam_frontend.o build/temp.linux-x86_64-cpython-311/csrc/adam/multi_tensor_adam.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/adam/fused_adam_op.cpython-311-x86_64-linux-gnu.so
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm/layer_norm.cpp -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm/layer_norm.o -fPIC -O3 -std=c++17 -g -Wno-reorder -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/transformer/inference/includes -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/transformer/inference/csrc/transform.cu -o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/transform.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=transformer_inference_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
csrc/transformer/inference/csrc/transform.cu(38): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
csrc/transformer/inference/csrc/transform.cu(66): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
csrc/transformer/inference/csrc/transform.cu(109): warning #177-D: variable "half_dim" was declared but never referenced
unsigned half_dim = (rotary_dim << 3) >> 1;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(110): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(126): warning #177-D: variable "vals_half" was declared but never referenced
T2* vals_half = reinterpret_cast<T2*>(&vals_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(127): warning #177-D: variable "output_half" was declared but never referenced
T2* output_half = reinterpret_cast<T2*>(&output_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(144): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(38): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
csrc/transformer/inference/csrc/transform.cu(66): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
csrc/transformer/inference/csrc/transform.cu(109): warning #177-D: variable "half_dim" was declared but never referenced
unsigned half_dim = (rotary_dim << 3) >> 1;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(110): warning #177-D: variable "d0_stride" was declared but never referenced
int d0_stride = hidden_dim * seq_length;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(126): warning #177-D: variable "vals_half" was declared but never referenced
T2* vals_half = reinterpret_cast<T2*>(&vals_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(127): warning #177-D: variable "output_half" was declared but never referenced
T2* output_half = reinterpret_cast<T2*>(&output_arr);
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
csrc/transformer/inference/csrc/transform.cu(144): warning #177-D: variable "lane" was declared but never referenced
int lane = d3 & 0x1f;
^
detected during instantiation of "void launch_bias_add_transform_0213(T *, T *, T *, const T *, const T *, int, int, unsigned int, int, int, int, int, int, __nv_bool, __nv_bool, cudaStream_t, int, int, float) [with T=__nv_bfloat16]" at line 281
gcc -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas -fPIC -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/aio/py_lib -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/aio/common -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/aio/py_lib/deepspeed_py_aio_handle.cpp -o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_py_aio_handle.o -g -Wall -O0 -std=c++17 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX512__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=1
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/transformer
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/transformer/inference
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/apply_rotary_pos_emb.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/dequantize.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/gelu.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/layer_norm.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/pointwise_ops.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/pt_binding.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/relu.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/rms_norm.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/softmax.o build/temp.linux-x86_64-cpython-311/csrc/transformer/inference/csrc/transform.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib 
-lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/transformer/inference/transformer_inference_op.cpython-311-x86_64-linux-gnu.so -lcurand
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/transformer/transform_kernels.cu -o build/temp.linux-x86_64-cpython-311/csrc/transformer/transform_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -D__STOCHASTIC_MODE__ -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=stochastic_transformer_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/transformer/transform_kernels.cu -o build/temp.linux-x86_64-cpython-311/csrc/transformer/transform_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=transformer_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/quantization/quantize.cu -o build/temp.linux-x86_64-cpython-311/csrc/quantization/quantize.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=quantizer_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes/reduction_utils.h(787): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/transformer/cublas_wrappers.o build/temp.linux-x86_64-cpython-311/csrc/transformer/dropout_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/ds_transformer_cuda.o build/temp.linux-x86_64-cpython-311/csrc/transformer/gelu_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/general_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/normalize_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/softmax_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/transform_kernels.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/transformer/stochastic_transformer_op.cpython-311-x86_64-linux-gnu.so -lcurand
build/temp.linux-x86_64-cpython-311/csrc/transformer/dropout_kernels.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/transformer/cublas_wrappers.o build/temp.linux-x86_64-cpython-311/csrc/transformer/dropout_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/ds_transformer_cuda.o build/temp.linux-x86_64-cpython-311/csrc/transformer/gelu_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/general_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/normalize_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/softmax_kernels.o build/temp.linux-x86_64-cpython-311/csrc/transformer/transform_kernels.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/transformer/transformer_op.cpython-311-x86_64-linux-gnu.so -lcurand
build/temp.linux-x86_64-cpython-311/csrc/transformer/dropout_kernels.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes/reduction_utils.h(787): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes/reduction_utils.h(787): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/quantization/quantize_intX.cu -o build/temp.linux-x86_64-cpython-311/csrc/quantization/quantize_intX.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=quantizer_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/quantization/swizzled_quantize.cu -o build/temp.linux-x86_64-cpython-311/csrc/quantization/swizzled_quantize.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=quantizer_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes/reduction_utils.h(787): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes/reduction_utils.h(787): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/includes/reduction_utils.h(787): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
gcc -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas -fPIC -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/aio/py_lib -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/aio/common -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/aio/py_lib/deepspeed_py_copy.cpp -o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_py_copy.o -g -Wall -O0 -std=c++17 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX512__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=1
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/quantizer
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/quantization/dequantize.o build/temp.linux-x86_64-cpython-311/csrc/quantization/fake_quantizer.o build/temp.linux-x86_64-cpython-311/csrc/quantization/pt_binding.o build/temp.linux-x86_64-cpython-311/csrc/quantization/quant_reduce.o build/temp.linux-x86_64-cpython-311/csrc/quantization/quantize.o build/temp.linux-x86_64-cpython-311/csrc/quantization/quantize_intX.o build/temp.linux-x86_64-cpython-311/csrc/quantization/swizzled_quantize.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/quantizer/quantizer_op.cpython-311-x86_64-linux-gnu.so -lcurand
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm/layer_norm_cuda.cu -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm/layer_norm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
-U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes/reduction_utils.h(739): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes/reduction_utils.h(739): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes/reduction_utils.h(739): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/cuda_linear/linear_kernels.cpp -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_linear/linear_kernels.o -fPIC -O3 -std=c++17 -g -Wno-reorder -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1
gcc -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas -fPIC -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/aio/py_lib -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/csrc/aio/common -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/aio/py_lib/py_ds_aio.cpp -o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/py_ds_aio.o -g -Wall -O0 -std=c++17 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX512__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=1
In file included from /apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/Exceptions.h:14,
from /apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
from /apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/extension.h:9,
from csrc/aio/py_lib/py_ds_aio.cpp:10:
/apps/Test/software/pybind11/2.11.1-GCCcore-12.3.0/include/pybind11/pybind11.h: In instantiation of class pybind11::class_<deepspeed_aio_handle_t>:
csrc/aio/py_lib/py_ds_aio.cpp:22:55: required from here
/apps/Test/software/pybind11/2.11.1-GCCcore-12.3.0/include/pybind11/pybind11.h:1496:7: warning: pybind11::class_<deepspeed_aio_handle_t> declared with greater visibility than its base pybind11::detail::generic_type [-Wattributes]
1496 | class class_ : public detail::generic_type {
| ^~~~~~
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/cuda_linear/linear_kernels_cuda.cu -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_linear/linear_kernels_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
-U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
creating build/lib.linux-x86_64-cpython-311/deepspeed/ops/aio
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/aio/common/deepspeed_aio_common.o build/temp.linux-x86_64-cpython-311/csrc/aio/common/deepspeed_aio_types.o build/temp.linux-x86_64-cpython-311/csrc/aio/common/deepspeed_aio_utils.o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_aio_thread.o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_pin_tensor.o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_py_aio.o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_py_aio_handle.o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/deepspeed_py_copy.o build/temp.linux-x86_64-cpython-311/csrc/aio/py_lib/py_ds_aio.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/aio/async_io_op.cpython-311-x86_64-linux-gnu.so -laio
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/ptx_mma.cuh(59): warning #174-D: expression has no effect
("The matrix load functions are only supported on Ampere and newer architectures", false)
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/ptx_mma.cuh(135): warning #174-D: expression has no effect
("The mma functions are only implemented for Ampere and newer architectures", false)
^
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/ptx_cp.async.cuh(33): warning #174-D: expression has no effect
("The async copy functions are only supported on Ampere and newer architectures", false)
^
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/ptx_cp.async.cuh(44): warning #174-D: expression has no effect
("The async copy functions are only supported on Ampere and newer architectures", false)
^
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/ptx_cp.async.cuh(56): warning #174-D: expression has no effect
("The async copy functions are only supported on Ampere and newer architectures", false)
^
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/ptx_cp.async.cuh(70): warning #174-D: expression has no effect
("The async copy functions are only supported on Ampere and newer architectures", false)
^
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/kernel_matmul.cuh(268): warning #174-D: expression has no effect
("The FP6 functions are only available on Ampere GPUs.", false)
^
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm.cpp -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm.o -fPIC -O3 -std=c++17 -g -Wno-reorder -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_cuda.cu -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
-U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes/reduction_utils.h(739): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes/reduction_utils.h(739): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes/reduction_utils.h(739): warning #1866-D: attribute does not apply to any entity
__attribute__((aligned(8))) struct IdxReduceResult {
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.cpp -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.o -fPIC -O3 -std=c++17 -g -Wno-reorder -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1/DeepSpeed/DeepSpeed-0.14.5/deepspeed/inference/v2/kernels/includes -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels_cuda.cu -o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ 
-U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_cuda.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/core_ops.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm/layer_norm.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm/layer_norm_cuda.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_linear/linear_kernels.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_linear/linear_kernels_cuda.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_cuda.o 
build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.o build/temp.linux-x86_64-cpython-311/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels_cuda.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/inference/v2/kernelsinference_core_ops.cpython-311-x86_64-linux-gnu.so
/apps/Common/software/CUDA/12.1.1/bin/nvcc -I/apps/Test/software/CUTLASS/3.4.0-foss-2023a-CUDA-12.1.1/include -I/apps/Test/software/CUTLASS/3.4.0-foss-2023a-CUDA-12.1.1/tools/util/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/apps/Common/software/CUDA/12.1.1/include -I/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c csrc/deepspeed4science/evoformer_attn/attention_cu.cu -o build/temp.linux-x86_64-cpython-311/csrc/deepspeed4science/evoformer_attn/attention_cu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -Xcompiler -fPIC -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ --threads=8 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -DGPU_ARCH=80 -DBF16_AVAILABLE -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1017\" -DTORCH_EXTENSION_NAME=evoformer_attn_op -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin gcc
g++ -shared -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/OpenSSL/1.1/lib64 -L/apps/Test/software/OpenSSL/1.1/lib -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/libffi/3.4.4-GCCcore-12.3.0/lib -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/XZ/5.4.2-GCCcore-12.3.0/lib -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib64 -L/apps/Test/software/SQLite/3.42.0-GCCcore-12.3.0/lib -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib64 -L/apps/Test/software/ncurses/6.4-GCCcore-12.3.0/lib -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib64 -L/apps/Test/software/libreadline/8.2-GCCcore-12.3.0/lib -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib64 -L/apps/Test/software/zlib/1.2.13-GCCcore-12.3.0/lib -L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib64 
-L/apps/Test/software/bzip2/1.0.8-GCCcore-12.3.0/lib -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib64 -L/apps/Test/software/binutils/2.40-GCCcore-12.3.0/lib -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64 -L/apps/Test/software/pkgconf/1.9.5-GCCcore-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib64 -L/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/lib -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib64 -L/apps/Test/software/ScaLAPACK/2.2.0-gompi-2023a-fb/lib -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib64 -L/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/lib -L/apps/Test/software/GCCcore/12.3.0/lib64 -L/apps/Test/software/GCCcore/12.3.0/lib -O2 -ftree-vectorize -march=native -fno-math-errno -I/apps/Test/software/FFTW/3.3.10-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/apps/Test/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas build/temp.linux-x86_64-cpython-311/csrc/deepspeed4science/evoformer_attn/attention.o build/temp.linux-x86_64-cpython-311/csrc/deepspeed4science/evoformer_attn/attention_back.o build/temp.linux-x86_64-cpython-311/csrc/deepspeed4science/evoformer_attn/attention_cu.o -L/apps/Test/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/lib -L/apps/Common/software/CUDA/12.1.1/lib64 -L/apps/Test/software/Python/3.11.3-GCCcore-12.3.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/evoformer_attn_op.cpython-311-x86_64-linux-gnu.so -lcurand
error: command '/apps/Test/software/GCCcore/12.3.0/bin/g++' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for deepspeed
Running setup.py clean for deepspeed
Failed to build deepspeed
ERROR: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
(at easybuild/tools/run.py:695 in parse_cmd_output)
== 2024-11-12 13:12:22,631 build_log.py:267 INFO ... (took 9 mins 1 secs)
== 2024-11-12 13:12:22,631 build_log.py:267 INFO ... (took 9 mins 25 secs)
== 2024-11-12 13:12:22,631 filetools.py:2025 INFO Removing lock /apps/Test/software/.locks/_apps_Test_software_DeepSpeed_0.14.5-foss-2023a-CUDA-12.1.1.lock...
== 2024-11-12 13:12:22,636 filetools.py:385 INFO Path /apps/Test/software/.locks/_apps_Test_software_DeepSpeed_0.14.5-foss-2023a-CUDA-12.1.1.lock successfully removed.
== 2024-11-12 13:12:22,636 filetools.py:2029 INFO Lock removed: /apps/Test/software/.locks/_apps_Test_software_DeepSpeed_0.14.5-foss-2023a-CUDA-12.1.1.lock
== 2024-11-12 13:12:22,636 easyblock.py:4297 WARNING build failed (first 300 chars): cmd "export PATH=/cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-nk79msr5/tmpdacggzek/bin:$PATH PYTHONPATH=/cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-nk79msr5/tmpdacggzek/lib/python3.11/site-packages:$PYTHONPATH LD_LIBRARY_PATH=/cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-nk79msr5/tmpdacggzek/lib/python3.11/site
== 2024-11-12 13:12:22,636 easyblock.py:326 INFO Closing log for application name DeepSpeed version 0.14.5