This sample demonstrates how to build support for multiple NVIDIA GPU architectures (Compute Capabilities, CC) into a single OpenMP target offload C program with Clang.
In this example the executable is compiled to support sm_80 (e.g. NVIDIA A100) and sm_90 (e.g. NVIDIA H100):
> make
LIBRARY_PATH=/usr/lib/llvm-19/lib clang-19 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
-Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_90 \
-Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80 \
-fuse-ld=lld multi_sm_test.c -o multi_sm_test
> strings ./multi_sm_test | grep "sm_80"
sm_80
> strings ./multi_sm_test | grep "sm_90"
sm_90
The example above uses the latest Clang 19, but the same command line also works with Clang 17.
This feature is discussed in D128090, noting that -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80 works, while -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 does not.