In general, check the `crt/host_config.h` file to find out which host compiler versions are supported. Sometimes it is possible to hack the requirements there to get some newer versions working, too :)
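As a rough illustration of the kind of check involved, here is a minimal sketch that reads the relevant predefined macros from your own code. `__CUDACC_VER_MAJOR__`/`__CUDACC_VER_MINOR__` (nvcc 9.0+) and `__GNUC__` are real predefined macros; the helper names and the example policy guard are hypothetical, and this is not the actual content of `crt/host_config.h`.

```cpp
// Hypothetical helpers: read compiler versions from predefined macros.

// nvcc version as major * 100 + minor, or -1 if this translation unit is not
// compiled by nvcc 9.0+ (older nvcc releases only define __CUDACC_VER__).
constexpr int nvcc_version() {
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
    return __CUDACC_VER_MAJOR__ * 100 + __CUDACC_VER_MINOR__;
#else
    return -1;
#endif
}

// Major version of a GNU host compiler, or -1 otherwise. clang and icc also
// define __GNUC__, so they are excluded explicitly.
constexpr int host_gcc_major() {
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
    return __GNUC__;
#else
    return -1;
#endif
}

// Example project policy (assumption, based on the "10.2.89: g++ <=8" row of
// the table below): fail early with a readable message.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && \
    __CUDACC_VER_MAJOR__ == 10 && __CUDACC_VER_MINOR__ == 2 &&         \
    defined(__GNUC__) && !defined(__clang__) && __GNUC__ > 8
#error "CUDA 10.2 supports g++ <= 8 only; see crt/host_config.h"
#endif
```

Compiled by nvcc 11.8, `nvcc_version()` would evaluate to 1108; with a plain host compiler it is -1.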
The Thrust version can be found in `$CUDA_ROOT/include/thrust/version.h`.
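Thrust packs its version into the single integer macro `THRUST_VERSION` in that header (major * 100000 + minor * 100 + subminor, e.g. 100903 for 1.9.3). A small sketch that decodes it; `ThrustVersion` and `decode_thrust_version` are hypothetical names, and the cross-check against the split macros only runs under a CUDA compiler, where Thrust is on the include path.

```cpp
// Hypothetical decoder for the THRUST_VERSION encoding:
// major * 100000 + minor * 100 + subminor.
struct ThrustVersion {
    int major;
    int minor;
    int subminor;
};

constexpr ThrustVersion decode_thrust_version(int v) {
    return ThrustVersion{v / 100000, v / 100 % 1000, v % 100};
}

#if defined(__CUDACC__)
#include <thrust/version.h>
// thrust/version.h also provides the split macros; both views must agree.
static_assert(decode_thrust_version(THRUST_VERSION).major == THRUST_MAJOR_VERSION,
              "major version mismatch");
static_assert(decode_thrust_version(THRUST_VERSION).minor == THRUST_MINOR_VERSION,
              "minor version mismatch");
#endif
```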
Download Archives: https://developer.nvidia.com/cuda-toolkit-archive
Release notes for CUDA Toolkit (CTK):
- latest: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
- archive: https://docs.nvidia.com/cuda/archive/
- 11.8.0: https://docs.nvidia.com/cuda/archive/11.8.0/
- 11.7.1: https://docs.nvidia.com/cuda/archive/11.7.1
- ...
- 11.5.1: https://docs.nvidia.com/cuda/archive/11.5.1/
- 11.4.2: https://docs.nvidia.com/cuda/archive/11.4.2/
- 11.4.1: https://docs.nvidia.com/cuda/archive/11.4.1/
- 11.4.0: https://docs.nvidia.com/cuda/archive/11.4.0/
- 11.3: https://docs.nvidia.com/cuda/archive/11.3.0/index.html
- 11.2: https://docs.nvidia.com/cuda/archive/11.2.2/index.html
- 11.1: https://docs.nvidia.com/cuda/archive/11.1.1/index.html
- 11.0: https://docs.nvidia.com/cuda/archive/11.0/cuda-toolkit-release-notes/index.html
- 10.2: https://developer.download.nvidia.com/compute/cuda/10.2/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 10.1: https://developer.download.nvidia.com/compute/cuda/10.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 10.0: https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 9.2: https://developer.download.nvidia.com/compute/cuda/9.2/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 9.1: https://developer.download.nvidia.com/compute/cuda/9.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 9.0: https://developer.download.nvidia.com/compute/cuda/9.0/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 8.0: https://developer.nvidia.com/compute/cuda/8.0/Prod2/docs/sidebar/CUDA_Toolkit_Release_Notes-pdf
- 7.5: http://developer.download.nvidia.com/compute/cuda/7.5/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
- 7.0: http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf
- 6.5: http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
- 6.0: http://developer.download.nvidia.com/compute/cuda/6_0/rel/docs/CUDA_Toolkit_Release_Notes.pdf
- 5.5: http://developer.download.nvidia.com/compute/cuda/5_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
Version notes Nvidia HPC SDK:
- CUDA 10.0: First introduced in CUDA 10, the CUDA Forward Compatible Upgrade is designed to allow users to get access to new CUDA features and run applications built with new CUDA releases on systems with older installations of the NVIDIA datacenter GPU driver.
- CUDA 11.1: First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:
  - By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).
  - CUDA has relaxed the minimum driver version check and thus no longer requires a driver upgrade with minor releases of the CUDA Toolkit.
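The two rules above can be written down as simple predicates. This is a deliberately simplified sketch under stated assumptions: versions are compared as major.minor only, "driver" means the highest CUDA version the installed driver reports (as shown by `nvidia-smi`), and caveats such as features that need a newer driver are ignored. The CUDA Forward Compatible Upgrade is not modeled. All names are hypothetical.

```cpp
// Hypothetical model of CUDA version compatibility (major.minor only).
struct CudaVersion {
    int major;
    int minor;
};

constexpr bool at_least(CudaVersion a, CudaVersion b) {
    return a.major > b.major || (a.major == b.major && a.minor >= b.minor);
}

// Enhanced Compatibility (11.1+): an application built against M.m libraries
// keeps working with any later M.n release of the same major family.
constexpr bool library_compatible(CudaVersion built_with, CudaVersion installed) {
    return at_least(built_with, CudaVersion{11, 1}) &&
           installed.major == built_with.major && installed.minor >= built_with.minor;
}

// Driver check: a driver for the same or a newer CUDA version always works;
// from 11.1 on, a driver of the same major family is accepted as well.
constexpr bool driver_can_run(CudaVersion driver, CudaVersion toolkit) {
    return at_least(driver, toolkit) ||
           (at_least(toolkit, CudaVersion{11, 1}) && driver.major == toolkit.major);
}
```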
Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
CUDA version | SM Arch | g++ | icpc | pgc++ | xlC | MSVC | clang++ | Linux driver | thrust | note |
---|---|---|---|---|---|---|---|---|---|---|
1.0 | 1.0-1.1 | ? | ? | ? | ||||||
1.1 | 1.0-1.1 | ? | ? | ? | ||||||
2.0 | 1.0-1.1 | ? | ? | ? | ||||||
2.1 | 1.0-1.3 | ? | ? | ? | ||||||
2.3.1 | 1.0-1.3 | ? | ? | ? | ||||||
3.0 | 1.0-2.0 | ? | ? | ? | ||||||
3.1 | 1.0-2.0 | ? | ? | ? | ||||||
3.2 | 1.0-2.1 | ? | 11.1 | ? | ||||||
4.0 | 1.0-2.1 | <=4.4 | 11.1 | ? | ||||||
4.1 | 1.0-2.1 | <=4.5 | 11.1 | ? | ||||||
4.2 | 1.0-2.1 | <=4.6 | 11.1 | ? | ||||||
5.0 | 1.0-3.? | <=4.6 | 11.1 | ? | | | | ? | 1.5.3 | |
5.5 | 1.0-3.? | <=4.8 | 12.1 | ? | | | | ? | 1.7.0 | C++11 on host side supported; ICC fixed to build 20110811 |
6.0 | 1.0-5.0 | <=4.8 | 13.1 | ? | | | | 331.62 | 1.7.1 | |
6.5 | 1.1-5.X | <=4.8 | 14.0 | ? | ? | | | ? | 1.7.2 | experimental device-side C++11 support; including this version, `<thrust/sort.h>` screws up `__CUDA_ARCH__` (must be undefined on host); deprecation of SM 11-13 (10 removed) |
7.0.17 (RC) | see below | <=4.9 | 15.0 | >=14.9 | 13.1.1 | ? | | 346.29 | 1.8.0 | first official PGI support, first xlc string found; powerpc64 with little endian supported |
7.0.27 | 2.0-5.X | <=4.9 | 15.0 | >=14.9 | 13.1.1 | 2010-13 | | 346.46 | 1.8.1 | official C++11 support on device side |
7.5 | | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | 3.5-3.6 | 352.41? | 1.8.2 | clang (host) on Linux supported, `__CUDACC_VER__` macros added |
7.5.18 | 2.0-5.X | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | | 352.39 | 1.8.2 | |
8.0.44 | 2.0-6.X | <=5.3 | 15.0(.4)-16.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 367.48 | 1.8.3-patch2 | sm_60 (Pascal) support added |
8.0.61 | 2.0-6.X | <=5.3 | 15.0(.4)-17.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 375.26 | 1.8.3-patch2 | nvcc 8 is incompatible with std::tuple in gcc 5.4+ |
9.0.69 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | ???.?? | 1.9.0-patch4 | device-side C++14; `__CUDACC_VER__` deprecated in favor of `__CUDACC_VER_MAJOR__`/`__CUDACC_VER_MINOR__`/`__CUDACC_VER_BUILD__` |
9.0.103 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | 384.59 | 1.9.0-patch4 | same as above, `__CUDACC_VER__` defined as `#error`, rendering it fully broken |
9.0.176 | 3.0-7.0 | <=5.5 (<=6) | (15.0-)17.0 | 17.1 | 13.1(.5) | 2012-17 | (3.8-)3.9 | 384.81 | 1.9.0-patch5 | same as above |
9.1.85 | 3.0-7.2 | <=5.5 (<=6) | (15.0-)17.0 | 17.X | 13.1(.6) | 2012-17 | (3.8-)4.0 | 390.46 | 1.9.1-patch2 | math_functions.hpp moved to crt/ |
9.1.85.1 | | | | | | | | | | cuBLAS 9.1.128: Volta GEMM kernels optimized |
9.1.85.2 | | | | | | | | | | ptxas: fix address calculations using large immediate operands |
9.1.85.3 | | | | | | | | | | cuBLAS: fixes to GEMM optimizations for convolutional sequence to sequence (seq2seq) models |
9.0-9.1 | | | | | | | | | | nvcc 9.0-9.1 is incompatible with std::tuple in gcc 6+ |
9.2.88 | 3.0-7.2 | <=7.3.0 (<=7) | (15.0-)17.0 | 17-18.X | 13.1(.6),16.1 | 2012-17 | (3.8-)5.0 | 396.26 | 1.9.2 | CUTLASS 1.0 added; std::tuple fixed (prior GCC 6 issues) |
9.2.148 | | | | | | | | 396.37 | 1.9.2 | |
10.0.130 | 3.0-7.5 | <=7 | (15.0-)18.0 | 17-18.X | 13.1, 16.1 | 2013-17 | (3.8-)6.0 | 410.48 | 1.9.3 | CUDA Forward Compatible Upgrade |
10.1.105 | 3.0-7.5 | <=8 | (15.0-)19.0 | 17-19.X | | 2013-19 | (3.8-)7.0 | 418.39 | 1.9.4 | |
10.1.168 | | | | | | | (3.8-)8.0 | 418.67 | | 10.1 "Update 1" |
10.1.243 | | | | | | | | 418.87 | | 10.1 "Update 2" |
10.2.89 | 3.0-7.5 | <=8 | (15.0-)19.0 | 18-19.X | 13.1, 16.1 | 2015-19 | (3.3-)8.X | 440.33.01 | 1.9.7 | sm_30,35,37,50 deprecated; nvcc : -allow-unsupported-compiler |
11.0.1 (RC) NVCC:11.0.167 | 3.5-8.0 | (5-)6-9.* | (15.0-)19.1 | 18-20.1 | 13.1, 16.1 | 2015-19 | 3.2-9.0.0 | 450.36.06 | 1.9.9 | macOS dropped; libs drop pre-C++11, deprecate pre-C++14 (GCC < 5, Clang < 6, and MSVC < 2017); Arm C/C++ 19.2 support |
11.0.2-1 NVCC:11.0.194 | | | | | | | (3.3/)6-9.0.0 | 450.51.05 | | nvcc : --Wext-lambda-captures-this |
11.0.3 NVCC:11.0.221 | ? | ? | ? | ? | ? | ? | ? | 450.51.06 | ? | 11.0 "Update 1"; nvcc : --forward-unknown-to-host-compiler , --forward-unknown-to-host-linker flags |
11.1.0 NVCC:11.1.74 | 3.5-8.6 | (5-)6-10.0 | (15.0-)19.1 | 18-20.1 | 13.1, 16.1 | 2017-19 | (3.3/)6-10.X | 455.23.05 | 1.9.10-1 | Ubuntu@ppc64le deprecated; CUDA Enhanced Compatibility |
11.1.1 NVCC:11.1.? | ? | ? | ? | |||||||
11.2.0 NVCC:11.2.67 | | | | | | | <12 | 460.27.04 | 1.10.0 | |
11.2.1 NVCC:11.2.142 | | | | | | | | 460.32.03 | ? | "Update 1" |
11.2.2 NVCC:11.2.152 | | | | | | | | 460.32.03 | ? | "Update 2" |
11.3.0 NVCC:11.3.58 | | 6.0-10.X | | | | | | 465.19.01 | ? | cu++flt added, Python Driver/RT bindings, alloca() |
11.4.0 NVCC:11.4.48 | | 6.0-11.X | | | | | <13 | 470.42.01 | ? | sm30,32 and Ubuntu 16.04 dropped, C++11 stdlib for math |
11.4.1 NVCC:11.4.100 | | 6.0-11.X | | | | | | 470.57.02 | ? | 11.4 "Update 1", fix g++ 10 issues with chrono headers of libstdc++; Ubuntu 16.04 dropped (x86) |
11.4.2 NVCC:11.4.120 | | | | | | | 3.2-12.X | 470.57.02 | ? | ... |
11.5.0 NVCC:11.5.50 | | 6.0-11.X | | | | | 3.2-12.X | 495.29.05 | ? | ... |
11.5.1 NVCC:11.5.119 | ||||||||||
11.6.0 NVCC:11.6.55 | | 6.0-11.X | | | | adds VS2022 | 3.2-13.X | 510.39.01 | ? | adds -arch=native and PTX generation in nvlink (for LTO workflows with PTX) |
11.6.1 NVCC:11.6.112 | | | | | | | | 510.47.03 | ? | |
11.6.2 NVCC:11.6.124 | | | | | | | | 510.47.03 | ? | |
11.7.0 NVCC:11.7.64 | ? | ? | ? | ? | 515.43.04 | ? | ||||
11.7.1 NVCC:11.7.99 | | | | | | | | 515.65.01 | ? | |
11.8.0 NVCC:11.8.89 | | 6.0-11.2.1 | | | | | | 520.61.05 | ? | |
12.0.0 NVCC:12.0.76 | 4.0-9.0 | 6.0-12.1 (12.2.1) | 2021.6 | 22.7 | 16.1.x | -VS2022 17.4 | -14.X | 525.60.13 | 2.0.1 | Hopper and Ada Lovelace, JIT LTO (nvJitLink lib), NVVM IR 2.0, CUDA-MEMCHECK -> Compute Sanitizer, sm_35/37 dropped in all libs, 32-bit compilation support dropped |
SM Arch: supported SM architectures (compute capabilities).
pgc++: now NVHPC products, e.g., nvc/nvfortran/nvc++.
Note: empty cells generally mean "same as above" for readability.
macOS: As of 7.0, clang seems to be the only supported compiler on OSX (but no version check was found). CUDA 10.1.243 adds support for Xcode 10.2. CUDA 11.0 dropped macOS support.
Compilers such as pgc++, icc, and xlC are only supported on x86 Linux and little-endian systems.
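To compare an installed driver against the Linux driver column, dotted versions should be compared numerically, component by component, rather than as strings. A minimal C++14 sketch; the function names are hypothetical, and treating the listed driver as a minimum is an assumption (with CUDA Enhanced Compatibility, older drivers of the same major family may also work).

```cpp
// Hypothetical helper (C++14): compares dotted version strings such as
// "450.51.06" numerically, component by component. Missing components count
// as 0. Returns a negative value, zero, or a positive value like strcmp.
constexpr int compare_versions(const char* a, const char* b) {
    while (*a != '\0' || *b != '\0') {
        long x = 0;
        long y = 0;
        while (*a >= '0' && *a <= '9') x = x * 10 + (*a++ - '0');
        while (*b >= '0' && *b <= '9') y = y * 10 + (*b++ - '0');
        if (x != y) return x < y ? -1 : 1;
        // Skip one separator (usually '.') so the next component is parsed.
        if (*a != '\0') ++a;
        if (*b != '\0') ++b;
    }
    return 0;
}

// True if `installed` is at least `required`, e.g. the driver listed for a
// CUDA release in the table above.
constexpr bool driver_at_least(const char* installed, const char* required) {
    return compare_versions(installed, required) >= 0;
}
```

For example, `driver_at_least("450.51.06", "455.23.05")` is false: that driver predates the one listed for 11.1.0.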
Dynamic parallelism was added with sm_35 and CUDA 5.0.
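A minimal sketch of a device-side launch (dynamic parallelism), assuming nvcc with relocatable device code, e.g. `nvcc -arch=sm_70 -rdc=true dp.cu -lcudadevrt` (file name and target architecture are assumptions; any sm_35+ target your toolkit still supports should do). The kernels are hidden from plain host compilers, and the launch itself is only compiled for sm_35 and newer. `supports_dynamic_parallelism` is a hypothetical helper.

```cpp
// Hypothetical helper: dynamic parallelism requires compute capability 3.5+.
constexpr bool supports_dynamic_parallelism(int cc_major, int cc_minor) {
    return cc_major > 3 || (cc_major == 3 && cc_minor >= 5);
}

#if defined(__CUDACC__)
// Child grid: each thread writes its index.
__global__ void child_kernel(int* out) {
    out[threadIdx.x] = static_cast<int>(threadIdx.x);
}

// Parent grid: thread 0 launches the child grid from the device.
__global__ void parent_kernel(int* out) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
    if (threadIdx.x == 0) {
        child_kernel<<<1, 32>>>(out);
    }
#endif
}
#endif
```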
Newer CUDA releases have a per-release support matrix for compilers, which also lists supported kernel and glibc versions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
clang++ can compile CUDA C++ to PTX as well. Give it a whirl!
clang++ | supported CUDA release | supported SMs |
---|---|---|
3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
7.0 | 7.0-9.2 | (2.0)3.0-7.2 |
8.0 | 7.0-10.0 | (2.0)3.0-7.5 |
9.0 | 7.0-10.1 | (2.0)3.0-7.5 |
10.0 | 7.0-10.1 | (2.0)3.0-7.5 |
11.0 | 7.0-11.0 | (2.0)3.0-8.0 |
12.0 | 7.0-11.0 | (2.0)3.0-8.0 |
13.0 | 7.0-11.2 | (2.0)3.0-8.6 |
14.0 | 7.0-11.5 | (2.0)3.0-8.6 |
15.0 | 7.0-11.5 | (2.0)3.0-8.6 |
main | 7.0-11.5 | (2.0)3.5-9.0 |
https://llvm.org/docs/CompileCudaWithLLVM.html
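Code that should build with both compilers can tell them apart with predefined macros; the scheme below follows the LLVM document linked above (`__CUDA__` is defined by clang in CUDA mode, `__NVCC__` by nvcc, `__CUDA_ARCH__` during device compilation). The enum and function names are hypothetical, and the build commands in the comment are a sketch whose paths and GPU architecture are assumptions.

```cpp
// Build sketch (paths and arch are assumptions):
//   clang++ app.cu -o app --cuda-gpu-arch=sm_70 -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread
//   clang++ app.cu --cuda-gpu-arch=sm_70 --cuda-device-only -S   (device side only, PTX assembly)

// Which CUDA front end is compiling this translation unit?
enum class CudaFrontend { None, ClangHost, ClangDevice, Nvcc };

constexpr CudaFrontend cuda_frontend() {
#if defined(__clang__) && defined(__CUDA__) && defined(__CUDA_ARCH__)
    return CudaFrontend::ClangDevice;  // clang, device-side compilation
#elif defined(__clang__) && defined(__CUDA__)
    return CudaFrontend::ClangHost;    // clang, host-side compilation
#elif defined(__NVCC__)
    return CudaFrontend::Nvcc;         // nvcc (host or device pass)
#else
    return CudaFrontend::None;         // plain C++ compiler
#endif
}
```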
C++ core language features:
compiler | supported C++ standard | notes |
---|---|---|
nvcc <=6.0 | c++03 | |
nvcc 6.5 | c++03, exp. c++11 | undocumented |
nvcc 7.0-8.0 | c++03,11 | only c++11 switch |
nvcc 9.0-10.2 | c++03,11,14 | 10.2 adds libcu++ (atomics); open repository: https://github.com/NVIDIA/libcudacxx/releases |
nvcc 11.0.167+ | c++03,11,14,17 | C++11 host compiler needed for math libs; ships C++11-compatible backport of the C++20 synchronization library; device LTO added; starting with CUDA Toolkit 11.0.1, nvcc and CUDA Toolkit versions are not equivalent anymore |
nvcc 12.0+ | c++03,11,14,17,20 | |
clang 5+ | c++03,11,14,17 | |
clang 6+ | c++03,11,14,17,2a | |
clang 10+ | c++03,11,14,17,20 | |
clang 13+ | c++03,11,14,17,20,2b | |
clang trunk | c++03,11,14,17,20,2b | status |
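To check which standard a translation unit is actually compiled with (e.g. after `nvcc -std=c++17`), inspect `__cplusplus`. A minimal sketch; the helper name is hypothetical, and note that MSVC reports 199711L unless `/Zc:__cplusplus` is passed.

```cpp
// Hypothetical helper: map __cplusplus to a standard "year" (3, 11, 14, 17, 20).
// Values between published standards (draft modes like -std=c++2a) map to the
// last completed standard; 20 means "C++20 or newer".
constexpr int cxx_standard(long cplusplus) {
    return cplusplus >= 202002L ? 20
         : cplusplus >= 201703L ? 17
         : cplusplus >= 201402L ? 14
         : cplusplus >= 201103L ? 11
         : 3;
}

// Example guard (assumption, cf. the CUDA 11 note above that the math
// libraries need a C++11 host compiler).
#if defined(__CUDACC_VER_MAJOR__) && __CUDACC_VER_MAJOR__ >= 11
static_assert(cxx_standard(__cplusplus) >= 11, "CUDA 11+ libraries need C++11");
#endif
```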
CUDA-enabled C++ standard library libcu++, based on LLVM's libc++ (docs):
CUDA release | introduced components | notes |
---|---|---|
CUDA 10.2 | `<atomic>` (SM6.0+), `<type_traits>` | introduction of libcu++ |
CUDA 11.0 | `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` w/o `function` | anticipated with GTC 2020 slides |
CUDA 11.2 | `cuda::std::tuple`, `pair` | notes |
CUDA 12.0 | `cuda::std::barrier` | |
CUDA next | `cuda::std::complex`, backports: `chrono`, `type_traits` | notes |
newer | see the release notes and api docs | all open source now |
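libcu++ lives in the `cuda::std::` namespace behind `<cuda/std/...>` headers. A minimal sketch that uses `cuda::std::atomic` when compiling with nvcc 10.2+ and falls back to `std::atomic` otherwise; the `xstd` alias, the `HD` macro and `take_tickets` are hypothetical names. `fetch_add` behaves the same in both namespaces.

```cpp
#include <cassert>

// Pick libcu++ when nvcc >= 10.2 is compiling, else the host standard library.
#if defined(__CUDACC_VER_MAJOR__) && \
    (__CUDACC_VER_MAJOR__ > 10 || (__CUDACC_VER_MAJOR__ == 10 && __CUDACC_VER_MINOR__ >= 2))
#include <cuda/std/atomic>
namespace xstd = cuda::std;
#define HD __host__ __device__
#else
#include <atomic>
namespace xstd = std;
#define HD
#endif

// Reserves `count` consecutive ticket numbers and returns the first one.
// Safe to call concurrently; relaxed ordering is enough for a pure counter.
HD inline long take_tickets(xstd::atomic<long>& next, long count) {
    assert(count >= 0);
    return next.fetch_add(count, xstd::memory_order_relaxed);
}
```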
Incremental libcu++ release goals (GTC 2020):
- Version 1 (CUDA 10.2): `<atomic>` (SM6.0+), `<type_traits>`.
- Version 2 (CUDA next): `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` minus `function`.
- Future priorities: `atomic_ref<T>`, `<complex>`, `<tuple>`, `<array>`, `<utility>`, `<cmath>`, string processing, ...
NVC++ is a unified C++ compiler and GPU-accelerated STL for the CUDA platform. It also seems to support OpenACC. NVC++ does not currently support the CUDA C++ language.
compiler | supported C++ standard | notes |
---|---|---|
nvc++ 11.0 | ...,c++17 | initial release, ships C++11-compatible backport of the C++20 synchronization library |
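Since nvc++ targets standard C++ rather than CUDA C++, GPU code is written with the standard parallel algorithms instead of kernels. A minimal sketch, assuming nvc++ is invoked with `-stdpar` (e.g. `nvc++ -stdpar=gpu saxpy.cpp`) and that `__NVCOMPILER` identifies the NVHPC compilers; other compilers fall back to the sequential overload. `saxpy` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#if defined(__NVCOMPILER)
#include <execution>
#endif

// y = a * x + y, element-wise. Under nvc++ -stdpar the parallel policy lets
// the compiler offload the algorithm to the GPU.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // Capture `a` by value: the lambda may execute on the GPU.
    auto op = [a](float xi, float yi) { return a * xi + yi; };
#if defined(__NVCOMPILER)
    std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(), y.begin(), op);
#else
    std::transform(x.begin(), x.end(), y.begin(), y.begin(), op);
#endif
}
```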
All GPU compilers are cheese.