Note
- Find the NVIDIA GPU products matching the compute capability versions below here:
  - compute capability ≥ 7.5 (`≥ sm_75`)
  - compute capability < 7.5 (`< sm_75`)
- Find me at bit.ly/cuda-gencode-compat-matrix
Term | Description |
---|---|
SM | "The chip" - more specifically the GPU's Streaming Multiprocessors on which CUDA runs |
SM architecture | A chip of a specific version with a certain set of capabilities (ex: `sm_120`, `sm_121`) |
SM family | Similar chips, sharing a set of capabilities and a major version (ex: `sm_9x`, `sm_12x`) |
Compute Capability | Set of features that can run on chips. Each new chip comes with a new compute capability of the same version (ex: 12.1 for `sm_121`), though older compute capabilities might be compatible with newer SMs (more below) |
CUDA device code | Your CUDA GPU code - including kernels and device functions - that leverages features of a certain Compute Capability and needs to be compiled to run on an SM |
ISA | Instruction set architecture, or your code once compiled - two types defined below |
Virtual ISA | Intermediary ISA for a given compute capability that still needs to be translated for a specific SM to execute |
PTX | The one and only virtual ISA format and file |
Real ISA | Take PTX, translate it for a specific SM, and you get a real ISA that an SM can now execute |
cubin | The file representation of the real ISA |
SASS | Streaming Assembler - the pre-Blackwell real ISA type |
Offline compilation | Mechanism that, given specific compute capabilities, can generate PTX from CUDA device code, cubin from that PTX, and join them in a fatbinary, all on a build system that is not required to host the target SM(s) |
NVCC | The offline compiler (executable) |
Runtime compilation | Mechanism that, given a specific compute capability, generates PTX from CUDA device code dynamically at runtime |
NVRTC | The runtime compiler (library) |
JIT compilation | Mechanism that generates the most adequate cubin for a target SM using PTX as input (from NVCC or NVRTC). This is done during application startup (Just In Time) by the CUDA driver if there is no prebuilt cubin compatible with the current SM. JIT incurs a startup cost but can provide extra forward compatibility (more below) |
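To make the virtual-vs-real ISA distinction concrete, here is a minimal sketch of the two translation steps that offline compilation chains together, run manually with the standalone tools (file names are hypothetical):

```sh
# CUDA device code -> virtual ISA: emit 9.0 PTX from a .cu file
nvcc -ptx -arch=compute_90 kernels.cu -o kernels.ptx

# Virtual ISA -> real ISA: translate that PTX into a cubin for sm_90
ptxas -arch=sm_90 kernels.ptx -o kernels.cubin

# Inspect the real ISA (SASS) inside the cubin
cuobjdump -sass kernels.cubin
```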
Base compute capabilities have existed since the beginning of CUDA. Architecture-specific and family-specific compute capabilities were introduced in CUDA 12.0 and 12.9 respectively to enable features not included in base compute capabilities, but with more limited forward compatibility (Reference):
- Base: `<major>.<minor>`
  - `cubin`: compatible to run on `sm_<major><y>` where `y >= minor` (i.e. any future SM of the same family)
  - `ptx`: same as `cubin` + compatible to build & run on `sm_<x><y>` where `x > major` (i.e. any future SM)
- Family-specific: `<major>.<minor>f`
  - `cubin`: compatible to run on `sm_<major><y>` where `y >= minor` (i.e. any future SM of the same family)
  - `ptx`: same as `cubin` (i.e. any future SM of the same family)
- Architecture-specific: `<major>.<minor>a`
  - `cubin`: compatible to run/build on `sm_<major><minor>` only (i.e. no other SM)
  - `ptx`: same as `cubin` (i.e. no other SM)
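For instance, here is how those three flavors look for Blackwell GeForce (12.0) with the `-arch` shorthand, which embeds both the cubin and its PTX - a sketch, with compatibility notes taken from the rules above:

```sh
# Base 12.0: cubin runs on sm_120 and any later sm_12x; PTX also JITs on future majors
nvcc -c kernels.cu -arch=sm_120

# Family-specific 12.0f: same sm_12x coverage, unlocks family-specific features
nvcc -c kernels.cu -arch=sm_120f

# Architecture-specific 12.0a: runs on sm_120 only
nvcc -c kernels.cu -arch=sm_120a
```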
Tip
- Prebuild `cubin` for the specific chips you are targeting to extract all their features
- When available, use family-specific `cubin` (ex: `sm_120f`) instead of base compute capability (e.g. `sm_120`), since you'll potentially get extra features with the same forward compatibility
- Only use architecture-specific `cubin` (ex: `sm_120a`) if you really need the specific feature it unlocks on that specific chip, as you won't get compatibility with future chips of that family
- If optimizing the size of your fat binaries is important, research whether a given compute capability version provides features you actually need on that chip, or whether it is sufficient to rely on the compatibility of the major version or a lower minor version you already build for. You can also enable a given compute capability only for the specific CUDA kernels requiring that feature, instead of globally for all CUDA kernels in a project/build (see the sketch after this list). E.g.:
  - `8.9` does not add much to `8.6` apart from fp8 support.
  - `10.1` is deprecated in favor of `11.0` for Thor.
  - `12.1` is exactly the same as `12.0`: the only difference is the physically integrated CPU+GPU memory of Spark (`sm_121`) compared to (`sm_120`), for which there are no current kernel optimizations.
  - `3.2` (K1), `5.3` (Nano), `6.2` (TX2), `7.2` (Xavier), `8.7` (Orin), `8.8` (Switch2), `10.1` (Thor < CUDA 13), `11.0` (Thor ≥ CUDA 13) are Jetson/Tegra, so never of use for `x86_64`, nor needed if your `aarch64` builds only support datacenter chips (sbsa) - and vice-versa.
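As a sketch of that per-kernel scoping (file names are hypothetical), compile only the sources needing the feature with the extra `-gencode`:

```sh
# Only the fp8 kernels get the extra 8.9 cubin
nvcc -c fp8_kernels.cu -o fp8_kernels.o \
  -gencode arch=compute_86,code=sm_86 \
  -gencode arch=compute_89,code=sm_89

# Everything else relies on the 8.6 cubin, which also runs on sm_89
nvcc -c other_kernels.cu -o other_kernels.o \
  -gencode arch=compute_86,code=sm_86
```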
- Build `ptx` - of base compute capability - either:
  - for the latest chip you are targeting, for forward compatibility with future chips whose compute capability is higher than your highest `cubin` build
  - and/or for the oldest chip you want to support, for less-performant compatibility with chips older than your lowest `cubin` build. In that scenario, you could even skip any `cubin` build if performance and JIT compilation are acceptable (see the sketch below).
- Unless you have reasons not to distribute any `cubin`, there is no reason to retain and distribute `ptx` for family- or architecture-specific compute capabilities, as they do not offer more compatibility than `cubin` and still require JIT compilation.
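As a sketch, that PTX-only scenario reduces to a single `-gencode` pair (7.5 chosen here as a hypothetical oldest target):

```sh
# Embed only 7.5 PTX, no cubin: the driver JIT-compiles it at startup
# for whatever SM it finds (sm_75, sm_86, sm_90, ...) and caches the result
nvcc -c kernels.cu -gencode arch=compute_75,code=compute_75
```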
Note
This table was generated on 2025-08-09 using cuda-system-utils/get_nvcc_sm_supported_versions.py. It extracts that information by parsing the `--help` output of each version of `nvcc` found.

```sh
curl -sSL https://raw.githubusercontent.com/agirault/cuda-system-utils/refs/heads/main/scripts/get_nvcc_sm_supported_versions.py | python3 - -s
```
CUDA Ver \ SM Arch | sm_30 | sm_32 | sm_35 | sm_37 | sm_50 | sm_52 | sm_53 | sm_60 | sm_61 | sm_62 | sm_70 | sm_72 | sm_75 | sm_80 | sm_86 | sm_87 | sm_88 | sm_89 | sm_90 | sm_90a | sm_100 | sm_100a | sm_100f | sm_101 | sm_101a | sm_101f | sm_103 | sm_103a | sm_103f | sm_110 | sm_110a | sm_110f | sm_120 | sm_120a | sm_120f | sm_121 | sm_121a | sm_121f |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13.0 |  |  |  |  |  |  |  |  |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |
12.9 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  | X | X | X | X | X | X |
12.8 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X | X | X |  | X | X |  |  |  |  |  |  |  | X | X |  |  |  |  |
12.6 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
12.5 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
12.4 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
12.3 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
12.2 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
12.1 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
12.0 |  |  |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.8 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | X |  | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.7 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.6 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.5 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.4 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.3 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.2 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.1 |  |  | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
11.0 |  |  | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
10.2 | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
10.1 | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
10.0 | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
9.2 | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
9.1 | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
You can tell NVCC what to build and package like so:

`-gencode arch=...,code=...`

- `arch` defines the PTX version to build. Its value is `compute_<CC>`, where CC is your compute capability version with no period; architecture- and family-specific versions are allowed (ex: `compute_90`, `compute_100a`, `compute_100f`).
- `code` defines the type of object that is embedded in the output:
  - PTX: you can embed the PTX you built just above by passing the same value (`compute_<CC>`) to this segment - this skips the offline PTX-to-cubin compilation
  - cubin: you can translate the PTX you built just above to cubin by passing a value in the form of `sm_<CC>`, where CC is your compute capability version with no period - this will then embed that cubin, dropping the intermediary PTX from the previous segment
- You can use the `-gencode` flag multiple times, once for each binary type and version you want to embed in your fat binary
Example:

```sh
# sm_86 cubin: Ampere. We skip Ada (sm_89) since we don't need fp8.
# sm_90 cubin: Hopper. No need for 9.0a: we don't use CUTLASS's accelerated features
#   (https://docs.nvidia.com/cutlass/media/docs/cpp/functionality.html).
# sm_100 cubin (from compute_100f): Blackwell B200 chips, enabling family-specific features.
# sm_120 cubin (from compute_120f): other Blackwell chips, enabling family-specific features.
# compute_120 PTX: 12.0 capability, for maximum forward compatibility.
nvcc -c kernels.cu \
  -gencode arch=compute_86,code=sm_86 \
  -gencode arch=compute_90,code=sm_90 \
  -gencode arch=compute_100f,code=sm_100 \
  -gencode arch=compute_120f,code=sm_120 \
  -gencode arch=compute_120,code=compute_120
```
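To double-check what actually got embedded in the fat binary, `cuobjdump` can list its entries:

```sh
cuobjdump --list-elf kernels.o # one cubin entry per sm_* target above
cuobjdump --list-ptx kernels.o # the single compute_120 PTX entry
```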
TODO