@MohamedElashri
Created September 19, 2024 10:47
Information about some NVIDIA GPUs

| Memory Type | NVIDIA GeForce RTX 3090 | NVIDIA RTX A5000 | NVIDIA GeForce RTX 2080 Ti |
| --- | --- | --- | --- |
| Global Memory | 24 GB GDDR6X | 24 GB GDDR6 | 11 GB GDDR6 |
| Memory Interface | 384-bit | 384-bit | 352-bit |
| Memory Bandwidth | 936 GB/s | 768 GB/s | 616 GB/s |
| L2 Cache | 6 MB | 6 MB | 5.5 MB |
| Shared Memory | Up to 100 KB per SM | Up to 100 KB per SM | Up to 64 KB per SM |
| Constant Memory | 64 KB | 64 KB | 64 KB |
| Texture Memory | Part of global memory | Part of global memory | Part of global memory |
| Local Memory | Part of global memory | Part of global memory | Part of global memory |
| Registers | 64K 32-bit registers per SM | 64K 32-bit registers per SM | 64K 32-bit registers per SM |
| L1 Cache | Up to 128 KB per SM (unified with shared memory) | Up to 128 KB per SM (unified with shared memory) | Up to 64 KB per SM (96 KB unified with shared memory) |

Sources:

  1. https://www.nvidia.com/en-us/design-visualization/rtx-a5000/
  2. https://www.cudocompute.com/blog/nvidia-rtx-a5000-everything-you-need-to-know
  3. https://xenon.com.au/product/nvidia-rtx-a5000/
  4. https://ren-fengbo.lab.asu.edu/content/nvidia-geforce-rtx-3090
  5. https://www.servethehome.com/nvidia-geforce-rtx-2080-ti-graphics-card-review/2/

| Memory Type | NVIDIA GeForce RTX 3090 | NVIDIA RTX A5000 | NVIDIA GeForce RTX 2080 Ti | Relative Speed | Scope |
| --- | --- | --- | --- | --- | --- |
| Global Memory | 24 GB GDDR6X[1] | 24 GB GDDR6[2] | 11 GB GDDR6[3] | Slow | All threads |
| Memory Interface | 384-bit[1] | 384-bit[2] | 352-bit[3] | N/A | N/A |
| Memory Bandwidth | 936 GB/s[1] | 768 GB/s[2] | 616 GB/s[3] | N/A | N/A |
| L1 Cache | Up to 128 KB/SM | Up to 128 KB/SM | Up to 64 KB/SM | Very Fast | Per SM |
| L2 Cache | 6 MB[1] | 6 MB[2] | 5.5 MB[3] | Fast | All SMs |
| Shared Memory | Up to 100 KB/SM (48 KB/block by default) | Up to 100 KB/SM (48 KB/block by default) | Up to 64 KB/SM (48 KB/block by default) | Very Fast | Thread block |
| Registers | 65,536/SM | 65,536/SM | 65,536/SM | Fastest | Thread |
| Texture Memory | Uses L1/L2 | Uses L1/L2 | Uses L1/L2 | Fast | All threads (read-only) |
| Local Memory | Uses Global | Uses Global | Uses Global | Slow | Thread |
| Constant Memory | 64 KB | 64 KB | 64 KB | Fast | All threads (read-only) |
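
The 65,536 registers per SM listed above are shared by every thread resident on that SM, so the per-thread budget shrinks as more threads run concurrently. A minimal sketch of that arithmetic follows. The function name and the 255-register per-thread cap are assumptions added for illustration (the cap matches recent CUDA compute capabilities but is not stated in the tables), and the sketch ignores register allocation granularity.

```python
REGISTERS_PER_SM = 65_536  # 32-bit registers per SM, from the table


def registers_per_thread(resident_threads: int, per_thread_cap: int = 255) -> int:
    """Upper bound on 32-bit registers each thread can hold.

    resident_threads: threads simultaneously resident on one SM.
    per_thread_cap:   assumed hardware limit per thread (not from the tables).
    """
    if resident_threads <= 0:
        raise ValueError("resident_threads must be positive")
    # An even split of the register file, clamped to the per-thread cap.
    return min(per_thread_cap, REGISTERS_PER_SM // resident_threads)


if __name__ == "__main__":
    for threads in (256, 512, 1024):
        print(f"{threads:>4} resident threads -> "
              f"at most {registers_per_thread(threads)} registers each")
```

A kernel that needs more registers per thread than this bound either runs with fewer resident threads or spills to local memory, which the table lists as slow because it lives in global memory.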

NVIDIA GeForce RTX 3090

| Memory Type | Typical Size | Relative Speed | Scope |
| --- | --- | --- | --- |
| Global Memory | 24 GB GDDR6X[1] | Slow | All threads |
| Memory Interface | 384-bit[1] | N/A | N/A |
| Memory Bandwidth | 936 GB/s[1] | N/A | N/A |
| L1 Cache | Up to 128 KB/SM | Very Fast | Per SM |
| L2 Cache | 6 MB[1] | Fast | All SMs |
| Shared Memory | Up to 100 KB/SM (48 KB/block by default) | Very Fast | Thread block |
| Registers | 65,536/SM | Fastest | Thread |
| Texture Memory | Uses L1/L2 | Fast | All threads (read-only) |
| Local Memory | Uses Global | Slow | Thread |
| Constant Memory | 64 KB | Fast | All threads (read-only) |

| Specification | Details |
| --- | --- |
| CUDA Cores | 10496 |
| Base Clock | 1395 MHz |
| Boost Clock | 1695 MHz (Gaming Mode), 1725 MHz (OC Mode) |
| Memory Size | 24 GB GDDR6X |
| Memory Interface | 384-bit |
| Memory Bandwidth | 936 GB/s |
| Memory Speed | 19.5 Gbps |
| RT Cores | 82 (2nd Generation) |
| Tensor Cores | 328 (3rd Generation) |
| Architecture | Ampere |
| Power Consumption | 350 W |
| Recommended PSU | 750 W |
| Power Connectors | 2x 8-pin |
| Maximum Digital Resolution | 7680x4320 |
| Standard Display Connectors | HDMI 2.1, 3x DisplayPort 1.4a |
| Multi Monitor Support | 4 |
| Card Dimensions | 313 mm x 138 mm (L x W) |
| Slot Width | 3-slot |
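
The CUDA core count and boost clock above are enough to estimate theoretical peak FP32 throughput. The sketch below assumes each CUDA core retires one fused multiply-add per clock, counted as two floating-point operations. That convention and the function name are assumptions for illustration, not figures from the specification table, and sustained throughput in real kernels is usually lower.

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per CUDA core per clock.
    """
    flops_per_core_per_clock = 2
    # cores * FLOPs/clock * MHz gives MFLOPS; divide by 1e6 for TFLOPS.
    return cuda_cores * flops_per_core_per_clock * clock_mhz / 1e6


if __name__ == "__main__":
    # RTX 3090 values from the specification table.
    for label, mhz in (("Gaming Mode", 1695), ("OC Mode", 1725)):
        print(f"RTX 3090 @ {mhz} MHz ({label}): "
              f"{peak_fp32_tflops(10496, mhz):.2f} TFLOPS")
```

The same function applies to the other cards wherever a boost clock is listed.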

NVIDIA RTX A5000

| Memory Type | Typical Size | Relative Speed | Scope |
| --- | --- | --- | --- |
| Global Memory | 24 GB GDDR6[2] | Slow | All threads |
| Memory Interface | 384-bit[2] | N/A | N/A |
| Memory Bandwidth | 768 GB/s[2] | N/A | N/A |
| L1 Cache | Up to 128 KB/SM | Very Fast | Per SM |
| L2 Cache | 6 MB[2] | Fast | All SMs |
| Shared Memory | Up to 100 KB/SM (48 KB/block by default) | Very Fast | Thread block |
| Registers | 65,536/SM | Fastest | Thread |
| Texture Memory | Uses L1/L2 | Fast | All threads (read-only) |
| Local Memory | Uses Global | Slow | Thread |
| Constant Memory | 64 KB | Fast | All threads (read-only) |

| Specification | Details |
| --- | --- |
| CUDA Cores | 8192 |
| Base Clock | Not specified |
| Boost Clock | Not specified |
| Memory Size | 24 GB GDDR6 |
| Memory Interface | 384-bit |
| Memory Bandwidth | 768 GB/s |
| Memory Speed | Not specified |
| RT Cores | 64 (2nd Generation) |
| Tensor Cores | 256 (3rd Generation) |
| Architecture | Ampere |
| Power Consumption | 230 W |
| Recommended PSU | Not specified |
| Power Connectors | 1x 8-pin PCIe |
| Maximum Digital Resolution | 4x 4096x2160 @ 120 Hz |
| Standard Display Connectors | 4x DisplayPort 1.4a |
| Multi Monitor Support | 4 |
| Card Dimensions | 4.4" H x 10.5" L |
| Slot Width | Dual slot |

NVIDIA GeForce RTX 2080 Ti

| Memory Type | Typical Size | Relative Speed | Scope |
| --- | --- | --- | --- |
| Global Memory | 11 GB GDDR6[3] | Slow | All threads |
| Memory Interface | 352-bit[3] | N/A | N/A |
| Memory Bandwidth | 616 GB/s[3] | N/A | N/A |
| L1 Cache | Up to 64 KB/SM | Very Fast | Per SM |
| L2 Cache | 5.5 MB[3] | Fast | All SMs |
| Shared Memory | Up to 64 KB/SM (48 KB/block by default) | Very Fast | Thread block |
| Registers | 65,536/SM | Fastest | Thread |
| Texture Memory | Uses L1/L2 | Fast | All threads (read-only) |
| Local Memory | Uses Global | Slow | Thread |
| Constant Memory | 64 KB | Fast | All threads (read-only) |

| Specification | Details |
| --- | --- |
| CUDA Cores | 4352 |
| Base Clock | 1350 MHz |
| Boost Clock | 1545 MHz (1635 MHz for some models) |
| Memory Size | 11 GB GDDR6 |
| Memory Interface | 352-bit |
| Memory Bandwidth | 616 GB/s |
| Memory Speed | 14 Gbps |
| RT Cores | 68 (1st Generation) |
| Tensor Cores | 544 (2nd Generation) |
| Architecture | Turing |
| Power Consumption | 250 W (up to 300 W for some models) |
| Recommended PSU | 650 W |
| Power Connectors | 2x 8-pin |
| Maximum Digital Resolution | 7680x4320 |
| Standard Display Connectors | DisplayPort 1.4a, HDMI 2.0b, USB Type-C |
| Multi Monitor Support | 4 |
| Card Dimensions | Varies by model (e.g., 327 x 140 x 55.6 mm for some) |
| Slot Width | 2-slot (may vary by model) |
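
The per-GPU figures above can be cross-checked against each other. The sketch below assumes one RT core per SM, which holds for Turing and Ampere but is not stated in the tables, and uses the RT core count as the SM count. From that it derives CUDA and Tensor cores per SM and the total register file size, taking 65,536 32-bit registers per SM from the memory tables. The function and dictionary names are hypothetical.

```python
REGISTERS_PER_SM = 65_536   # 32-bit registers per SM, from the memory tables
BYTES_PER_REGISTER = 4

# Core counts copied from the three specification tables.
GPUS = {
    "RTX 3090":    {"cuda": 10496, "rt": 82, "tensor": 328},
    "RTX A5000":   {"cuda": 8192,  "rt": 64, "tensor": 256},
    "RTX 2080 Ti": {"cuda": 4352,  "rt": 68, "tensor": 544},
}


def per_sm_breakdown(cuda: int, rt: int, tensor: int) -> dict:
    """Derive per-SM core counts, assuming one RT core per SM."""
    sm_count = rt  # assumption: RT core count equals SM count
    if cuda % sm_count or tensor % sm_count:
        raise ValueError("core counts do not divide evenly across SMs")
    return {
        "sm_count": sm_count,
        "cuda_per_sm": cuda // sm_count,
        "tensor_per_sm": tensor // sm_count,
        # Aggregate register file across all SMs, in MiB.
        "register_file_mib": sm_count * REGISTERS_PER_SM * BYTES_PER_REGISTER / 2**20,
    }


if __name__ == "__main__":
    for name, cores in GPUS.items():
        print(name, per_sm_breakdown(**cores))
```

The even division in every case is a quick sanity check that the core counts in the tables are mutually consistent.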

| Memory Type | NVIDIA GeForce RTX 3090 | NVIDIA RTX A5000 | NVIDIA GeForce RTX 2080 Ti |
| --- | --- | --- | --- |
| Global Memory | 24 GB GDDR6X | 24 GB GDDR6 | 11 GB GDDR6 |
| Memory Interface | 384-bit | 384-bit | 352-bit |
| Memory Bandwidth | 936 GB/s | 768 GB/s | 616 GB/s |
| Memory Speed | 19.5 Gbps | Not specified | 14 Gbps |
| L2 Cache | 6 MB | 6 MB | 5.5 MB |
| Shared Memory | Up to 100 KB per SM | Up to 100 KB per SM | Up to 64 KB per SM |
| Constant Memory | 64 KB | 64 KB | 64 KB |
| Texture Memory | Part of global memory | Part of global memory | Part of global memory |
| Local Memory | Part of global memory | Part of global memory | Part of global memory |
| Registers | 64K 32-bit registers per SM | 64K 32-bit registers per SM | 64K 32-bit registers per SM |
| L1 Cache | Up to 128 KB per SM (unified with shared memory) | Up to 128 KB per SM (unified with shared memory) | Up to 64 KB per SM (96 KB unified with shared memory) |
| ECC Support | Not specified | Yes | Not specified |
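
The Memory Interface, Memory Speed, and Memory Bandwidth rows are linked by simple arithmetic: bandwidth in GB/s is the bus width in bytes multiplied by the per-pin data rate in Gbps. A minimal sketch with hypothetical function names reproduces the listed bandwidth figures, and run in reverse gives the data rate implied by a bandwidth figure such as the A5000's 768 GB/s.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) * (Gbps per pin)."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_data_rate_gbps(bus_width_bits: int, bandwidth_gb_s: float) -> float:
    """Per-pin data rate in Gbps implied by a bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


if __name__ == "__main__":
    # Values taken from the comparison table.
    print("RTX 3090:   ", memory_bandwidth_gb_s(384, 19.5), "GB/s")
    print("RTX 2080 Ti:", memory_bandwidth_gb_s(352, 14), "GB/s")
    # Derived from the listed bandwidth rather than a listed memory speed.
    print("RTX A5000 implied data rate:",
          implied_data_rate_gbps(384, 768), "Gbps")
```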

[1] Source: NVIDIA GeForce RTX 3090 specifications

[2] Source: NVIDIA RTX A5000 specifications

[3] Source: NVIDIA GeForce RTX 2080 Ti specifications
