| NVLink |
NVIDIA GPUs (Pascal, Volta, Hopper, Blackwell), NVIDIA CPUs (e.g., Grace) |
Sub-µs (0.1–0.5 µs, intra-node P2P) |
72 (18 ports × 4 lanes, NVLink 5.0) |
~2.5–3.5% (128b/130b, headers) |
6.09 GB/s (48.72 Gbps) |
438.75 GB/s |
877.5 GB/s |
NVIDIA |
Linux (primary), Windows (limited via WSL) |
| InfiniBand |
CPUs, NVIDIA/AMD/Intel GPUs (via GPUDirect RDMA), NICs (e.g., Mellanox) |
~1–5 µs (point-to-point) |
32 (8 ports × 4 lanes, NDR) |
~8–13% (64b/66b, headers, CRC) |
11.25 GB/s (90 Gbps) |
360 GB/s |
720 GB/s |
NVIDIA (Mellanox), Intel, Broadcom |
Linux, Windows (partial), macOS (limited) |
| PCIe |
CPUs, GPUs (NVIDIA, AMD, Intel), accelerators, NICs |
~0.5–2 µs (intra-node, PCIe 4.0/5.0) |
16 (x16, PCIe 5.0) |
~3.5–6.5% (128b/130b, TLPs) |
1.915 GB/s (15.32 Gbps) |
30.64 GB/s |
61.28 GB/s |
Broad (Intel, AMD, NVIDIA, etc.) |
Linux, Windows, macOS |
| Ethernet |
CPUs, GPUs (via TCP/IP or RoCE), NICs |
~10–100 µs (network-dependent) |
4 (100GbE), 8 (400GbE) |
~13–18% (64b/66b, TCP/IP headers) |
2.69 GB/s (100GbE), 5.375 GB/s (400GbE) |
10.75 GB/s (100GbE), 43 GB/s (400GbE) |
21.5 GB/s (100GbE), 86 GB/s (400GbE) |
Broad (Intel, Broadcom, Cisco, etc.) |
Linux, Windows, macOS |
| Infinity Fabric |
AMD GPUs (e.g., MI300x), AMD CPUs (EPYC, Ryzen) |
Sub-µs (0.1–0.5 µs, intra-node) |
~24 (estimated, proprietary) |
~2–5% (proprietary, coherency) |
4.67 GB/s (37.36 Gbps) |
112.13 GB/s |
224.26 GB/s |
AMD |
Linux (primary), Windows (limited) |
| RoCE |
CPUs, GPUs (via GPUDirect RDMA), NICs (e.g., Mellanox) |
~1–10 µs |
4 (100GbE), 8 (200GbE) |
~8–13% (64b/66b, RDMA headers) |
2.81 GB/s (22.48 Gbps) |
11.25 GB/s (100GbE), 22.5 GB/s (200GbE) |
22.5 GB/s (100GbE), 45 GB/s (200GbE) |
NVIDIA (Mellanox), Broadcom, Intel |
Linux, Windows (partial), macOS (limited) |
| xGMI |
AMD GPUs (e.g., MI250, MI300), AMD CPUs |
Sub-µs (0.1–0.5 µs, intra-node) |
~24 (estimated, proprietary) |
~2–5% (proprietary, coherency) |
4.06 GB/s (32.48 Gbps) |
97.5 GB/s |
195 GB/s |
AMD |
Linux (primary, via ROCm) |
| Shared Memory |
CPUs, GPUs (within same node, e.g., NUMA) |
<1 µs (fastest for intra-node) |
N/A (memory bus) |
~1–5% (cache coherency, bus) |
N/A |
243.75 GB/s |
487.5 GB/s |
Broad (Intel, AMD, NVIDIA) |
Linux, Windows, macOS |
| CXL (Current) |
CPUs (Intel Xeon, AMD EPYC, NVIDIA Grace), GPUs (Intel, AMD, emerging NVIDIA), accelerators, memory |
~0.1–0.2 µs (intra-node), 1–10 µs (inter-node) |
16 (PCIe 5.0/6.0 x16) |
~3.5–6.5% (128b/130b, headers) |
7.66 GB/s (61.28 Gbps, CXL 4.0) |
61.28 GB/s (CXL 3.0), 122.56 GB/s (CXL 4.0) |
122.56 GB/s (CXL 3.0), 245.12 GB/s (CXL 4.0) |
CXL Consortium (Intel, AMD, NVIDIA, Arm, Broadcom, Google, Meta, Microsoft) |
Linux, Windows (partial), macOS (limited) |
| CXL (Current, Scaled to 64 Lanes) |
CPUs (Intel Xeon, AMD EPYC, NVIDIA Grace), GPUs (Intel, AMD, NVIDIA), accelerators, memory |
~0.1–0.2 µs (intra-node), 1–5 µs (inter-node) |
64 (4 × PCIe 6.0 x16) |
~3.5–6.5% (128b/130b, headers) |
7.66 GB/s (61.28 Gbps) |
490.24 GB/s |
980.48 GB/s |
CXL Consortium (Intel, AMD, NVIDIA, Arm, Broadcom, Google, Meta, Microsoft) |
Linux, Windows (partial), macOS (limited) |
| CXL (Hypothetical 256 GB/s) |
CPUs (Intel Xeon, AMD EPYC, NVIDIA Grace), GPUs (Intel, AMD, NVIDIA), accelerators, memory |
~0.05–0.1 µs (intra-node), 0.5–5 µs (inter-node) |
16 (PCIe 7.0 x16) |
~3.5–6.5% (128b/130b, headers) |
15.32 GB/s (122.56 Gbps) |
245.12 GB/s |
490.24 GB/s |
CXL Consortium (Intel, AMD, NVIDIA, Arm, Broadcom, Google, Meta, Microsoft) |
Linux, Windows (partial), macOS (limited) |
| CXL (Hypothetical 512 GB/s) |
CPUs (Intel Xeon, AMD EPYC, NVIDIA Grace), GPUs (Intel, AMD, NVIDIA), accelerators, memory |
~0.025–0.05 µs (intra-node), 0.25–2.5 µs (inter-node) |
32 (PCIe 7.0 x32) or 16 (PCIe 8.0 x16) |
~3.5–6.5% (128b/130b, headers) |
15.32 GB/s (PCIe 7.0 x32), 30.64 GB/s (PCIe 8.0 x16) |
490.24 GB/s |
980.48 GB/s |
CXL Consortium (Intel, AMD, NVIDIA, Arm, Broadcom, Google, Meta, Microsoft) |
Linux, Windows (partial), macOS (limited) |