@crosstyan
Last active June 9, 2023 08:20

Bandwidth test with eGPU (Thunderbolt 3)

$ nvidia-smi
Thu Jun  8 23:49:42 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070         Off| 00000000:01:00.0  On |                  N/A |
| N/A   71C    P5               14W /  N/A|   1255MiB /  8192MiB |     12%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090         Off| 00000000:3D:00.0 Off |                  Off |
|  0%   41C    P8               27W / 450W|      1MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1280      G   /usr/bin/gnome-shell                        473MiB |
|    0   N/A  N/A      1794      G   /usr/bin/Xwayland                            11MiB |
|    0   N/A  N/A      2417      G   /usr/bin/telegram-desktop                     1MiB |
|    0   N/A  N/A      4380      G   /usr/lib/firefox/firefox                    478MiB |
|    0   N/A  N/A      9649      G   /usr/bin/nautilus                           209MiB |
|    0   N/A  N/A     17455      G   /usr/bin/gnome-control-center                71MiB |
+---------------------------------------------------------------------------------------+
$ sudo boltctl list
 ● Cooler Master Technology,Inc MasterCase EG200
   ├─ type:          peripheral
   ├─ name:          MasterCase EG200
   ├─ vendor:        Cooler Master Technology,Inc
   ├─ uuid:          00eaf0b0-222e-8302-ffff-ffffffffffff
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     e4030000-0090-8518-a3dc-9b04a244b21e
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    REDACTED
   ├─ connected:     REDACTED
   └─ stored:        REDACTED
      ├─ policy:     auto
      └─ key:        no
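
A note on units for the results further down: boltctl reports the link speed in gigabits per second, while nvbandwidth reports gigabytes per second. The sketch below does that conversion for the "2 lanes * 20 Gb/s" line above. It assumes 8 bits per byte and ignores all protocol overhead, so the result is only an upper bound on the raw link rate, not a prediction of achievable PCIe throughput. The function name is made up for illustration.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert Gb/s to GB/s (8 bits per byte, no protocol overhead assumed)."""
    return gbit_per_s / 8.0


# Values taken from the boltctl output: 2 lanes at 20 Gb/s each.
lanes = 2
per_lane_gbit = 20.0

link_gbit = lanes * per_lane_gbit
link_gbyte = gbit_to_gbyte(link_gbit)

print(f"{link_gbit:.0f} Gb/s = {link_gbyte:.1f} GB/s raw")
```
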
OS: Manjaro Linux x86_64 
Host: P7xxTM 
Kernel: 6.3.5-2-MANJARO 
Shell: zsh 5.9 
Resolution: 3840x2160 
DE: GNOME 44.1 
WM: Mutter 
Terminal: gnome-terminal 
CPU: Intel i5-8400 (6) @ 4.000GHz 
GPU: NVIDIA GeForce GTX 1070 Mobile 
GPU: NVIDIA GeForce RTX 4090 
Memory: 15109MiB / 40057MiB 
$ ./nvbandwidth
nvbandwidth Version: v0.2
Built from Git version: 42e94d2

NOTE: This tool reports current measured bandwidth on your system.
Additional system-specific tuning may be required to achieve maximal peak bandwidth.

CUDA Runtime Version: 12010
CUDA Driver Version: 12010
Driver Version: 530.41.03

Device 0: NVIDIA GeForce RTX 4090
Device 1: NVIDIA GeForce GTX 1070

Running host_to_device_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1
0      2.39     12.49

SUM host_to_device_memcpy_ce 14.87

Running device_to_host_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      2.83     13.11

SUM device_to_host_memcpy_ce 15.93

Running host_to_device_bidirectional_memcpy_ce.
memcpy CE CPU(row) <-> GPU(column) bandwidth (GB/s)
          0         1
0      1.61     10.74

SUM host_to_device_bidirectional_memcpy_ce 12.35

Running device_to_host_bidirectional_memcpy_ce.
memcpy CE CPU(row) <-> GPU(column) bandwidth (GB/s)
          0         1
0      1.63     11.08

SUM device_to_host_bidirectional_memcpy_ce 12.70

Waiving device_to_device_memcpy_read_ce.

Waiving device_to_device_memcpy_write_ce.

Waiving device_to_device_bidirectional_memcpy_read_ce.

Waiving device_to_device_bidirectional_memcpy_write_ce.

Running all_to_host_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      2.84     12.91

SUM all_to_host_memcpy_ce 15.74

Running all_to_host_bidirectional_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      1.63     10.19

SUM all_to_host_bidirectional_memcpy_ce 11.82

Running host_to_all_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1
0      2.39     12.46

SUM host_to_all_memcpy_ce 14.84

Running host_to_all_bidirectional_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      1.62      9.94

SUM host_to_all_bidirectional_memcpy_ce 11.56

Waiving all_to_one_write_ce.

Waiving all_to_one_read_ce.

Waiving one_to_all_write_ce.

Waiving one_to_all_read_ce.

Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1
0      2.41     11.82

SUM host_to_device_memcpy_sm 14.23

Running device_to_host_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      2.85     13.11

SUM device_to_host_memcpy_sm 15.96

Waiving device_to_device_memcpy_read_sm.

Waiving device_to_device_memcpy_write_sm.

Waiving device_to_device_bidirectional_memcpy_read_sm.

Waiving device_to_device_bidirectional_memcpy_write_sm.

Running all_to_host_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      2.84     12.52

SUM all_to_host_memcpy_sm 15.36

Running all_to_host_bidirectional_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      1.73     12.89

SUM all_to_host_bidirectional_memcpy_sm 14.62

Running host_to_all_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1
0      2.39     12.04

SUM host_to_all_memcpy_sm 14.43

Running host_to_all_bidirectional_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1
0      1.66     12.04

SUM host_to_all_bidirectional_memcpy_sm 13.69

Waiving all_to_one_write_sm.

Waiving all_to_one_read_sm.

Waiving one_to_all_write_sm.

Waiving one_to_all_read_sm.
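
To compare the two GPUs without reading every table by hand, the log can be parsed with a short script. This is a rough sketch, not part of nvbandwidth: `parse_results` is a hypothetical helper, `SAMPLE` is an excerpt pasted from the output above, and the parser assumes the single-row CPU-to-GPU matrices seen in this run. Note that nvbandwidth lists the RTX 4090 as device 0 and the GTX 1070 as device 1, the reverse of the nvidia-smi numbering; given the title, the RTX 4090 is presumably the card in the Thunderbolt enclosure.

```python
import re

# Excerpt copied from the nvbandwidth output above.
SAMPLE = """\
Running host_to_device_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
          0         1
0      2.39     12.49

SUM host_to_device_memcpy_ce 14.87

Running device_to_host_memcpy_ce.
memcpy CE CPU(row) <- GPU(column) bandwidth (GB/s)
          0         1
0      2.83     13.11

SUM device_to_host_memcpy_ce 15.93
"""


def parse_results(text):
    """Return {test_name: [GB/s per GPU column]} for single-row matrices."""
    results = {}
    current = None
    for line in text.splitlines():
        started = re.match(r"Running (\w+)\.", line)
        if started:
            current = started.group(1)
            continue
        parts = line.split()
        # A data row is a row index followed by one decimal value per GPU;
        # the column-header row has no decimal points and is skipped.
        if (current and len(parts) >= 2 and parts[0].isdigit()
                and all("." in p for p in parts[1:])):
            results[current] = [float(p) for p in parts[1:]]
            current = None
    return results


results = parse_results(SAMPLE)
for name, (dev0, dev1) in results.items():
    print(f"{name}: device 0 = {dev0} GB/s, device 1 = {dev1} GB/s, "
          f"ratio = {dev1 / dev0:.1f}x")
```

Feeding the full log to `parse_results` instead of `SAMPLE` should work the same way, since waived tests produce no data rows and are simply skipped.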