@scyto
Last active April 27, 2025 18:36
Thunderbolt Networking Setup

Thunderbolt Networking

This gist is part of this series.

You will need Proxmox kernel 6.2.16-14-pve or higher.

Load Kernel Modules

  • add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes, I know it can sometimes work without them, but thunderbolt-net has interesting behaviour, so do as I say and add both ;-)
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line
    2. save using ctrl-x, then y, then enter
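If you prefer to script the edit, here is a minimal sketch that appends both modules idempotently. It is demonstrated on a scratch file; on a real node point MODULES_FILE at /etc/modules, then run modprobe thunderbolt thunderbolt-net (or reboot) to load them.

```shell
#!/bin/bash
# Demo on a scratch file; on a real node use MODULES_FILE=/etc/modules.
MODULES_FILE=$(mktemp)
echo '# /etc/modules: kernel modules to load at boot time.' > "$MODULES_FILE"

# Append each module only if it is not already listed (idempotent).
for mod in thunderbolt thunderbolt-net; do
    grep -qxF "$mod" "$MODULES_FILE" || echo "$mod" >> "$MODULES_FILE"
done

cat "$MODULES_FILE"
```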

Prepare /etc/network/interfaces

Doing this means we don't have to give each thunderbolt interface a manual IPv6 address, and these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces.

If you see any sections called thunderbolt0 or thunderbolt1, delete them at this point.

Create entries to prepopulate the GUI with a reminder

Doing this means we don't have to give each thunderbolt interface a manual IPv6 or IPv4 address, and these addresses stay constant no matter what.

Add the following to each node using nano /etc/network/interfaces; this reminds you not to edit en05 and en06 in the GUI.

This fragment should go between the existing auto lo section and the adapter sections.

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

If you see any thunderbolt sections, delete them from the file before you save it.

DO NOT DELETE the line source /etc/network/interfaces.d/* - it will always exist on the latest versions and should be the last or next-to-last line in the interfaces file.
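Putting the pieces together, the top of each node's /etc/network/interfaces should then look roughly like this (a sketch only - your adapter/bridge stanzas and addresses will differ):

```
auto lo
iface lo inet loopback

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

# ... your existing adapter and vmbr stanzas ...

source /etc/network/interfaces.d/*
```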

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence in which you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does - the command proved more useful to me than syslog or lspci for troubleshooting thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3; if you bought the same hardware this will be the same on all 3 units. Don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
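Since the two files differ only in PCI path and interface name, you can also generate them from a small loop. A sketch, assuming the 0000:00:0d.2/0000:00:0d.3 paths from above (substitute your own); it writes to a scratch directory here - on a real node set outdir to /etc/systemd/network instead.

```shell
#!/bin/bash
# Demo: generate both systemd .link files from a "PCI-path interface-name" mapping.
# Writes to a scratch dir; on a real node use outdir=/etc/systemd/network.
outdir=$(mktemp -d)

i=0
for pair in "0000:00:0d.2 en05" "0000:00:0d.3 en06"; do
    set -- $pair   # $1 = PCI path, $2 = interface name
    cat > "$outdir/00-thunderbolt$i.link" <<EOF
[Match]
Path=pci-$1
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=$2
EOF
    i=$((i+1))
done

cat "$outdir"/00-thunderbolt*.link
```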

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or on cable insertion with whatever settings are in /etc/network/interfaces. This shouldn't need to be done; it seems like a bug in the way thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

Huge thanks to @corvy for figuring out a script that should make this much, much more reliable for most.

  1. create a udev rule to detect for cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  1. save the file

  2. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

save the file and then

  1. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en06"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

and save the file

  1. make both scripts executable with chmod +x /usr/local/bin/*.sh
  2. run update-initramfs -u -k all to propagate the new link files into the initramfs
  3. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot)

Enabling IP Connectivity

proceed to the next gist

Slow Thunderbolt Performance? Too Many Retries? No traffic? Try this!

Verify neighbors can see each other (connectivity troubleshooting)

Install LLDP - this is a great way to see which nodes can see which.

  • install lldpd with apt install lldpd on all 3 nodes
  • execute lldpctl and you should see neighbor info

Make sure IOMMU is enabled (speed troubleshooting)

If you are having speed issues, make sure the following is set on the kernel command line in /etc/default/grub: intel_iommu=on iommu=pt. Once set, be sure to run update-grub and reboot.

Everyone's grub command line is different; this is mine because I also have i915 virtualization. If you get this wrong you can break your machine. If you are not doing that, you don't need the i915 entries.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" (note: if you have more things in your cmd line DO NOT REMOVE them, just add the two intel ones; it doesn't matter where).
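If you prefer to script the edit, here is a sketch that appends the two options inside the existing GRUB_CMDLINE_LINUX_DEFAULT quotes without removing anything. It is demonstrated on a scratch copy; on a real node target /etc/default/grub and then run update-grub.

```shell
#!/bin/bash
# Demo on a scratch copy of /etc/default/grub.
f=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$f"

# Append the IOMMU options inside the existing quotes, keeping what is there.
# Note: not idempotent - running it twice appends the options twice.
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on iommu=pt"/' "$f"

cat "$f"
```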

Pinning the Thunderbolt Driver (speed and retries troubleshooting)

Identify your P and E cores by running the following:

cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus

You should get two lines on an Intel system with P and E cores; the first line should be your P cores and the second line your E cores.

for example on mine:

root@pve1:/etc/pve# cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus
0-7
8-15

Create a script to apply affinity settings every time a thunderbolt interface comes up

  1. make a file at /etc/network/if-up.d/thunderbolt-affinity
  2. add the following to it - make sure to replace echo X-Y with whatever the output above told you your performance cores were - e.g. echo 0-7
#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
# Set Thunderbolt affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo X-Y | tee "/proc/irq/{}/smp_affinity_list"'
fi
  1. save the file - done
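To see what the grep|cut|xargs pipeline in that script actually extracts, here it is run against a hypothetical two-line /proc/interrupts sample (the IRQ numbers are made up). Each extracted number is what the real script uses when it tees X-Y into /proc/irq/<n>/smp_affinity_list.

```shell
#!/bin/bash
# Hypothetical /proc/interrupts excerpt - real IRQ numbers will differ.
sample=' 126:  0  0  IR-PCI-MSI  thunderbolt
 127:  0  0  IR-PCI-MSI  thunderbolt'

# Same extraction as the affinity script: match thunderbolt lines,
# take the IRQ number before the colon; xargs trims and joins them.
irqs=$(printf '%s\n' "$sample" | grep thunderbolt | cut -d ":" -f1 | xargs)
echo "$irqs"
```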

Extra Debugging for Thunderbolt

dynamic kernel tracing - adds more info to dmesg without overwhelming dmesg

I have only tried this on 6.8 kernels, so YMMV. If you want more TB messages in dmesg to see why a connection might be failing, here is how to turn on dynamic tracing.

For boot time you will need to add it to the kernel command line: add thunderbolt.dyndbg=+p to your /etc/default/grub file, run update-grub and reboot.

To expand the example above:

`GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt thunderbolt.dyndbg=+p"`  

Don't forget to run update-grub after saving the change to the grub file.

For runtime debug you can run the following command (it will revert on next boot), so this can't be used to capture what happens at boot time.

`echo -n 'module thunderbolt =p' > /sys/kernel/debug/dynamic_debug/control`

install tbtools

These tools can be used to inspect your thunderbolt system. Note that they rely on Rust being installed; you must use the rustup script below and not install Rust via your package manager at this time (9/15/24).

apt install pkg-config libudev-dev git curl
curl https://sh.rustup.rs -sSf | sh
git clone https://github.com/intel/tbtools
restart your ssh session
cd tbtools
cargo install --path .

@gregoribic

There is an issue with ISIS when the MTU is set as high as 65520. Many times after a reboot, the topology is not restored.

@ilbarone87

ilbarone87 commented Dec 4, 2024

Oh snap, I will give that a try and update if this works. Thanks again for your quick response!

Have you ever tried this and confirmed it was working? I have an MS-01, and reading a previous comment I understood that many have faced trouble configuring it, as the PCI address seems to be dynamic.

this is my output from udevadm monitor

KERNEL[7671.190073] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[7671.190078] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0 (net)
KERNEL[7671.190082] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[7671.190086] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[7671.190095] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[7671.190271] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1 (thunderbolt)
KERNEL[7671.190283] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
KERNEL[7671.190289] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt1 (net)
KERNEL[7671.190294] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt1/queues/rx-0 (queues)
KERNEL[7671.190297] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt1/queues/tx-0 (queues)
KERNEL[7671.190407] bind     /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
UDEV  [7671.190432] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
UDEV  [7671.190630] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1 (thunderbolt)
UDEV  [7671.190787] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [7671.190913] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
UDEV  [7671.196580] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0 (net)
UDEV  [7671.196604] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt1 (net)
UDEV  [7671.196815] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
UDEV  [7671.196825] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
UDEV  [7671.196936] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt1/queues/rx-0 (queues)
UDEV  [7671.197045] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [7671.197458] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt1/queues/tx-0 (queues)
UDEV  [7671.197888] bind     /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)

and udevadm info

M: 1-1.0
R: 0
U: thunderbolt
T: thunderbolt_service
V: thunderbolt-net
E: DEVPATH=/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0
E: SUBSYSTEM=thunderbolt
E: DEVTYPE=thunderbolt_service
E: DRIVER=thunderbolt-net
E: MODALIAS=tbsvc:knetworkp00000001v00000001r00000001

ended up with this config for thunderbolt connection:

Path=pci-0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06

Is that correct?

@ilbarone87

Completed the guide but the interfaces are not coming up. I noticed that, like some others, I have the thunderbolt0 interface coming up when a cable is connected.

I've run the script that one of the guys posted here https://privatebin.net/?84ecb384b1933641#Cii9RD3f4NVPou7wR66FD5oJYsrWTaXZUNH3UPAWZSrN - if anyone could have a look that would be great, as I'm stumped here. I've tried several ways of mapping the path but none of them worked.

@djs42012

Hey @scyto,

I am following up on my earlier question here about adding a fourth node to the cluster/migration network.

Like you, I have also added 2.5Gb expansions to each of the NUCs. I also have two more machines with dedicated NICs that I would like to add to the mesh network and eventual cluster. The goal is 5 nodes, three connected via both thunderbolt and ethernet, and the other two via ethernet only.

The issue I am having is that I can not get the ethernet-only nodes consistently communicating with the other three. The thunderbolt side of the network comes online without issue, but after that it appears to be a crapshoot. Sometimes I can ping between the two categories of node, other times I can't. At time of writing, 4/5 nodes are communicating as intended, but I do not know how to proceed.

Here are my /etc/network/interfaces and /etc/network/interfaces.d/mesh for the three thunderbolt-enabled nodes:

# /etc/network/interfaces 

auto lo
iface lo inet loopback

iface en05 inet manual
#do not edit it GUI

iface en06 inet manual
#do not edit in GUI

iface enp87s0 inet manual
#do not edit in GUI

iface enp86s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address 10.1.10.60/24
        gateway 10.1.10.1
        bridge-ports enp86s0
        bridge-stp off
        bridge-fd 0


iface wlo1 inet manual


source /etc/network/interfaces.d/*


post-up /usr/bin/systemctl restart frr.service
# /etc/network/interfaces.d/mesh

auto lo:6
iface lo:6 inet static
        address fc00::80/128
        
allow-hotplug en05
iface en05 inet manual
        mtu 1500

allow-hotplug en06
iface en06 inet manual
        mtu 1500

allow-hotplug enp87s0
iface enp87s0 inet manual
        mtu 1500

Where enp87s0 is the additional 2.5Gb M.2 expansion.

And here is the frr config

# vtysh -c "show running-config"

!
frr version 8.5.2
frr defaults traditional
hostname homer
log syslog informational
no ip forwarding
service integrated-vtysh-config
!
interface en05
 ipv6 router openfabric 1
exit
!
interface en06
 ipv6 router openfabric 1
exit
!
interface enp87s0
 ipv6 router openfabric 1
exit
!
interface lo
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0000.00
exit
!
end

Here are the same outputs for one of the two ethernet only nodes:

# cat /etc/network/interfaces --

auto lo
iface lo inet loopback

iface net10g1b inet manual
# do not edit in GUI

auto net0
iface net0 inet manual

auto net10g1a
iface net10g1a inet manual

auto bond11
iface bond11 inet manual
        bond-slaves net10g1a net0
        bond-miimon 100
        bond-mode active-backup
        bond-primary net10g1a

auto vmbr0
iface vmbr0 inet static
        address 10.1.10.63/24
        gateway 10.1.10.1
        bridge-ports bond11
        bridge-stp off
        bridge-fd 0


source /etc/network/interfaces.d/*

post-up /usr/bin/systemctl restart frr.service
# /etc/network/interfaces.d/mesh 

#auto lo:0
#iface lo:0 inet static
#        address 10.0.0.83/32
        
auto lo:6
iface lo:6 inet static
        address fc00::83/128

allow-hotplug net10g1b
iface net10g1b inet manual
  mtu 1500

And finally the frr config for one of those nodes:

!
frr version 8.5.2
frr defaults traditional
hostname zrouter
log syslog informational
no ip forwarding
service integrated-vtysh-config
!
interface lo
 ipv6 router openfabric 1
 openfabric passive
exit
!
interface net10g1b
 ipv6 router openfabric 1
exit
!
router openfabric 1
 net 49.0000.0000.0003.00
exit
!
end

This is what I am getting currently for my topology

root@zrouter:~# vtysh -c "show openfabric topology"
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
zrouter                                                               

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
zrouter                                                               
fc00::83/128         IP6 internal 0                                     zrouter(4)
homer                TE-IS        10                                    zrouter(4)
servarr              TE-IS        20                                    homer(4)
worker               TE-IS        20                                    homer(4)
fc00::80/128         IP6 internal 20                                    homer(4)
fc00::81/128         IP6 internal 30                                    servarr(4)
fc00::82/128         IP6 internal 30                                    worker(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent
homer                TE-IS        0      

These four are actually behaving correctly. I can ping/ssh both ways. Each time the nodes come online, one of the two non-thunderbolt nodes will configure like this, or close, while the other will basically be completely left out. If I then run a systemctl restart frr on the missing node, it will join the mesh, but kick the other off. At this point I have gotten the connections to work across every combination of nodes, just not at the same time, obviously.

I feel like I must be missing something in my frr config, but I am racking my brain. I do not know how to find out if the MTUs are all configured properly, but that is definitely one of my points of worry here.

Am I going about this properly? I have been assuming this entire time that what I am attempting here is feasible, but I am starting to feel like I am running out of options. I also tried setting the lo MTU to 1500, but that did not help the situation.

I would be most grateful for any input anyone can offer here.

@ambrosekwok

added info on how to turn extra TB4 debugging messages on https://gist.github.com/scyto/67fdc9a517faefa68f730f82d7fa3570#extra-debugging-for-thunderbolt

I will not support this beyond discussing interesting messages you see, as I have minimal clue on troubleshooting grub; if you mess up, be prepared to format and start again.... be warned....

For example, dmesg | grep "connection manager" will show you if you are using the software connection manager (this is required to get the 40gbe connection speed). If you see hardware connection manager, sorry, you don't have full TB4 / USB4 40gbe support and will see less than 20gbe perf.

example:

root@pve2:~# dmesg | grep "connection manager"
[    1.675636] thunderbolt 0000:00:0d.2: using software connection manager
[    1.915459] thunderbolt 0000:00:0d.3: using software connection manager

Hello, thanks a lot for your great guides.

Just tested on GMKtec NucBox K8 Plus mini PCs (AMD 8845HS). I found some reviews that tested the USB4 ports with a thunderbolt NVMe SSD enclosure, which was able to achieve 3100MB/s. I thought it would also work for thunderbolt networking.

Unfortunately, I cannot get the desired speed (26Gb/s) but only 11.7Gb/s on thunderbolt network.

Seems many people got a similar result (11.7Gb/s) on AMD mini PCs; does the 26Gb/s thunderbolt network only work on Intel CPU devices? What components are missing on those devices?
Would you have advice on choosing proper devices with full thunderbolt network support?

root@pve1:~# cat /sys/bus/thunderbolt/devices/0-2/rx_speed 
20.0 Gb/s
root@pve1:~# dmesg | grep "connection manager"
[    0.827555] thunderbolt 0000:c8:00.5: using software connection manager
[    0.870364] thunderbolt 0000:c8:00.6: using software connection manager
root@pve1:~/# iperf3 -c fc00::82 -bidir
Connecting to host fc00::82, port 5201
[  5] local fc00::81 port 52948 connected to fc00::82 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.37 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   1.00-2.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   2.00-3.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   3.00-4.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   4.00-5.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   5.00-6.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   6.00-7.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   7.00-8.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   8.00-9.00   sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
[  5]   9.00-10.00  sec  1.36 GBytes  11.7 Gbits/sec    0   2.37 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  13.6 GBytes  11.7 Gbits/sec    0            sender
[  5]   0.00-10.00  sec  13.6 GBytes  11.7 Gbits/sec                  receiver

iperf Done.

@razqqm

razqqm commented Feb 5, 2025

Thunderbolt Networking on Proxmox: High CPU Usage When Receiving

Interesting issue using Thunderbolt Networking between two Proxmox nodes where one side (PVE1) has minimal CPU usage while sending or receiving, but the other side (PVE2) experiences extreme CPU load (ksoftirqd at ~99%) when it acts as the receiver.

Setup

  • PVE1

    • Model: MS-01 i9-13900H
    • Proxmox VE Version: 8.3.3
    • Kernel: 6.8.12-8-pve
    • Thunderbolt interface: en05
    • Up to 20 CPU threads
  • PVE2

    • Model: Intel NUC i5-10210U
    • Proxmox VE Version: 8.3.3
    • Kernel: 6.8.12-5-pve
    • Thunderbolt interface: en05
    • 8 CPU threads

A direct Thunderbolt cable is used between PVE1 and PVE2. We assign IPv6 addresses (e.g., fc00::81 and fc00::82) and run tests with iperf3.


Symptom: High ksoftirqd on PVE2 When Receiving

1) PVE1 → PVE2 (PVE2 as the iperf3 server)

root@pve1:~# iperf3 -c fc00::82 -bidir -N
Connecting to host fc00::82, port 5201
[  5] local fc00::81 port 55108 connected to fc00::82 port 5201
[ ID] Interval        Transfer      Bitrate         Retr  Cwnd
[  5]   0.00-1.00  sec  1.23GB   10.5Gbits/sec     741    ...
[  5]   1.00-2.00  sec  1.24GB   10.7Gbits/sec     753    ...
[  5]   2.00-3.00  sec  1.24GB   10.7Gbits/sec     715    ...
...
[  5]   0.00-10.00 sec  12.0GB   10.3Gbits/sec   7286    ...
  • On PVE1 (sender): ksoftirqd ~2% CPU.
  • On PVE2 (receiver): ksoftirqd ~99% (overload), causing packet drops and TCP retries.

2) PVE2 → PVE1 (PVE1 as the iperf3 server)

root@pve2:~# iperf3 -c fc00::81 -bidir -N
Connecting to host fc00::81, port 5201
[  5] local fc00::82 port 60148 connected to fc00::81 port 5201
[ ID] Interval        Transfer      Bitrate         Retr  Cwnd
[  5]   0.00-1.00  sec  1.82GB   15.7Gbits/sec       0    ...
[  5]   1.00-2.00  sec  1.81GB   15.5Gbits/sec       0    ...
...
[  5]   0.00-10.00 sec  18.1GB   15.5Gbits/sec       0    ...
  • On PVE1 (receiver): ksoftirqd ~2% CPU.
  • On PVE2 (sender): ksoftirqd ~40%, still fine. Achieves ~15.5 Gbits/sec.

Essentially:

  • PVE2 (i5-10210U) overloads as an iperf3 server (the receiving side).
  • PVE2 has no issue saturating ~15 Gbits/s as a sender.

Possible Workarounds

1) Limit Throughput via Traffic Control (tc)

tc qdisc del dev en05 root 2>/dev/null
tc qdisc add dev en05 root handle 1: htb default 10 r2q 10000
tc class add dev en05 parent 1: classid 1:1 htb rate 5gbit ceil 5gbit
tc class add dev en05 parent 1:1 classid 1:10 htb rate 5gbit ceil 5gbit
tc qdisc add dev en05 parent 1:10 handle 10: fq_codel

2) CPU Governor: Performance

apt-get install linux-cpupower
cpupower frequency-set -g performance

Conclusion

When PVE2 (i5-10210U) receives large inbound Thunderbolt traffic, it hits 100% in a single ksoftirqd. The same system as a sender can easily push ~15 Gbits/s. This is not a bug per se but reflects the thunderbolt driver’s single RX queue plus limited CPU core performance.

Recommended:

  • Shape or cap throughput to ~5–7 Gbits/s if you need stability.
  • Ensure CPU boost is working (no throttling).
  • Accept the single-core receive bottleneck if you must run high inbound speeds.

By limiting the speed to what PVE2’s single CPU core can comfortably process, you avoid the massive load, retransmissions, and dropped packets.

@ronindesign

Thunderbolt Networking on Proxmox: High CPU Usage When Receiving

Interesting issue using Thunderbolt Networking between two Proxmox nodes where one side (PVE1) has minimal CPU usage while sending or receiving, but the other side (PVE2) experiences extreme CPU load (ksoftirqd at ~99%) when it acts as the receiver.

Wow this is such a good write-up, thank you for sharing. Very informative!

We're running clustered i9-13900H, but this is really something good to watch out for.

@rohrsh

rohrsh commented Feb 9, 2025

Thanks for sharing all this advice. I used it in a slightly different way: Thunderbolt networking cable to put 10Gb networking on my desktop (MacBook m3). Essentially just jammed the thunderbolt0 into the vmbr0 and voila.

I get full 10Gb iperf3 performance from my MacBook to the Proxmox server (Intel NUC) over Thunderbolt cable.
My Proxmox server also gets full LAN ethernet speeds to my internet router.

However ... when I try to go through my Proxmox thunderbolt-to-ethernet bridge .... I only get 1Mb/sec uploads from my MacBook to internet/router, but full gigabit downloads. (I've checked this with both speedtest.net and iperf3 on my router.)

So something seems to fall apart when bridging from Thunderbolt to Ethernet LAN, but only in that direction. Figured I would check if any experts here have suggestions. Thank you.

@corvy

corvy commented Feb 12, 2025

I just have to raise a shout-out to @nickglott for his excellent guide above.

I was first struggling with frr stopping when I rebooted one node; this was because I tried to make frr depend on en05 and en06. When one of these became disconnected (because another host went down or a cable was unplugged), frr would stop. No dice.

Solution: Move the frr restart to /etc/network/interfaces.d/thunderbolt like this:

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

post-up /usr/bin/systemctl restart frr.service 

Then I followed @nickglott's guide above for the frr settings. One thing I missed from @scyto's original gist was this part: I had used the same net address on each host. That failed for sure. Not sure if putting IPs in frr is needed, but at least it really works. My thinking is that it was the net ID problem that tripped up my setup.

For my /etc/frr/frr.conf
*Change X in Hostname, IP, and NET Address for each node to the following
** TheCore-01 | 10.0.10.10/32 | fc00::10/128 | 49.0001.1111.1111.1111.00 **
** TheCore-02 | 10.0.10.20/32 | fc00::20/128 | 49.0001.2222.2222.2222.00 **
** TheCore-03 | 10.0.10.30/32 | fc00::30/128 | 49.0001.3333.3333.3333.00 **

After these changes my setup works perfect, even when one node goes down, and comes back up. And it works on ipv4 and ipv6. :)

@corvy

corvy commented Feb 13, 2025

Hi @scyto and everybody!

I had this issue where, when rebooting any Proxmox node, the thunderbolt network interfaces didn't come up. Finally I figured it out, and maybe somebody else can profit too.

/usr/local/bin/pve-en05.sh

#!/bin/bash

for i in {1..10}; do
    /usr/sbin/ifup en05 && break
    sleep 3
done

/usr/local/bin/pve-en06.sh

#!/bin/bash

for i in {1..10}; do
    /usr/sbin/ifup en06 && break
    sleep 3
done

When triggered by udev, it runs appropriate script, if "ifup" succeeds (exit code 0), the "break" will stop the loop. If exit code is not 0, it will sleep 3 seconds and do it again (up to 10 times in this case, ultimately lasting for 30s). You can adjust sleep time or number of tries in loop.

FYI @ronindesign @ctroyp

I also had the exact same problem as @ronindesign . Found the following error in my logs: "error: Another instance of this program is already running." when I debug the udev script. This error indicates that ifup is running two times but is allowed only one instance, I guess because it is the boot process. At least a quick fix for me was to change like this:

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-net.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-net.sh"

And the script:

#!/bin/bash
LOGFILE="/tmp/udev-debug.log"
echo "$(date): pve-net.sh triggered by udev" >> $LOGFILE

#Bring up both en05 and en06 
/usr/sbin/ifup -v en05 en06 >> $LOGFILE 2>&1

This way I also get a log in /tmp at every boot to see that the IFs got up well. Which is the best solution (above or mine) I am open to suggestions. But at least this works well. Before this change, at every boot, en05 came up, en06 did not. This caused strange issues since sometimes it worked (proxmox happy, ceph happy), but at some point when you boot another node you could hit the downed interface and ceph not happy anymore :P

Check in Proxmox that all IFs are up after boot to see if you are exposed to this problem.

(screenshot: Proxmox network panel showing the Active state of each interface)

Both en05 and en06 should be Active: Yes on all nodes. If they are not (after boot especially) then have a look at this solution.

For completeness I can mention that I moved IP away from frr and back to /etc/network/interfaces.d/thunderbolt. If you set IP in frr I find that it fails when I restart networking (if you need to do that from time to time).

@corvy

corvy commented Feb 14, 2025

I reverted and improved upon @ronindesign's solution. I liked the ability to get a fresh log in /tmp after boot (it's deleted every boot; I do not need history). It seems that in some few cases when the server is booting, udev detects en05 before en06, and it runs ifup so quickly that en06 is not ready to go up. This has happened once every 10-12 boots (I have been rigorously testing), so I needed to adapt. These are my current settings, which work very well.

#/etc/udev/rules.d/10-tb-en.rules

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-net.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-net.sh"

#/usr/local/bin/pve-en0[5-6].sh

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"
echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }

    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

Remember to change IF= to the desired interface, and also make 2 distinct files like described in the gist.

Example output in the log after a normal boot:

Fri Feb 14 13:47:01 CET 2025: pve-en06.sh triggered by udev
Fri Feb 14 13:47:01 CET 2025: pve-en05.sh triggered by udev
Fri Feb 14 13:47:01 CET 2025: Attempt 1 to bring up en06
Fri Feb 14 13:47:01 CET 2025: Attempt 1 to bring up en05
error: Another instance of this program is already running.
Fri Feb 14 13:47:01 CET 2025: Attempt 1 failed, retrying in 3 seconds...
Fri Feb 14 13:47:01 CET 2025: Successfully brought up en06 on attempt 1
Fri Feb 14 13:47:04 CET 2025: Attempt 2 to bring up en05
Fri Feb 14 13:47:04 CET 2025: Successfully brought up en05 on attempt 2
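
The two near-identical scripts could also be collapsed into one file. A minimal sketch (the `derive_if` helper and the symlink layout are my own idea, not from the gist) that derives the interface name from the script's own filename:

```shell
#!/bin/bash
# Hypothetical single-file variant: derive the interface from the script
# name instead of hard-coding IF= in two copies.
derive_if() {
    # /usr/local/bin/pve-en05.sh -> pve-en05.sh -> en05.sh -> en05
    local name="${1##*/}"
    name="${name#pve-}"
    printf '%s\n' "${name%.sh}"
}

IF="$(derive_if "$0")"
LOGFILE="/tmp/udev-debug.log"
# ...then run the same retry loop as above against "$IF"...
```

Install it once and symlink it as pve-en05.sh and pve-en06.sh; the udev rules can then keep pointing at the per-interface names.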

@taslabs-net

taslabs-net commented Feb 28, 2025

I reverted to and improved upon @ronindesign's solution. I liked the ability to get a fresh log in /tmp after boot (it's deleted every boot; I do not need history). It seems that in a few cases, when the server is booting, udev detects en05 before en06 and runs ifup so quickly that en06 is not ready to go up. This has happened once every 10-12 boots (I have been testing rigorously), so I needed to adapt. These are my current settings, which work very well.

#/etc/udev/rules.d/10-tb-en.rules

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-net.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-net.sh"

#/usr/local/bin/pve-en0[5-6].sh

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"
echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }

    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

Remember to change IF= to the desired interface, and make two distinct files as described in the gist.

Example output in the log after a normal boot:

Fri Feb 14 13:47:01 CET 2025: pve-en06.sh triggered by udev
Fri Feb 14 13:47:01 CET 2025: pve-en05.sh triggered by udev
Fri Feb 14 13:47:01 CET 2025: Attempt 1 to bring up en06
Fri Feb 14 13:47:01 CET 2025: Attempt 1 to bring up en05
error: Another instance of this program is already running.
Fri Feb 14 13:47:01 CET 2025: Attempt 1 failed, retrying in 3 seconds...
Fri Feb 14 13:47:01 CET 2025: Successfully brought up en06 on attempt 1
Fri Feb 14 13:47:04 CET 2025: Attempt 2 to bring up en05
Fri Feb 14 13:47:04 CET 2025: Successfully brought up en05 on attempt 2

I split the functions up with affinity. It's really stable for me now. My screenshot is on the ceph page. Thank you for this.

@geosp

geosp commented Mar 21, 2025

This is excellent. Thank you for the clear explanation! I've documented my homelab Thunderbolt networking experiments based on several sources including this document. If anyone wants to see a thorough compilation of the available resources and real-world performance testing results, I've created a detailed guide here: https://gist.github.com/geosp/80fbd39e617b7d1d9421683df4ea224a . It includes advanced iperf3 testing showing how packet size affects throughput and stability on AMD hardware.

@VACIndustries

I wonder if anyone has got this working with an ASM4242 (built into an ASUS AMD motherboard). It's set up with two USB4 ports, but it seems like only one device is available to bring up as a network device.

➜  ~ boltctl list --all

 ● ASMedia Technology Inc. ASM4242
   ├─ type:          host
   ├─ name:          ASM4242
   ├─ vendor:        ASMedia Technology Inc.
   ├─ uuid:          b1b84c17-007f-44c6-ffff-ffffffffffff
   ├─ generation:    USB4
   ├─ status:        authorized
   │  ├─ domain:     b1b84c17-007f-44c6-ffff-ffffffffffff
   │  └─ authflags:  none
   ├─ authorized:    Fri 21 Mar 2025 06:51:14 PM UTC
   ├─ connected:     Fri 21 Mar 2025 06:51:14 PM UTC
   └─ stored:        no

➜  ~ boltctl info b1b84c17-007f-44c6-ffff-ffffffffffff
 ● ASMedia Technology Inc. ASM4242
   ├─ type:          host
   ├─ name:          ASM4242
   ├─ vendor:        ASMedia Technology Inc.
   ├─ uuid:          b1b84c17-007f-44c6-ffff-ffffffffffff
   ├─ dbus path:     /org/freedesktop/bolt/devices/b1b84c17_007f_44c6_ffff_ffffffffffff
   ├─ generation:    USB4
   ├─ status:        authorized
   │  ├─ domain:     b1b84c17-007f-44c6-ffff-ffffffffffff
   │  ├─ parent:     (null)
   │  ├─ syspath:    /sys/devices/pci0000:00/0000:00:02.2/0000:12:00.0/0000:13:03.0/0000:79:00.0/domain0/0-0
   │  └─ authflags:  none
   ├─ authorized:    Fri 21 Mar 2025 06:51:14 PM UTC
   ├─ connected:     Fri 21 Mar 2025 06:51:14 PM UTC
   └─ stored:        no

This is what I'm seeing in the device directory structure:

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/
0-1/              authorized        device_name       nvm_active0/      nvm_non_active0/  power/            uevent            usb4_port1/       vendor            wakeup/
0-3/              device            generation        nvm_authenticate  nvm_version       subsystem@        unique_id         usb4_port3/       vendor_name

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/0-1/
0-1.0/       device       device_name  maxhopid     power/       rx_lanes     rx_speed     subsystem@   tx_lanes     tx_speed     uevent       unique_id    vendor       vendor_name

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/0-1/0-1.0/
driver@     key         modalias    net/        power/      prtcid      prtcrevs    prtcstns    prtcvers    subsystem@  uevent

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/0-3/
0-3.0/       device       device_name  maxhopid     power/       rx_lanes     rx_speed     subsystem@   tx_lanes     tx_speed     uevent       unique_id    vendor       vendor_name

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/0-3/0-3.0/
driver@     key         modalias    net/        power/      prtcid      prtcrevs    prtcstns    prtcvers    subsystem@  uevent

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/usb4_port1/
link    power/  uevent

➜  ~ ls -al /sys/bus/pci/devices/0000:79:00.0/domain0/0-0/usb4_port3/
link    power/  uevent

I've tried almost every path (though I haven't restarted every time to see if the network device comes up via ifconfig; I only ran /usr/sbin/ifup against them as I tried each one):
Path=pci-0000:79:00.0/domain0/0-0/0-1 & Path=pci-0000:79:00.0/domain0/0-0/0-3
Path=pci-0000:79:00.0/domain0/0-0/0-1/0-1.0 & Path=pci-0000:79:00.0/domain0/0-0/0-3/0-3.0
Path=pci-0000:79:00.0/domain0/0-0/usb4_port1 & Path=pci-0000:79:00.0/domain0/0-0/usb4_port3

The only configuration I've been able to get running is setting both as 0000:79:00.0, BUT even then only one of them comes up (I presume because they can't both use the same address):

➜  ~ cat /etc/systemd/network/00-thunderbolt*
[Match]
Path=pci-0000:79:00.0
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en05
[Match]
Path=pci-0000:79:00.0
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en06

➜  ~ ifconfig -a | grep en0
en05: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

The only thunderbolt/USB4 device I see under PCI is that address. Ironically, this is the first... and possibly the last time I use an AMD product (though clearly not directly their fault).
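
One diagnostic that might help here (a sketch only - I can't verify this on ASM4242 hardware, and the 0000:79:00.0 base path is taken from the listings above) is to print the udev ID_PATH property of each net device found under the USB4 ports; if the two interfaces report distinct ID_PATH values, those can be used in separate [Match] Path= sections:

```shell
#!/bin/bash
# Diagnostic sketch: list net devices under each USB4 port and show the
# udev ID_PATH each would match on. Prints nothing if no device is found.
list_tb_net_ids() {
    base="$1"
    for netdir in "$base"/0-*/0-*.0/net/*; do
        [ -e "$netdir" ] || continue
        iface="${netdir##*/}"
        echo "== $iface =="
        udevadm info -q property -p "/sys/class/net/$iface" | grep -E '^(ID_PATH|INTERFACE)='
    done
    return 0
}

list_tb_net_ids /sys/bus/pci/devices/0000:79:00.0/domain0/0-0
```

If both interfaces report the same ID_PATH, udev genuinely cannot tell the ports apart by path, which would explain why only one .link file ever matches.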

@IndianaJoe1216

I reverted to and improved upon @ronindesign's solution. I liked the ability to get a fresh log in /tmp after boot (it's deleted every boot; I do not need history). It seems that in a few cases, when the server is booting, udev detects en05 before en06 and runs ifup so quickly that en06 is not ready to go up. This has happened once every 10-12 boots (I have been testing rigorously), so I needed to adapt. These are my current settings, which work very well.

#/etc/udev/rules.d/10-tb-en.rules

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-net.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-net.sh"

#/usr/local/bin/pve-en0[5-6].sh

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"
echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }

    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

Remember to change IF= to the desired interface, and make two distinct files as described in the gist.

Example output in the log after a normal boot:

Fri Feb 14 13:47:01 CET 2025: pve-en06.sh triggered by udev
Fri Feb 14 13:47:01 CET 2025: pve-en05.sh triggered by udev
Fri Feb 14 13:47:01 CET 2025: Attempt 1 to bring up en06
Fri Feb 14 13:47:01 CET 2025: Attempt 1 to bring up en05
error: Another instance of this program is already running.
Fri Feb 14 13:47:01 CET 2025: Attempt 1 failed, retrying in 3 seconds...
Fri Feb 14 13:47:01 CET 2025: Successfully brought up en06 on attempt 1
Fri Feb 14 13:47:04 CET 2025: Attempt 2 to bring up en05
Fri Feb 14 13:47:04 CET 2025: Successfully brought up en05 on attempt 2

@corvy Thanks for this! Working great - unfortunately only on 2 of my 3 nodes. For some reason on my 1st node it doesn't actually execute the en05/06 scripts. The only reason I know is that no log is generated at boot. If I execute them manually the log is generated and everything seems to work fine; it just won't run at boot. Have you encountered this before?

@corvy

corvy commented Mar 28, 2025

Hello @IndianaJoe1216, no, I have not had this issue. Make sure your permissions and files are all correctly set up.

NB: your 10-tb-en.rules looks wrong to me. It does not reference the correct file names; they should be 05 and 06.

My settings now:

/etc/udev/rules.d/10-tb-en.rules

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-n05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-n06.sh"

/usr/local/bin/pve-n05.sh

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

/usr/local/bin/pve-n06.sh

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en06"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

Remember: chmod +x /usr/local/bin/*.sh
Then update the initramfs: update-initramfs -u -k all

This: /etc/systemd/system/frr.service.d/dependencies.conf

[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

This makes sure that if one goes down the other stays up - not sure it is needed.

Double-check everything - again and again :)

@lettucebuns

has anyone upgraded their proxmox hosts this week? it looks like FRR is going from version 8.5.2 to 10.2.1. just wondering if anyone has run into any issues.

@nickglott

nickglott commented Apr 3, 2025

has anyone upgraded their proxmox hosts this week? it looks like FRR is going from version 8.5.2 to 10.2.1. just wondering if anyone has run into any issues.

Been absent for a while; my setup has been rock solid for months! I did just do the updates and have not noticed any issues - still getting a 26G link between nodes.

During the update I chose option N to not re-write the frr config. I believe it is only replacing the default frr daemons file (nano /etc/frr/daemons), setting fabricd=yes back to fabricd=no. If you overwrite the file it should be a quick fix.
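
If the daemons file did get overwritten, flipping that one line back can be scripted. A hedged sketch (assuming, per the comment above, that fabricd= is the only setting that was reset; `fix_fabricd` is a name I made up):

```shell
#!/bin/bash
# Re-enable fabricd if a package upgrade reset it to "no".
fix_fabricd() {
    daemons_file="$1"   # normally /etc/frr/daemons
    if grep -q '^fabricd=no' "$daemons_file"; then
        sed -i 's/^fabricd=no/fabricd=yes/' "$daemons_file"
        echo "fabricd re-enabled in $daemons_file - restart frr to apply"
    fi
}
```

Usage would be something like `fix_fabricd /etc/frr/daemons && systemctl restart frr`.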

@corvy

corvy commented Apr 3, 2025

Same experience here. Updated this morning, chose N to keep the existing config, and all works swell.

@scyto
Author

scyto commented Apr 14, 2025

Same experience here. Updated this morning, chose N to keep the existing config, and all works swell.

same here - so long as N or D is selected there should be no issue
next up is me moving to a later kernel. i know some have had issues with that... if those are still happening I am keen to get them logged with the proxmox folks as regressions

@JahMark420

JahMark420 commented Apr 19, 2025

Seems I have this working now, but I'm running into a strange issue - I have two MS-01s, one a 13900H and the other a 12600H.
When I have the 13900H (10.0.0.81) set up as the iperf3 server, I get lower speeds and a lot of retries:

Connecting to host 10.0.0.81, port 5201
[ 5] local 10.0.0.82 port 34714 connected to 10.0.0.81 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.06 GBytes 17.7 Gbits/sec 595 1.75 MBytes
[ 5] 1.00-2.00 sec 1.11 GBytes 9.56 Gbits/sec 324 1.69 MBytes
[ 5] 2.00-3.00 sec 1.92 GBytes 16.5 Gbits/sec 692 1.56 MBytes
[ 5] 3.00-4.00 sec 1.50 GBytes 12.9 Gbits/sec 499 1.50 MBytes
[ 5] 4.00-5.00 sec 1.54 GBytes 13.3 Gbits/sec 475 1.50 MBytes
[ 5] 5.00-6.00 sec 1001 MBytes 8.40 Gbits/sec 299 1.44 MBytes
[ 5] 6.00-7.00 sec 2.04 GBytes 17.5 Gbits/sec 679 1.62 MBytes
[ 5] 7.00-8.00 sec 2.18 GBytes 18.7 Gbits/sec 781 1.81 MBytes
[ 5] 8.00-9.00 sec 1.87 GBytes 16.1 Gbits/sec 604 1.37 MBytes
[ 5] 9.00-10.00 sec 2.05 GBytes 17.6 Gbits/sec 678 2.31 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 17.2 GBytes 14.8 Gbits/sec 5626 sender
[ 5] 0.00-10.00 sec 17.2 GBytes 14.8 Gbits/sec receiver

When I set the 12600H (10.0.0.82) up as the server, I get the correct speeds:

Connecting to host 10.0.0.82, port 5201
[ 5] local 10.0.0.81 port 49150 connected to 10.0.0.82 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 3.07 GBytes 26.4 Gbits/sec 35 3.43 MBytes
[ 5] 1.00-2.00 sec 3.07 GBytes 26.4 Gbits/sec 1 3.43 MBytes
[ 5] 2.00-3.00 sec 2.99 GBytes 25.7 Gbits/sec 5 3.50 MBytes
[ 5] 3.00-4.00 sec 3.05 GBytes 26.2 Gbits/sec 5 3.50 MBytes
[ 5] 4.00-5.00 sec 3.06 GBytes 26.3 Gbits/sec 0 3.50 MBytes
[ 5] 5.00-6.00 sec 3.08 GBytes 26.4 Gbits/sec 2 3.50 MBytes
[ 5] 6.00-7.00 sec 3.07 GBytes 26.3 Gbits/sec 0 3.50 MBytes
[ 5] 7.00-8.00 sec 3.06 GBytes 26.3 Gbits/sec 2 3.50 MBytes
[ 5] 8.00-9.00 sec 3.08 GBytes 26.5 Gbits/sec 2 3.50 MBytes
[ 5] 9.00-10.00 sec 3.05 GBytes 26.2 Gbits/sec 0 3.50 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 30.6 GBytes 26.3 Gbits/sec 52 sender
[ 5] 0.00-10.00 sec 30.6 GBytes 26.3 Gbits/sec receiver

I also get similar results with ipv6.

I've tried updating the grub file, ran update-initramfs -u -k all, and did a reboot.
Other than the CPU, both systems are identical; I believe they both have the same USB4 controller as well.

Both are running the same kernel
Kernel Version Linux 6.8.12-9-pve (2025-03-16T19:18Z)

I thought it was a kernel issue, but once I ran update-initramfs -u -k all on them, speeds from one side worked as expected.

Has anyone run into this issue? Or am I in over my head and limited to what the 12600H can handle? I suspect not, since they have the same USB4.

I'm very much a noob when it comes to Linux but have been finding my way around.
Any help is appreciated :)

@nickglott

nickglott commented Apr 19, 2025

@JahMark420 Make sure you have IOMMU enabled on each node; that tends to be the biggest issue with speed. The other is pinning TB interrupts to only P-cores.

Make sure the following is set on the kernel command line in /etc/default/grub: intel_iommu=on iommu=pt. Once set, be sure to run update-grub and reboot.

Everyone's grub command line is different - this is mine. I also have i915 virtualization; if you get this wrong you can break your machine, and if you are not doing that you don't need any i915 entries.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" (note: if you have more things in your cmd line DO NOT REMOVE them, just add the two intel ones - it doesn't matter where)

I am using the script below; it is activated on every ifup. /etc/network/if-up.d/thunderbolt-affinity (change 0-7 to whatever your P-cores are). There is also another method listed somewhere above in the comments on one of these gists - @Allistah is the one that posted the other method, which I don't think requires you to define the P-cores.

#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
# Set Thunderbolt affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-7 | tee "/proc/irq/{}/smp_affinity_list"'
fi
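
To confirm the two flags actually took effect after update-grub and a reboot, you can check the running kernel's command line. A small sketch (`check_iommu_cmdline` is my own helper name, not from the gist):

```shell
#!/bin/bash
# Check that both IOMMU flags are present on the running kernel cmdline.
check_iommu_cmdline() {
    ok=1
    for flag in intel_iommu=on iommu=pt; do
        # pad with spaces so we match whole words only
        case " $1 " in
            *" $flag "*) ;;
            *) echo "missing: $flag"; ok=0 ;;
        esac
    done
    [ "$ok" = 1 ] && echo "iommu flags present"
    return 0
}

check_iommu_cmdline "$(cat /proc/cmdline)"
```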

@JahMark420

@JahMark420 Make sure you have IOMMU enabled on each node; that tends to be the biggest issue with speed. The other is pinning TB interrupts to only P-cores.

Make sure the following is set on the kernel command line in /etc/default/grub: intel_iommu=on iommu=pt. Once set, be sure to run update-grub and reboot.

Everyone's grub command line is different - this is mine. I also have i915 virtualization; if you get this wrong you can break your machine, and if you are not doing that you don't need any i915 entries.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" (note: if you have more things in your cmd line DO NOT REMOVE them, just add the two intel ones - it doesn't matter where)

I am using the script below; it is activated on every ifup. /etc/network/if-up.d/thunderbolt-affinity (change 0-7 to whatever your P-cores are). There is also another method listed somewhere above in the comments on one of these gists - @Allistah is the one that posted the other method, which I don't think requires you to define the P-cores.

#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
# Set Thunderbolt affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-7 | tee "/proc/irq/{}/smp_affinity_list"'
fi

Thanks @nickglott - yes, both nodes have IOMMU; I followed that part, did the update-grub, and rebooted:

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
#GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

I didn't think I'd need to define the P-cores since it was working one way, but you could be right.

Does the grub look correct, or am I missing something?

Thanks for the swift reply :)

@nickglott

@JahMark420 The other method by @Allistah or @contributorr was mentioned here... it has been a while and I can't remember which, sorry.

https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085?permalink_comment_id=5248976#gistcomment-5248976

add this to /etc/rc.local

#!/bin/bash
for id in $(grep 'thunderbolt' /proc/interrupts | awk '{print $1}' | cut -d ':' -f1); do
    echo 0f > /proc/irq/$id/smp_affinity
done
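
For reference, smp_affinity (hex bitmask, e.g. 0f = CPUs 0-3) and smp_affinity_list (range, e.g. 0-7) are two notations for the same thing. A small sketch (`cpulist_to_mask` is my own helper name) converting the list form into the mask form:

```shell
#!/bin/bash
# Convert a CPU list ("0-7", "0-3,8", "2") into the hex bitmask that
# /proc/irq/*/smp_affinity expects: 0-3 -> f, 0-7 -> ff, 0-15 -> ffff.
cpulist_to_mask() {
    mask=0
    oldifs=$IFS; IFS=','
    for part in $1; do
        lo=${part%%-*}; hi=${part##*-}   # a single CPU is its own range
        i=$lo
        while [ "$i" -le "$hi" ]; do
            mask=$(( mask | (1 << i) ))
            i=$((i + 1))
        done
    done
    IFS=$oldifs
    printf '%x\n' "$mask"
}

cpulist_to_mask 0-7   # -> ff, same CPUs as "echo 0-7 > .../smp_affinity_list"
```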

@nickglott

For some reason some people need it and others don't. I know most of us with MS-01s need it; Scyto's NUCs don't - probably something to do with the BIOS or firmware or the way TB is implemented.

I didn't think I'd need to define the P-cores since it was working one way, but you could be right.

Does the grub look correct, or am I missing something?

Thanks for the swift reply :)

@JahMark420

For some reason some people need it and others don't. I know most of us with MS-01s need it; Scyto's NUCs don't - probably something to do with the BIOS or firmware or the way TB is implemented.

I didn't think I'd need to define the P-cores since it was working one way, but you could be right.
Does the grub look correct, or am I missing something?
Thanks for the swift reply :)

Thanks @nickglott - it seems specifying the P-cores did the trick. I'm now seeing about 26Gbps across both nodes.

Appreciate the guidance

@scyto
Author

scyto commented Apr 21, 2025

@nickglott @JahMark420 does the affinity script need running every time the interfaces come up, or is running it once from /etc/rc.local good enough?

(i don't have this issue so can't verify)

I hope the latter - I just added it to the main gist above, along with other changes to both this and the openfabric IP side of things.

@corvy

corvy commented Apr 21, 2025

I do not run this in rc.local. I have it like this:

root@px0# cat /etc/network/if-up.d/thunderbolt-affinity 
#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
# Set Thunderbolt affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-11 | tee "/proc/irq/{}/smp_affinity_list"'
fi

This ensures it gets set every time en05 or en06 goes up or down, including cable connect/disconnect. I prefer this over rc.local: should the device change IRQ, the rc.local approach will fail. Not sure who suggested this approach - maybe it was @nickglott, but I cannot remember. At least doing it this way is very robust, and it would be my suggestion.

@scyto
Author

scyto commented Apr 21, 2025

This ensures it gets set every time en05 or en06 goes up or down, including cable connect/disconnect. I prefer this over rc.local: should the device change IRQ, the rc.local approach will fail. Not sure who suggested this approach - maybe it was @nickglott, but I cannot remember. At least doing it this way is very robust, and it would be my suggestion.

Thanks, I was also contemplating telling folks to add it to the user crontab using crontab -e with @daily, but if this needs to be done each time the driver is loaded, that's a bust too. I agree your way looks robust - which I think is key.

It's also wild to me that I just don't get the issue... this is between two of my nodes, and I have never set affinity. I would love to understand why the difference occurs...

Connecting to host fc00::81, port 5201
[  5] local fc00::82 port 38314 connected to fc00::81 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.05 GBytes  26.2 Gbits/sec   28   3.06 MBytes       
[  5]   1.00-2.00   sec  3.12 GBytes  26.8 Gbits/sec    3   2.81 MBytes       
[  5]   2.00-3.00   sec  3.09 GBytes  26.6 Gbits/sec   31   3.87 MBytes       
[  5]   3.00-4.00   sec  3.12 GBytes  26.8 Gbits/sec    0   3.87 MBytes       
[  5]   4.00-5.00   sec  3.12 GBytes  26.8 Gbits/sec    8   2.81 MBytes       
[  5]   5.00-6.00   sec  3.10 GBytes  26.7 Gbits/sec    1   3.81 MBytes       
[  5]   6.00-7.00   sec  3.11 GBytes  26.7 Gbits/sec    0   3.81 MBytes       
[  5]   7.00-8.00   sec  3.11 GBytes  26.7 Gbits/sec    0   3.81 MBytes       
[  5]   8.00-9.00   sec  3.09 GBytes  26.6 Gbits/sec    0   3.81 MBytes       
[  5]   9.00-10.00  sec  3.10 GBytes  26.6 Gbits/sec    1   3.81 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  31.0 GBytes  26.6 Gbits/sec   72             sender
[  5]   0.00-10.00  sec  31.0 GBytes  26.6 Gbits/sec                  receiver

Out of interest, what is your smp_affinity setting - is it ffff?

root@pve2:~# cat /proc/irq/129/smp_affinity
ffff
root@pve2:~# cat /proc/irq/129/smp_affinity_list
0-15
