@scyto
Last active November 16, 2024 12:16
Thunderbolt Networking Setup

Thunderbolt Networking

this gist is part of this series

NOTE: FOR THIS TO BE RELIABLE ON NODE RESTARTS YOU WILL NEED PROXMOX KERNEL 6.2.16-14-pve OR HIGHER

That kernel includes fixes for issues I reported to the thunderbolt / thunderbolt-net maintainers (I will take everyone's thanks now, lol)

Install LLDP - this is great for seeing which nodes can see which.

  • install lldpd (which provides lldpctl) with apt install lldpd

Load Kernel Modules

  • add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes, I know it can sometimes work without, but the thunderbolt-net one has interesting behaviour, so do as I say - add both ;-)
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line
    2. save using Ctrl+X, then y, then Enter
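
If you prefer to script this step rather than edit the file by hand, a minimal sketch could look like this. The add_module helper is a made-up name, not part of any package. It appends a module to /etc/modules (or to a file passed as the second argument) only when that exact line is missing, so re-running it is harmless.

```bash
#!/bin/bash
# add_module is a hypothetical helper for illustration only.
# Usage: add_module <module-name> [modules-file]
add_module() {
    mod_name="$1"
    mod_file="${2:-/etc/modules}"
    # -x: match the whole line, -F: treat the name as a literal string
    if ! grep -qxF -- "$mod_name" "$mod_file" 2>/dev/null; then
        echo "$mod_name" >> "$mod_file"
    fi
}

# As root on every node:
#   add_module thunderbolt
#   add_module thunderbolt-net
```

The whole-line match matters here: a plain grep for thunderbolt would also match thunderbolt-net and skip the first module.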

Prepare /etc/network/interfaces

Doing this means we don't have to give each thunderbolt interface a manual IPv6 address and that these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces

If you see any sections called thunderbolt0 or thunderbolt1 delete them at this point.

Create entries to prepopulate gui with reminder

Doing this means we don't have to give each thunderbolt interface a manual IPv6 or IPv4 address and that these addresses stay constant no matter what.

Add the following to each node using nano /etc/network/interfaces - this is to remind you not to edit en05 and en06 in the GUI.

This fragment should go between the existing auto lo section and the adapter sections.

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

If you see any thunderbolt sections delete them from the file before you save it.

DO NOT DELETE the source /etc/network/interfaces.d/* line - this will always exist on the latest versions and should be the last or next to last line in the /etc/network/interfaces file

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events

  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does - the command proved more useful to me than the syslog or lspci command for troubleshooting thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3. If you bought the same hardware this will be the same on all 3 units, but don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
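
The steps above can also be sketched as two small shell helpers. Both names (tb_pci_paths and write_tb_link) are hypothetical and exist only for this illustration. The sed pattern assumes udevadm monitor prints kernel events shaped like `KERNEL[...] add /devices/pci.../<pci-address>/domainN/.../net/<iface> (net)`; if your output looks different, adjust it rather than trusting this sketch.

```bash
#!/bin/bash
# Both helpers below are hypothetical names used for illustration only.

# Read `udevadm monitor` output on stdin and print "<pci-address> <iface>"
# for every kernel "add" event of a net device.
tb_pci_paths() {
    sed -n 's|^KERNEL\[[^]]*\] *add *.*/\([0-9a-f:.]*\)/domain[0-9]*/.*/net/\([^/ ]*\) (net)$|\1 \2|p'
}

# Write a .link file that renames the thunderbolt-net interface behind
# the given PCI address. Usage: write_tb_link <pci-address> <name> <file>
write_tb_link() {
    cat > "$3" <<EOF
[Match]
Path=pci-$1
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=$2
EOF
}

# Example (as root), plugging each cable in turn:
#   udevadm monitor --kernel --subsystem-match=net | tb_pci_paths
#   write_tb_link 0000:00:0d.2 en05 /etc/systemd/network/00-thunderbolt0.link
#   write_tb_link 0000:00:0d.3 en06 /etc/systemd/network/00-thunderbolt1.link
```

If both ports report the same PCI address, matching on Path alone cannot tell them apart and this approach will not be enough on its own.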

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or cable insertion with whatever settings are in /etc/network/interfaces - this shouldn't need to be done, it seems like a bug in the way thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

  1. create a udev rule to detect cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. save the file

  3. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

# this brings the renamed interface up and reprocesses any settings in /etc/network/interfaces for the renamed interface
/usr/sbin/ifup en05

     and save the file

  4. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

# this brings the renamed interface up and reprocesses any settings in /etc/network/interfaces for the renamed interface
/usr/sbin/ifup en06

     and save the file

  5. make both scripts executable with chmod +x /usr/local/bin/*.sh
  6. run update-initramfs -u -k all to propagate the new link files into initramfs
  7. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot)
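
After the reboot it is worth confirming that the rename took effect. Here is a minimal check, assuming the en05 and en06 names used above. check_tb_ifaces is a hypothetical helper that only reads sysfs and changes nothing.

```bash
#!/bin/bash
# check_tb_ifaces is a hypothetical helper for illustration only.
# It reports whether en05 and en06 exist and what their link state is.
# Usage: check_tb_ifaces [net-dir]   (net-dir defaults to /sys/class/net)
check_tb_ifaces() {
    net_dir="${1:-/sys/class/net}"
    missing=0
    for ifc in en05 en06; do
        if [ -d "$net_dir/$ifc" ]; then
            echo "$ifc: present, operstate=$(cat "$net_dir/$ifc/operstate" 2>/dev/null)"
        else
            echo "$ifc: missing"
            missing=1
        fi
    done
    return "$missing"
}

# Run on each node after rebooting:
#   check_tb_ifaces
```

A missing interface usually means the link file did not match (check the PCI path) or the cable on that port is not connected.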

Enabling IP Connectivity

proceed to the next gist

Slow thunderbolt perf

If you are having speed issues make sure the following is set on the kernel command line in the /etc/default/grub file: intel_iommu=on iommu=pt. Once set, be sure to run update-grub and reboot.

Everyone's grub command line is different. Mine also has entries for i915 virtualization; if you are not doing that you don't need any i915 entries. If you get this wrong you can break your machine.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

(Note: if you have more things in your cmd line DO NOT REMOVE them, just add the two iommu entries; it doesn't matter where.)
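
To add the parameters without disturbing what is already on the line, something like the following sketch may help. add_grub_param is a hypothetical helper; it assumes GNU sed (as shipped with Debian/Proxmox), a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, and parameters that contain no `|`, `&` or backslash characters. Back up /etc/default/grub before trying it.

```bash
#!/bin/bash
# add_grub_param is a hypothetical helper for illustration only.
# It appends one parameter to GRUB_CMDLINE_LINUX_DEFAULT unless that
# parameter is already present. Existing parameters are left untouched.
# Usage: add_grub_param <param> [grub-file]
add_grub_param() {
    param="$1"
    grub_file="${2:-/etc/default/grub}"
    # -w: whole-word match, -F: literal string (no regex surprises from . or +)
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qwF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing quote of the line.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$grub_file"
}

# As root:
#   add_grub_param intel_iommu=on
#   add_grub_param iommu=pt
#   update-grub    # then reboot
```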

Extra Debugging for Thunderbolt

dynamic kernel tracing - adds more info to dmesg, doesn't overwhelm dmesg

I have only tried this on 6.8 kernels, so YMMV. If you want more TB messages in dmesg to see why a connection might be failing, here is how to turn on dynamic tracing.

For boot time you will need to add it to the kernel command line by adding thunderbolt.dyndbg=+p to your /etc/default/grub file, running update-grub and rebooting.

To expand the example above:

`GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt thunderbolt.dyndbg=+p"`  

Don't forget to run update-grub after saving the change to the grub file.

For runtime debug you can run the following command (it will revert on next boot, so this can't be used to capture what happens at boot time).

`echo -n 'module thunderbolt =p' > /sys/kernel/debug/dynamic_debug/control`

install tbtools

These tools can be used to inspect your thunderbolt system. Note they rely on rust being installed; you must use the rustup script below and not install rust by package manager at this time (9/15/24).

apt install pkg-config libudev-dev git curl
curl https://sh.rustup.rs -sSf | sh
git clone https://github.com/intel/tbtools
# restart your ssh session here so that cargo is on your PATH
cd tbtools
cargo install --path .
@fesnault

fesnault commented Oct 17, 2024

Hi guys
I'm in the same situation as some others.
I have two minisforum ms-01 (no problem there, even though I didn't finish), and one IT13.
This last one has two usb4 ports, but discovering the PCI ID with udevadm monitor gives me the same PCI address for both.
Here is what happens when I unplug/plug each port.

Port 1
unplug
KERNEL[4657.803630] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active1 (nvmem)
KERNEL[4657.803645] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active1 (nvmem)
KERNEL[4657.803654] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
KERNEL[4657.803812] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/rx-0 (queues)
KERNEL[4657.803825] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/tx-0 (queues)
KERNEL[4657.803832] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1 (net)
UDEV  [4657.806173] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active1 (nvmem)
UDEV  [4657.806325] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/rx-0 (queues)
UDEV  [4657.806338] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active1 (nvmem)
UDEV  [4657.806504] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/tx-0 (queues)
UDEV  [4657.806518] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
UDEV  [4657.807874] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1 (net)
KERNEL[4657.817169] unbind   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[4657.817183] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [4657.817570] unbind   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [4657.817794] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[4657.820087] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
UDEV  [4657.820174] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)

plug
KERNEL[4694.615821] change   /1-1 (thunderbolt)
UDEV  [4694.618136] change   /1-1 (thunderbolt)
KERNEL[4695.638739] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
KERNEL[4695.638757] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[4695.638762] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1 (net)
KERNEL[4695.638765] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/rx-0 (queues)
KERNEL[4695.638767] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/tx-0 (queues)
KERNEL[4695.638949] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [4695.639414] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
UDEV  [4695.639836] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [4695.644713] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1 (net)
UDEV  [4695.644767] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/rx-0 (queues)
UDEV  [4695.645294] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt1/queues/tx-0 (queues)
UDEV  [4695.645678] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)

Port 2
unplug
KERNEL[4718.217021] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_non_active0 (nvmem)
KERNEL[4718.217048] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_active0 (nvmem)
KERNEL[4718.217053] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1 (thunderbolt)
KERNEL[4718.217056] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[4718.217058] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[4718.217062] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0 (net)
UDEV  [4718.219696] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_non_active0 (nvmem)
UDEV  [4718.219710] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_active0 (nvmem)
UDEV  [4718.219715] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1 (thunderbolt)
UDEV  [4718.219720] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/rx-0 (queues)
UDEV  [4718.219860] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/tx-0 (queues)
UDEV  [4718.221146] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0 (net)
KERNEL[4718.243146] unbind   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
KERNEL[4718.243153] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
UDEV  [4718.243360] unbind   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
UDEV  [4718.243439] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
KERNEL[4718.245925] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3 (thunderbolt)
UDEV  [4718.246093] remove   /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3 (thunderbolt)

plug
KERNEL[4748.852672] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1 (thunderbolt)
UDEV  [4748.855508] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1 (thunderbolt)
KERNEL[4748.866984] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_active0 (nvmem)
KERNEL[4748.866992] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_non_active0 (nvmem)
UDEV  [4748.867397] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_active0 (nvmem)
UDEV  [4748.867723] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port3/1-0:3.1/nvm_non_active0 (nvmem)
KERNEL[4753.171400] change   /1-3 (thunderbolt)
UDEV  [4753.173853] change   /1-3 (thunderbolt)
KERNEL[4754.197134] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3 (thunderbolt)
KERNEL[4754.197151] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
KERNEL[4754.197156] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0 (net)
KERNEL[4754.197159] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[4754.197162] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[4754.197360] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
UDEV  [4754.197739] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3 (thunderbolt)
UDEV  [4754.198310] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)
UDEV  [4754.203744] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0 (net)
UDEV  [4754.203950] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/rx-0 (queues)
UDEV  [4754.204632] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0/net/thunderbolt0/queues/tx-0 (queues)
UDEV  [4754.205057] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0 (thunderbolt)

But lspci shows me two pci devices :

00:0d.2 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI
00:0d.3 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI

When I tried to create the links, update and reboot, only one port was renamed and showed up. The other one is still shown as thunderbolt.

So I need to know how to match each of the ports in the Match section of the link files, but I cannot find how.

I also tried with udevadm, creating rules instead of links, which allows me to search on ATTR, but it did not do anything. So I'd like to make the link process work.

Did anyone manage to get it working with this kind of setup?
Basically ports seem to be on these devices :

/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0
/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3/1-3.0

I'd be happy to know how to match them in the link files...

And btw, it seems all nodes can see each other.
I have a script using udevadm info on each device on each node; it writes the output to logs, one per port, and I make a diff of them. They show that:

Node ms01-01

<     ATTR{device_name}=="ms01-02"
---
>     ATTR{device_name}=="gt13"

Node ms01-02

<     ATTR{device_name}=="gt13"
---
>     ATTR{device_name}=="ms01-01"

Node gt13 (yes it's an it13 but i made a mistake when i gave it its name :) ) :

<     ATTR{device_name}=="ms01-01"
---
>     ATTR{device_name}=="ms01-02"

And on the MS devices, one port is up, the other is down. Which seems legit, as one is connected to the other ms device. The down port is connected to the it13.

So my guess is hardware is fine, cables are fine, but I can't rename both ports on it13. Only one, which I guess is the one with the 0000:00:0d.3 pci id.

I'd be really grateful for your help !

@fesnault

fesnault commented Oct 18, 2024

Ok I managed to get it working, but with a little hack.

First, I had one iface renamed and up, the other one kept the thunderbolt0 name, even though I have the two link rules on.
So I added a script, called at boot by a service, that renames and activates the second interface.

Here are my links, which try to rename the ifaces even though the pci path is the same. I go to the first difference in the path to do this.

00-thunderbolt0.link

[Match]
DEVPATH=/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en05

00-thunderbolt1.link

[Match]
DEVPATH=/devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-3
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en06

Here is the service :

[Unit]
Description=Rename thunderbolt interfaces
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/root/scripts/rename_tb_ifaces.sh

[Install]
WantedBy=multi-user.target

The script doing the job is as follows :

#!/bin/bash

ip link set thunderbolt0 down
ip link set thunderbolt0 name en06
ip link set en06 up

And now both interfaces are renamed and up.

Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
gt13                                                                  
10.0.0.83/32         IP internal  0                                     gt13(4)
ms01-01              TE-IS        10     ms01-01              en05      gt13(4)
ms01-02              TE-IS        20     ms01-01              en05      ms01-01(4)
10.0.0.81/32         IP TE        20     ms01-01              en05      ms01-01(4)
10.0.0.82/32         IP TE        30     ms01-01              en05      ms01-02(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
gt13                                                                  
fc00::83/128         IP6 internal 0                                     gt13(4)
ms01-01              TE-IS        10     ms01-01              en05      gt13(4)
ms01-02              TE-IS        10     ms01-02              en06      gt13(4)
fc00::81/128         IP6 internal 20     ms01-01              en05      ms01-01(4)
fc00::82/128         IP6 internal 20     ms01-02              en06      ms01-02(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

I don't know if perf is good enough though :

root@ms01-01:~# iperf3 -c fc00::83 -bidir
Connecting to host fc00::83, port 5201
[  5] local fc00::81 port 59530 connected to fc00::83 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.90 GBytes  16.3 Gbits/sec  475   2.00 MBytes       
[  5]   1.00-2.00   sec  1.88 GBytes  16.1 Gbits/sec  462   2.93 MBytes       
[  5]   2.00-3.00   sec  2.76 GBytes  23.7 Gbits/sec  718   1.31 MBytes       
[  5]   3.00-4.00   sec  2.17 GBytes  18.7 Gbits/sec  477   2.50 MBytes       
[  5]   4.00-5.00   sec  1.25 GBytes  10.7 Gbits/sec  329   1.87 MBytes       
[  5]   5.00-6.00   sec  1.89 GBytes  16.2 Gbits/sec  528   2.87 MBytes       
[  5]   6.00-7.00   sec  1.63 GBytes  14.0 Gbits/sec  424   1.19 MBytes       
[  5]   7.00-8.00   sec  1.57 GBytes  13.5 Gbits/sec  358   2.00 MBytes       
[  5]   8.00-9.00   sec   551 MBytes  4.62 Gbits/sec  157   1.62 MBytes       
[  5]   9.00-10.00  sec  2.21 GBytes  19.0 Gbits/sec  472   1.25 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  17.8 GBytes  15.3 Gbits/sec  4400             sender
[  5]   0.00-10.00  sec  17.8 GBytes  15.3 Gbits/sec                  receiver

And if someone knows how to give the interfaces an ipv4 address, I'd be glad to know :)

@nickglott

nickglott commented Oct 20, 2024

@fesnault You can actually set up IPs in frr.conf. I was playing with it for a little while a month or so back. Here is the config I was using on one of the nodes; you would need to adjust the IPs for your scheme.

frr defaults traditional
hostname TheCore-01
log syslog informational
ip forwarding
ipv6 forwarding
service integrated-vtysh-config
!
interface lo
 ip address 10.0.10.10/32
 ip router openfabric 1
 ipv6 address fc00::10/128
 ipv6 router openfabric 1
 openfabric passive
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
line vty
!
router openfabric 1
 net 49.0001.1111.1111.1111.00
 fabric-tier 0
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180

You would just not set the ip's of the lo and lo6 in /etc/network/interfaces.d/thunderbolt as this gist does

@nickglott

nickglott commented Oct 20, 2024

@scyto I never got to submit a comment about it but I did end up switching to doing the above rather than having it set in /etc/network/interfaces.d/thunderbolt and setting up system IP forwarding. Something to maybe look at.

Basically all I have in /etc/network/interfaces.d/thunderbolt is

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

Only things in /etc/network/interfaces

iface en05 inet manual
#Thunderbolt Port 0 - 25G

iface en06 inet manual
#Thunderbolt Port 1 - 25G

post-up sleep 5 && /usr/bin/systemctl restart frr.service && sleep 20

For my /etc/frr/frr.conf
Change X in the hostname, IP, and NET address for each node to the following:

TheCore-01 | 10.0.10.10/32 | fc00::10/128 | 49.0001.1111.1111.1111.00
TheCore-02 | 10.0.10.20/32 | fc00::20/128 | 49.0001.2222.2222.2222.00
TheCore-03 | 10.0.10.30/32 | fc00::30/128 | 49.0001.3333.3333.3333.00

frr defaults traditional
hostname TheCore-0X
log syslog informational
ip forwarding
ipv6 forwarding
service integrated-vtysh-config
!
interface lo
 ip address 10.0.10.X0/32
 ip router openfabric 1
 ipv6 address fc00::X0/128
 ipv6 router openfabric 1
 openfabric passive
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
line vty
!
router openfabric 1
 net 49.0001.XXXX.XXXX.XXXX.00
 fabric-tier 0
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
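
Since only one digit changes per node, the per-node values in the table above can be generated rather than typed. A small sketch follows. node_values is a hypothetical helper and assumes single-digit node numbers (1-9), matching the naming scheme used in this comment.

```bash
#!/bin/bash
# node_values is a hypothetical helper for illustration only.
# It prints: <hostname> <IPv4 loopback> <IPv6 loopback> <openfabric NET>
# Usage: node_values <node-number 1-9>
node_values() {
    x="$1"
    quad="$x$x$x$x"    # e.g. 2 -> 2222
    printf 'TheCore-0%s 10.0.10.%s0/32 fc00::%s0/128 49.0001.%s.%s.%s.00\n' \
        "$x" "$x" "$x" "$quad" "$quad" "$quad"
}

# Example:
#   node_values 2
```

Each field maps to one placeholder in the config: hostname, the two addresses under interface lo, and the net line under router openfabric 1.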

Everything else is the same except I am using this /etc/network/if-up.d/thunderbolt-affinity

#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
    # Set Thunderbolt IRQ affinity to the P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-7 | tee "/proc/irq/{}/smp_affinity_list"'
fi

@fesnault

Hi @nickglott Thank you, I will definitely try this !

About the thunderbolt affinity, this is not in the guide.
I just need to setup this script in if-up.d directory and reboot ?
I think I tried it but got some errors.

@nickglott

nickglott commented Oct 22, 2024

@fesnault

Scyto did not need it for some reason and we are not quite sure why. When doing an iperf3 test, if you are not getting the full speed (i.e. 25 Gbit/s) and are seeing a high retry count, it has been found that forcing thunderbolt to only use the P-cores fixes that, hence the script. Based on your iperf test I think you can benefit from it.

For the script I am using, all you have to do is place the file in /etc/network/if-up.d/ and make it executable with chmod +x /etc/network/if-up.d/thunderbolt-affinity, if that is what you name it. I have not found that you need to add a service for it, as every file in that directory is run whenever any ifup command is run; my script only sets the affinity when en05 or en06 is the interface being brought up, so it does not fire on every other ifup.

Remember to adjust the 0-7 for your P-cores. I am using the i5-12600H and those are my P-cores (4 + threads, starting at 0); for example the i9-13900H and i9-12900H have 6 P-cores, so counting the hyperthreads it would be 0-11.
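
The arithmetic behind those ranges can be written down as a one-liner. This is only a sketch: pcore_range is a hypothetical helper, and it assumes hyperthreading is enabled and that the P-core threads are numbered first, starting at CPU 0. Check your own layout (for example with lscpu --extended) before relying on it.

```bash
#!/bin/bash
# pcore_range is a hypothetical helper for illustration only.
# Given the number of P-cores, print the CPU list to put in place of 0-7.
# Assumes 2 threads per P-core and P-core threads numbered from CPU 0.
pcore_range() {
    pcores="$1"
    echo "0-$((pcores * 2 - 1))"
}

# Examples:
#   pcore_range 4   # 4 P-cores
#   pcore_range 6   # 6 P-cores
```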

@fesnault

Thank you @nickglott !
I did that, with 0-11 as my 3 nodes are using the i9-13900H, then I did ifdown and ifup on both the en05 and en06 interfaces.
But the results are the same, 16 Gbit/s and a high retry count.

root@ms01-01:~# iperf3 -c fc00::83 -bidir
Connecting to host fc00::83, port 5201
[  5] local fc00::81 port 33400 connected to fc00::83 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.07 GBytes  9.21 Gbits/sec  264   1.37 MBytes       
[  5]   1.00-2.00   sec  2.17 GBytes  18.6 Gbits/sec  677   1.19 MBytes       
[  5]   2.00-3.00   sec  1.62 GBytes  13.9 Gbits/sec  517   1.25 MBytes       
[  5]   3.00-4.00   sec  2.19 GBytes  18.8 Gbits/sec  670   1.19 MBytes       
[  5]   4.00-5.00   sec  2.19 GBytes  18.8 Gbits/sec  655   1.25 MBytes       
[  5]   5.00-6.00   sec  2.19 GBytes  18.8 Gbits/sec  619   1.31 MBytes       
[  5]   6.00-7.00   sec  1.07 GBytes  9.22 Gbits/sec  285   1.62 MBytes       
[  5]   7.00-8.00   sec  2.74 GBytes  23.6 Gbits/sec  677   1.75 MBytes       
[  5]   8.00-9.00   sec  1.03 GBytes  8.87 Gbits/sec  353   2.12 MBytes       
[  5]   9.00-10.00  sec  2.20 GBytes  18.9 Gbits/sec  676   1.31 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  18.5 GBytes  15.9 Gbits/sec  5393             sender
[  5]   0.00-10.00  sec  18.5 GBytes  15.9 Gbits/sec                  receiver

iperf Done.
root@ms01-01:~# iperf3 -c fc00::82 -bidir
Connecting to host fc00::82, port 5201
[  5] local fc00::81 port 49032 connected to fc00::82 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.02 GBytes  17.4 Gbits/sec  454   2.31 MBytes       
[  5]   1.00-2.00   sec  1.37 GBytes  11.8 Gbits/sec  341   2.37 MBytes       
[  5]   2.00-3.00   sec  1.45 GBytes  12.4 Gbits/sec  352   2.00 MBytes       
[  5]   3.00-4.00   sec  2.22 GBytes  19.1 Gbits/sec  529   2.31 MBytes       
[  5]   4.00-5.00   sec  2.20 GBytes  18.9 Gbits/sec  561   1.31 MBytes       
[  5]   5.00-6.00   sec  2.77 GBytes  23.8 Gbits/sec  592   1.37 MBytes       
[  5]   6.00-7.00   sec  1.58 GBytes  13.6 Gbits/sec  362   2.18 MBytes       
[  5]   7.00-8.00   sec  2.19 GBytes  18.8 Gbits/sec  521   1.81 MBytes       
[  5]   8.00-9.00   sec  1.90 GBytes  16.3 Gbits/sec  335   2.00 MBytes       
[  5]   9.00-10.00  sec   968 MBytes  8.12 Gbits/sec  193   2.18 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  18.7 GBytes  16.0 Gbits/sec  4240             sender
[  5]   0.00-10.00  sec  18.7 GBytes  16.0 Gbits/sec                  receiver

iperf Done.

Maybe I forgot something ?

@contributorr

Hi @scyto and everybody!

I had this issue when rebooting any proxmox node that thunderbolt network interface didn't come up. Finally I figured it out and maybe somebody can profit too.

/usr/local/bin/pve-en05.sh

#!/bin/bash

for i in {1..10}; do
    /usr/sbin/ifup en05 && break
    sleep 3
done

/usr/local/bin/pve-en06.sh

#!/bin/bash

for i in {1..10}; do
    /usr/sbin/ifup en06 && break
    sleep 3
done

When triggered by udev, it runs the appropriate script. If ifup succeeds (exit code 0), the break stops the loop. If the exit code is not 0, it sleeps 3 seconds and tries again (up to 10 times in this case, so roughly 30 s in total). You can adjust the sleep time or the number of tries in the loop.
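
The same retry idea can be factored into one reusable function so that both scripts share it. This is a sketch under assumptions, not the scripts as posted: retry is a hypothetical helper name, and it avoids the bash-only {1..10} expansion so it also runs under plain sh.

```bash
#!/bin/bash
# retry is a hypothetical helper for illustration only.
# It runs a command until it succeeds or the attempts are used up.
# Usage: retry <tries> <delay-seconds> <command> [args...]
retry() {
    tries="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$tries" ]; do
        "$@" && return 0      # success: stop retrying
        sleep "$delay"        # failure: wait, then try again
        n=$((n + 1))
    done
    return 1                  # every attempt failed
}

# In pve-en05.sh this would replace the for loop:
#   retry 10 3 /usr/sbin/ifup en05
```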

FYI @ronindesign @ctroyp

@e1ysion

e1ysion commented Oct 24, 2024

Thanks for your explanation, could you also please share your config for the frr restart part? I think it was in the thunderbolt config.

@ronindesign

Fantastic, thanks for tagging me. I'll test it here shortly and see if it solves the issue for me as well. Appreciate it!

@contributorr

Quoting @e1ysion: "Thanks for your explanation, could you also please share your config for the frr restart part? I think it was in the thunderbolt config."

This really never caused any issue.

/etc/network/interfaces

auto lo
iface lo inet loopback

iface enp86s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address x.x.x.x/24
        gateway x.x.x.x
        bridge-ports enp86s0
        bridge-stp off
        bridge-fd 0

source /etc/network/interfaces.d/*

/etc/network/interfaces.d/thunderbolt

auto lo:0
iface lo:0 inet static
	address x.x.x.x/32

allow-hotplug en05
iface en05 inet manual
	mtu 65520

allow-hotplug en06
iface en06 inet manual
	mtu 65520

post-up /usr/bin/systemctl restart frr.service

@e1ysion

e1ysion commented Nov 1, 2024

@uvalleza & all: Quick summary from my last few days using 3 um790 pro from minisforum:

  • Context: Ubuntu 24.04 LTS on 3 "nuc" with AMD cpu. (I was told: usb-4 doesn't necessarily mean thunderbolt-3, the spec is pick & choose; on top of that, until recently thunderbolt was an Intel-only feature)
  • The speed i get on a direct link is around ~12 Gbits, like you do
  • frr needs a "reload" after boot (no need to play with the wires, unless the logs show invalid config for usb port x), see code snippet below
  • Downgrading the bios/uefi from 1.09 to 1.07 gave me fewer "invalid config for usb x" kind of errors and more reliability after reboots (i still have to unplug-replug some wires sometimes). Also, it seems that my speed went from 10-11 Gbits to 12-13 Gbits, but i can't really confirm. => what bios/uefi version are you running?
  • I had to stop using encrypted boot drives as it required me to unplug the thunderbolt links to let the hdmi work, then replug it all.
  • cables are indeed placed the same way you do
  • my frr setup seems to be working, but once I remove 1 of the links, the speed is about 2 Mbits (yes, megabits) when it needs to do 1 more hop through the 2 other thunderbolt links => i have yet to figure out that part

To auto-reload the frr configuration after reboot (required, otherwise frr fails to see the thunderbolt links and I get 3 independent nodes that don't see each other in vtysh -c "show openfabric topology"). Requirement: have your interfaces renamed (see "tbt" in the script) as explained in the first post by scyto (don't use hyphens in interface names, it wasn't working for me)

#!/bin/sh
# Delayed start script to tell frr to reload ensuring that it sees thunderbolt links towards other nodes.
# condition: is there any tbt network interface and frr service up
COUNTER=0
while [ ${COUNTER} -lt 5 ]; do
	sleep 1;
	TEST=$(ip a | grep ": tbt" | grep "UP" | awk 'BEGIN { ORS=""}; {print $2}')
	if [ ${#TEST} -ge 2 ]; then
		TEST_SVC=$(service frr status | grep "active (running)")
		if [ ${#TEST_SVC} -ge 2 ]; then
			service frr reload;
			echo "frr service reload request sent"
			exit 0;
		fi
	fi
	COUNTER=$((COUNTER+1));
done
echo "Failed to request frr service reload: request NOT sent"
exit 1;

[Unit]
After=network.target

[Service]
ExecStart=/usr/local/bin/restart-frr.sh

[Install]
WantedBy=default.target

Note: The script is called restart, but after some testing, I realised that reload was enough.

To all: thank you for sharing your experience, it's a great help & motivation to figure out what's going sideways 😄

Hey, just wanted to thank you for your input first. I happen to have the same setup as you, only with 5 nodes. Would you be so kind to share your config with this script: https://github.com/Allistah/Get-Proxmox-Thunderbolt-Config/blob/main/get-thunderbolt-config.sh (Thank you Allistah)

Your help is greatly appreciated, I am pretty desperate here lol

Here is my output:
https://privatebin.net/?04649f34b02189de#57hHKfwusnma1Bre1gSkwm2kMi1rucqe8hpG61GSU1cT

Edit:

Also, do I put these two segments into the same .sh file and make it executable, or do I put them at different places?

#!/bin/sh
# Delayed start script to tell frr to reload ensuring that it sees thunderbolt links towards other nodes.
# condition: is there any tbt network interface and frr service up
COUNTER=0
while [ ${COUNTER} -lt 5 ]; do
	sleep 1;
	TEST=$(ip a | grep ": tbt" | grep "UP" | awk 'BEGIN { ORS=""}; {print $2}')
	if [ ${#TEST} -ge 2 ]; then
		TEST_SVC=$(service frr status | grep "active (running)")
		if [ ${#TEST_SVC} -ge 2 ]; then
			service frr reload;
			echo "frr service reload request sent"
			exit 0;
		fi
	fi
	COUNTER=$((COUNTER+1));
done
echo "Failed to request frr service reload: request NOT sent"
exit 1;

[Unit]
After=network.target

[Service]
ExecStart=/usr/local/bin/restart-frr.sh

[Install]
WantedBy=default.target
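
On the question of where the two segments go: they are two separate files. The shell part is the script that `ExecStart=` points at (`/usr/local/bin/restart-frr.sh`), and the `[Unit]`/`[Service]`/`[Install]` part is a systemd unit file. A minimal sketch of installing both, assuming the two segments were saved locally as `restart-frr.sh` and `frr-reload.service` (the unit name and the function name are made up for illustration, and the optional prefix argument exists only so the layout can be tried in a scratch directory):

```bash
#!/bin/bash
# Hypothetical installer. $1 is an optional root prefix (empty = real system).
install_frr_reload() {
    local root="${1:-}"
    # The script must be executable and live where ExecStart= expects it.
    install -D -m 0755 restart-frr.sh "$root/usr/local/bin/restart-frr.sh"
    # Locally administered systemd units go in /etc/systemd/system.
    install -D -m 0644 frr-reload.service "$root/etc/systemd/system/frr-reload.service"
}
```

After `install_frr_reload` has run as root, `systemctl daemon-reload` followed by `systemctl enable frr-reload.service` would make the unit start at boot.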

@scloder

scloder commented Nov 13, 2024

@scyto I never got to submit a comment about it, but I did end up switching to doing the above rather than having it set in /etc/network/interfaces.d/thunderbolt and setting up system ip forwarding. Something to maybe look at.

Basically all I have in /etc/network/interfaces.d/thunderbolt is

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

Only things in /etc/network/interfaces

iface en05 inet manual
#Thunderbolt Port 0 - 25G

iface en06 inet manual
#Thunderbolt Port 1 - 25G

post-up sleep 5 && /usr/bin/systemctl restart frr.service && sleep 20

For my /etc/frr/frr.conf, change X in the hostname, IP, and NET address for each node to the following:

  • TheCore-01 | 10.0.10.10/32 | fc00::10/128 | 49.0001.1111.1111.1111.00
  • TheCore-02 | 10.0.10.20/32 | fc00::20/128 | 49.0001.2222.2222.2222.00
  • TheCore-03 | 10.0.10.30/32 | fc00::30/128 | 49.0001.3333.3333.3333.00

frr defaults traditional
hostname TheCore-0X
log syslog informational
ip forwarding
ipv6 forwarding
service integrated-vtysh-config
!
interface lo
 ip address 10.0.10.X0/32
 ip router openfabric 1
 ipv6 address fc00::X0/128
 ipv6 router openfabric 1
 openfabric passive
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
line vty
!
router openfabric 1
 net 49.0001.XXXX.XXXX.XXXX.00
 fabric-tier 0
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180

Everything else is the same except I am using this /etc/network/if-up.d/thunderbolt-affinity

#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
    # Set Thunderbolt IRQ affinity to the P-cores (0-7)
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-7 | tee "/proc/irq/{}/smp_affinity_list"'
fi

i like the idea of doing this ip config in FRR, but it seems like proxmox does not really see the network interfaces unless you also define the interfaces in
/etc/network/interfaces

auto lo:0
iface lo:0 inet static
        address 10.0.0.x/32
auto lo:6
iface lo:6 inet static
        address fc00::x/128

Is there some other config you are running that I don't see in your above post? I have placed the ip addresses in /etc/network/interfaces as well as in the FRR config, like you shared. The TB network does work like this, but I am not sure if that's going to cause problems later?

also fwiw, I noticed that for your config of

 lsp-refresh-interval 180

apparently needed to be changed based on this output:

pve1(config-router)#  max-lsp-lifetime 600
Level 1 Max LSP lifetime 600s must be 300s greater than the configured LSP refresh interval 900s
Automatically reducing level 1 LSP refresh interval to 300s
Level 2 Max LSP lifetime 600s must be 300s greater than the configured LSP refresh interval 900s
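
The rule in that message is plain arithmetic: the maximum LSP lifetime has to be at least 300 s larger than the LSP refresh interval. With the values from the config above, 600 ≥ 180 + 300 holds, so the 900 s in the message is presumably the refresh interval that was in effect at the moment `max-lsp-lifetime 600` was entered, rather than the 180 from the posted config. A small sketch of the check (the function name is made up; the 300 s margin is taken from the message itself):

```bash
#!/bin/bash
# Hypothetical check: succeeds when max-lsp-lifetime is at least 300 s
# greater than lsp-refresh-interval, as the FRR message above demands.
lsp_timers_ok() {
    local lifetime="$1" refresh="$2"
    [ "$lifetime" -ge $((refresh + 300)) ]
}

lsp_timers_ok 600 180 && echo "600/180: ok"
lsp_timers_ok 600 900 || echo "600/900: lifetime too short"
```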

@pSyCr0

pSyCr0 commented Nov 14, 2024

Hi everyone.
My thunderbolt cluster of 3 Intel NUCs (2x NUC13 and 1x NUC12) has been complete since yesterday, but I have the following issues and can't find the cause:

  1. After a restart of a NUC the openfabric connection is not established automatically and I have to run "systemctl restart frr"
  2. Is this thunderbolt iperf result good? I still have some Retr errors:
**Node 1 (NUC 13 - 1340p):** connected to Node 2 with a 1 m TB cable and to Node 3 also with a 1 m TB cable

root@pve1:~# iperf3 -c 10.0.0.92 -bidir
Connecting to host 10.0.0.92, port 5201
[  5] local 10.0.0.91 port 34894 connected to 10.0.0.92 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.07 GBytes  26.3 Gbits/sec   12   3.56 MBytes       
[  5]   1.00-2.00   sec  3.07 GBytes  26.4 Gbits/sec   20   3.37 MBytes       
[  5]   2.00-3.00   sec  3.09 GBytes  26.5 Gbits/sec    0   3.37 MBytes       
[  5]   3.00-4.00   sec  3.06 GBytes  26.3 Gbits/sec    0   3.37 MBytes       
[  5]   4.00-5.00   sec  3.09 GBytes  26.6 Gbits/sec    0   3.37 MBytes       
[  5]   5.00-6.00   sec  3.09 GBytes  26.5 Gbits/sec    1   3.37 MBytes       
[  5]   6.00-7.00   sec  3.05 GBytes  26.2 Gbits/sec    1   3.37 MBytes       
[  5]   7.00-8.00   sec  3.09 GBytes  26.5 Gbits/sec    0   3.37 MBytes       
[  5]   8.00-9.00   sec  3.10 GBytes  26.6 Gbits/sec    0   3.37 MBytes       
[  5]   9.00-10.00  sec  3.09 GBytes  26.5 Gbits/sec    0   3.37 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  30.8 GBytes  26.5 Gbits/sec   34             sender
[  5]   0.00-10.00  sec  30.8 GBytes  26.4 Gbits/sec                  receiver

iperf Done.
root@pve1:~# iperf3 -c 10.0.0.93 -bidir
Connecting to host 10.0.0.93, port 5201
[  5] local 10.0.0.91 port 34106 connected to 10.0.0.93 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.99 GBytes  25.7 Gbits/sec   23   2.81 MBytes       
[  5]   1.00-2.00   sec  3.08 GBytes  26.4 Gbits/sec    4   3.68 MBytes       
[  5]   2.00-3.00   sec  3.07 GBytes  26.3 Gbits/sec    5   2.75 MBytes       
[  5]   3.00-4.00   sec  3.05 GBytes  26.2 Gbits/sec    3   3.81 MBytes       
[  5]   4.00-5.00   sec  2.86 GBytes  24.5 Gbits/sec   58   3.87 MBytes       
[  5]   5.00-6.00   sec  3.08 GBytes  26.4 Gbits/sec    3   3.43 MBytes       
[  5]   6.00-7.00   sec  3.07 GBytes  26.4 Gbits/sec    2   3.75 MBytes       
[  5]   7.00-8.00   sec  3.07 GBytes  26.4 Gbits/sec    5   3.62 MBytes       
[  5]   8.00-9.00   sec  3.08 GBytes  26.5 Gbits/sec    0   3.62 MBytes       
[  5]   9.00-10.00  sec  3.05 GBytes  26.2 Gbits/sec    7   3.62 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  30.4 GBytes  26.1 Gbits/sec  110             sender
[  5]   0.00-10.00  sec  30.4 GBytes  26.1 Gbits/sec                  receiver

iperf Done.

**Node 2 (NUC 13 - 1340p):** connected to Node 1 with a 1 m TB cable and to Node 3 with a 0.3 m TB cable

root@pve2:~# iperf3 -c 10.0.0.91 -bidir
Connecting to host 10.0.0.91, port 5201
[  5] local 10.0.0.92 port 53938 connected to 10.0.0.91 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.10 GBytes  26.6 Gbits/sec   15   3.00 MBytes       
[  5]   1.00-2.00   sec  3.10 GBytes  26.7 Gbits/sec    0   3.00 MBytes       
[  5]   2.00-3.00   sec  3.10 GBytes  26.6 Gbits/sec    0   3.00 MBytes       
[  5]   3.00-4.00   sec  3.08 GBytes  26.5 Gbits/sec    0   3.00 MBytes       
[  5]   4.00-5.00   sec  2.99 GBytes  25.7 Gbits/sec    1   3.00 MBytes       
[  5]   5.00-6.00   sec  3.10 GBytes  26.6 Gbits/sec    0   3.00 MBytes       
[  5]   6.00-7.00   sec  3.12 GBytes  26.8 Gbits/sec    0   3.00 MBytes       
[  5]   7.00-8.00   sec  3.09 GBytes  26.6 Gbits/sec    0   3.00 MBytes       
[  5]   8.00-9.00   sec  3.09 GBytes  26.6 Gbits/sec    0   3.00 MBytes       
[  5]   9.00-10.00  sec  3.09 GBytes  26.5 Gbits/sec    0   3.00 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  30.9 GBytes  26.5 Gbits/sec   16             sender
[  5]   0.00-10.00  sec  30.9 GBytes  26.5 Gbits/sec                  receiver

iperf Done.
root@pve2:~# iperf3 -c 10.0.0.93 -bidir
Connecting to host 10.0.0.93, port 5201
[  5] local 10.0.0.92 port 45410 connected to 10.0.0.93 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.87 GBytes  24.6 Gbits/sec   25   3.56 MBytes       
[  5]   1.00-2.00   sec  3.10 GBytes  26.6 Gbits/sec    0   3.56 MBytes       
[  5]   2.00-3.00   sec  3.11 GBytes  26.7 Gbits/sec    0   3.56 MBytes       
[  5]   3.00-4.00   sec  3.09 GBytes  26.5 Gbits/sec    0   3.56 MBytes       
[  5]   4.00-5.00   sec  3.10 GBytes  26.7 Gbits/sec    0   3.56 MBytes       
[  5]   5.00-6.00   sec  3.11 GBytes  26.7 Gbits/sec    0   3.56 MBytes       
[  5]   6.00-7.00   sec  3.11 GBytes  26.7 Gbits/sec    0   3.56 MBytes       
[  5]   7.00-8.00   sec  3.11 GBytes  26.7 Gbits/sec    0   3.56 MBytes       
[  5]   8.00-9.00   sec  2.30 GBytes  19.8 Gbits/sec   69   1.50 MBytes       
[  5]   9.00-10.00  sec  2.13 GBytes  18.3 Gbits/sec  332   1.31 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  29.0 GBytes  24.9 Gbits/sec  426             sender
[  5]   0.00-10.00  sec  29.0 GBytes  24.9 Gbits/sec                  receiver

iperf Done.

**Node 3 (NUC12 - 1240p):** connected to Node 2 with a 0.3 m TB cable and to Node 1 with a 1 m TB cable

root@pve3:~# iperf3 -c 10.0.0.91 -bidir
Connecting to host 10.0.0.91, port 5201
[  5] local 10.0.0.93 port 51110 connected to 10.0.0.91 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.96 GBytes  25.4 Gbits/sec   54   3.75 MBytes       
[  5]   1.00-2.00   sec  3.01 GBytes  25.8 Gbits/sec   22   2.00 MBytes       
[  5]   2.00-3.00   sec  2.97 GBytes  25.5 Gbits/sec   39   3.56 MBytes       
[  5]   3.00-4.00   sec  2.98 GBytes  25.6 Gbits/sec   31   3.93 MBytes       
[  5]   4.00-5.00   sec  3.02 GBytes  26.0 Gbits/sec   62   3.37 MBytes       
[  5]   5.00-6.00   sec  2.95 GBytes  25.3 Gbits/sec   35   1.87 MBytes       
[  5]   6.00-7.00   sec  3.00 GBytes  25.7 Gbits/sec   32   4.06 MBytes       
[  5]   7.00-8.00   sec  3.01 GBytes  25.8 Gbits/sec   26   2.75 MBytes       
[  5]   8.00-9.00   sec  2.84 GBytes  24.4 Gbits/sec   73   2.81 MBytes       
[  5]   9.00-10.00  sec  3.07 GBytes  26.4 Gbits/sec    3   3.68 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  29.8 GBytes  25.6 Gbits/sec  377             sender
[  5]   0.00-10.00  sec  29.8 GBytes  25.6 Gbits/sec                  receiver

iperf Done.
root@pve3:~# iperf3 -c 10.0.0.92 -bidir
Connecting to host 10.0.0.92, port 5201
[  5] local 10.0.0.93 port 41906 connected to 10.0.0.92 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.10 GBytes  26.6 Gbits/sec   14   4.12 MBytes       
[  5]   1.00-2.00   sec  3.11 GBytes  26.7 Gbits/sec    2   4.12 MBytes       
[  5]   2.00-3.00   sec  3.08 GBytes  26.4 Gbits/sec    4   4.12 MBytes       
[  5]   3.00-4.00   sec  3.12 GBytes  26.8 Gbits/sec    0   4.12 MBytes       
[  5]   4.00-5.00   sec  3.11 GBytes  26.7 Gbits/sec    0   4.12 MBytes       
[  5]   5.00-6.00   sec  3.12 GBytes  26.8 Gbits/sec    0   4.12 MBytes       
[  5]   6.00-7.00   sec  3.11 GBytes  26.7 Gbits/sec    0   4.12 MBytes       
[  5]   7.00-8.00   sec  3.11 GBytes  26.7 Gbits/sec    0   4.12 MBytes       
[  5]   8.00-9.00   sec  2.95 GBytes  25.3 Gbits/sec    0   4.12 MBytes       
[  5]   9.00-10.00  sec  3.10 GBytes  26.6 Gbits/sec    0   4.12 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  30.9 GBytes  26.5 Gbits/sec   20             sender
[  5]   0.00-10.00  sec  30.9 GBytes  26.5 Gbits/sec                  receiver

iperf Done.
  3. I can't install the mentioned "tbtools" here and get this error:
root@pve1:~/tbtools# cargo install --path .
  Installing tbtools v0.4.2 (/root/tbtools)
    Updating crates.io index
    Updating git repository `https://github.com/gyscos/cursive`
     Locking 125 packages to latest compatible versions
      Adding bitflags v1.3.2 (latest: v2.6.0)
      Adding hermit-abi v0.3.9 (latest: v0.4.0)
      Adding linux-raw-sys v0.4.14 (latest: v0.6.5)
      Adding memoffset v0.7.1 (latest: v0.9.1)
      Adding nix v0.26.4 (latest: v0.29.0)
      Adding udev v0.7.0 (latest: v0.9.1)
      Adding wasi v0.11.0+wasi-snapshot-preview1 (latest: v0.13.3+wasi-0.2.2)
      Adding windows-sys v0.52.0 (latest: v0.59.0)
      Adding zerocopy v0.7.35 (latest: v0.8.10)
      Adding zerocopy-derive v0.7.35 (latest: v0.8.10)
  Downloaded clap v4.5.21
  Downloaded clap_lex v0.7.3
  Downloaded clap_builder v4.5.21
  Downloaded 3 crates (233.4 KB) in 0.15s
   Compiling proc-macro2 v1.0.89
   Compiling libc v0.2.162
   Compiling serde v1.0.215
   Compiling rustversion v1.0.18
   Compiling parking_lot_core v0.9.10
   Compiling crossbeam-utils v0.8.20
   Compiling num-traits v0.2.19
   Compiling lock_api v0.4.12
   Compiling signal-hook v0.3.17
   Compiling ahash v0.8.11
   Compiling serde_json v1.0.132
   Compiling rustix v0.38.40
   Compiling libudev-sys v0.1.4
   Compiling memoffset v0.7.1
   Compiling anstyle v1.0.10
   Compiling time-core v0.1.2
   Compiling linux-raw-sys v0.4.14
error: linker `cc` not found
  |
  = note: No such file or directory (os error 2)

error: could not compile `signal-hook` (build script) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `num-traits` (build script) due to 1 previous error
error: could not compile `libudev-sys` (build script) due to 1 previous error
error: could not compile `ahash` (build script) due to 1 previous error
error: could not compile `memoffset` (build script) due to 1 previous error
error: could not compile `serde` (build script) due to 1 previous error
error: could not compile `lock_api` (build script) due to 1 previous error
error: could not compile `parking_lot_core` (build script) due to 1 previous error
error: could not compile `serde_json` (build script) due to 1 previous error
error: could not compile `crossbeam-utils` (build script) due to 1 previous error
error: could not compile `rustversion` (build script) due to 1 previous error
error: could not compile `proc-macro2` (build script) due to 1 previous error
error: could not compile `rustix` (build script) due to 1 previous error
error: could not compile `libc` (build script) due to 1 previous error
error: failed to compile `tbtools v0.4.2 (/root/tbtools)`, intermediate artifacts can be found at `/root/tbtools/target`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.
  4. My network performance to the PBS is very bad. The PBS is an older Intel NUC (i5-8259U and 16 GB RAM) connected to the same switch over a 1 Gbit connection. The other three nodes are all connected to this switch at 2.5 Gbit, and the backup is very slow:

INFO: scsi0: dirty-bitmap status: created new
INFO: 0% (812.0 MiB of 100.0 GiB) in 3s, read: 270.7 MiB/s, write: 245.3 MiB/s
INFO: 1% (1.5 GiB of 100.0 GiB) in 6s, read: 248.0 MiB/s, write: 248.0 MiB/s
INFO: 2% (2.1 GiB of 100.0 GiB) in 9s, read: 212.0 MiB/s, write: 212.0 MiB/s
INFO: 3% (3.0 GiB of 100.0 GiB) in 13s, read: 229.0 MiB/s, write: 229.0 MiB/s
INFO: 4% (4.0 GiB of 100.0 GiB) in 18s, read: 198.4 MiB/s, write: 197.6 MiB/s
INFO: 5% (5.0 GiB of 100.0 GiB) in 46s, read: 36.9 MiB/s, write: 36.9 MiB/s
INFO: 6% (6.0 GiB of 100.0 GiB) in 1m 21s, read: 30.3 MiB/s, write: 30.3 MiB/s
INFO: 7% (7.0 GiB of 100.0 GiB) in 2m 3s, read: 24.0 MiB/s, write: 23.9 MiB/s
INFO: 8% (8.0 GiB of 100.0 GiB) in 2m 40s, read: 27.8 MiB/s, write: 27.6 MiB/s

Done in:
INFO: 100% (100.0 GiB of 100.0 GiB) in 19m 38s, read: 141.5 MiB/s, write: 114.8 MiB/s
INFO: backup is sparse: 60.00 GiB (60%) total zero data
INFO: backup was done incrementally, reused 60.00 GiB (60%)
INFO: transferred 100.00 GiB in 1382 seconds (74.1 MiB/s)

Any hints on how I can improve this? The total speed was about 72 MB/s over the 1 Gbit network, which seems OK, right?

@scyto
Author

scyto commented Nov 15, 2024

2. retr errors:

normal at that level
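
To compare retransmits between links at a glance, the sender summary line can be pulled out of each run. A minimal sketch, assuming the plain-text layout of the runs shown above (the function name is made up, and the field positions would differ for other iperf3 output modes):

```bash
#!/bin/bash
# Hypothetical helper: read iperf3 text output on stdin and print the
# bitrate and total retransmits from the final "sender" summary line.
iperf_sender_summary() {
    awk '$NF == "sender" { print $7, $8, "retr=" $9 }'
}
```

For example, `iperf3 -c 10.0.0.92 | iperf_sender_summary` would print one short line per run.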

error: linker cc not found

looks like you have some dependencies missing, did you install the ones i listed? interesting, i think the list needs some added to it, i suspect i had some dependencies already installed, try apt install build-essential
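
The `linker cc not found` error means no C compiler driver is on the PATH, which Rust build scripts need for linking. A minimal sketch of a pre-flight check before running `cargo install` (the function name is made up; on Debian-based systems such as Proxmox, `build-essential` pulls in gcc, which provides `cc`):

```bash
#!/bin/bash
# Hypothetical pre-flight check: report whether a `cc` linker is available.
check_linker() {
    if command -v cc >/dev/null 2>&1; then
        echo "cc found at $(command -v cc)"
    else
        echo "cc not found; try: apt install build-essential"
        return 1
    fi
}
```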

  1. systemctl restart frr

look at some of the other solutions people have posted

@scyto
Author

scyto commented Nov 15, 2024

the other one kept the thunderbolt0 name

i like your hack as it is one that doesn't use sleeps, i am opposed to using sleeps because everyone's timing will be different / its not deterministic

i don't understand why you had one interface stay as thunderbolt0 - it's supposed to be a simple udev rule, i would hazard its a bug in a particular module version or yet again another variant of a timing issue, i will keep noodling on whether there is something to add to the gist, at this point we have many different workarounds....

@scyto
Author

scyto commented Nov 15, 2024

The script doing the job is as follows :

#!/bin/bash

ip link set thunderbolt0 down
ip link set thunderbolt0 name en06
ip link set en06 up

this makes me think your matching rules were not specific, here you are renaming thunderbolt0, thing is which port is thunderbolt0 can and will vary from boot to boot, the PCI matching approach i used means you always know which physical port is en05 and which is en06

the fact you only have to do this for one makes me think you have an issue with your original matching rules in some way, in fact i strongly suspect you have the wrong dev paths in some way, also your globbing doesn't look right. if you want to try, i think these should work equally as well as what you had (you don't need the /dev, and all the man pages show the syntax like this below):

path=pci0000:00/0000:00:0d.3/domain1/1-0/1-1
path=pci0000:00/0000:00:0d.3/domain1/1-0/1-3

@scyto
Author

scyto commented Nov 15, 2024

@fesnault this may be the cause of your issue with name not changing with the link file and maybe some of the other issues we see

"All .link files do is to suggest the "best" name which is ignored if
NAME has already been set. They do not really rename anything
themselves. Besides, OP cannot use .link files because there is no
persistent device property that can be matched by them.

It is possible that there are conflicting rules in initrd. It is also
possible that udev races with the networkd."

https://lists.freedesktop.org/archives/systemd-devel/2024-August/050657.html

this would be, for example, if you have thunderbolt0 in one of the network interface files or in interfaces.d, you may end up in a race condition... make sure thunderbolt0 isn't defined anywhere else

i am starting to think we are just hitting inherent timing and race conditions based on poor designs in systemd and networkd, and that is why everyone is seeing something slightly different
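
A quick way to check for leftover definitions is to search the ifupdown config for the default names. A minimal sketch, assuming the standard paths `/etc/network/interfaces` and `/etc/network/interfaces.d` (the function name is made up):

```bash
#!/bin/bash
# Hypothetical helper: list every line that still mentions a default
# thunderboltN interface name. With no arguments it searches the usual
# ifupdown locations; returns non-zero when nothing is found.
find_stale_thunderbolt() {
    [ "$#" -gt 0 ] || set -- /etc/network/interfaces /etc/network/interfaces.d
    grep -rnE 'thunderbolt[0-9]+' "$@" 2>/dev/null
}
```

Any line it prints is a candidate for removal before the next reboot.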

@uvalleza

Thank you @nickglott ! I did that, with 0-11 as my 3 nodes are using i9 13900H, then I did ifdown and ifup on both en05 and en06 interfaces. But the results are the same, 16 Gbit/s and high retry count.

root@ms01-01:~# iperf3 -c fc00::83 -bidir
Connecting to host fc00::83, port 5201
[  5] local fc00::81 port 33400 connected to fc00::83 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.07 GBytes  9.21 Gbits/sec  264   1.37 MBytes       
[  5]   1.00-2.00   sec  2.17 GBytes  18.6 Gbits/sec  677   1.19 MBytes       
[  5]   2.00-3.00   sec  1.62 GBytes  13.9 Gbits/sec  517   1.25 MBytes       
[  5]   3.00-4.00   sec  2.19 GBytes  18.8 Gbits/sec  670   1.19 MBytes       
[  5]   4.00-5.00   sec  2.19 GBytes  18.8 Gbits/sec  655   1.25 MBytes       
[  5]   5.00-6.00   sec  2.19 GBytes  18.8 Gbits/sec  619   1.31 MBytes       
[  5]   6.00-7.00   sec  1.07 GBytes  9.22 Gbits/sec  285   1.62 MBytes       
[  5]   7.00-8.00   sec  2.74 GBytes  23.6 Gbits/sec  677   1.75 MBytes       
[  5]   8.00-9.00   sec  1.03 GBytes  8.87 Gbits/sec  353   2.12 MBytes       
[  5]   9.00-10.00  sec  2.20 GBytes  18.9 Gbits/sec  676   1.31 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  18.5 GBytes  15.9 Gbits/sec  5393             sender
[  5]   0.00-10.00  sec  18.5 GBytes  15.9 Gbits/sec                  receiver

iperf Done.
root@ms01-01:~# iperf3 -c fc00::82 -bidir
Connecting to host fc00::82, port 5201
[  5] local fc00::81 port 49032 connected to fc00::82 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.02 GBytes  17.4 Gbits/sec  454   2.31 MBytes       
[  5]   1.00-2.00   sec  1.37 GBytes  11.8 Gbits/sec  341   2.37 MBytes       
[  5]   2.00-3.00   sec  1.45 GBytes  12.4 Gbits/sec  352   2.00 MBytes       
[  5]   3.00-4.00   sec  2.22 GBytes  19.1 Gbits/sec  529   2.31 MBytes       
[  5]   4.00-5.00   sec  2.20 GBytes  18.9 Gbits/sec  561   1.31 MBytes       
[  5]   5.00-6.00   sec  2.77 GBytes  23.8 Gbits/sec  592   1.37 MBytes       
[  5]   6.00-7.00   sec  1.58 GBytes  13.6 Gbits/sec  362   2.18 MBytes       
[  5]   7.00-8.00   sec  2.19 GBytes  18.8 Gbits/sec  521   1.81 MBytes       
[  5]   8.00-9.00   sec  1.90 GBytes  16.3 Gbits/sec  335   2.00 MBytes       
[  5]   9.00-10.00  sec   968 MBytes  8.12 Gbits/sec  193   2.18 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  18.7 GBytes  16.0 Gbits/sec  4240             sender
[  5]   0.00-10.00  sec  18.7 GBytes  16.0 Gbits/sec                  receiver

iperf Done.

Maybe I forgot something ?

Hi @fesnault, I am just curious if you ever got this resolved? When using 3 MS-01s I got the same thing as above. One node would hit 20-22 Gbit/s and the other 2 would only hit 15-16. I never found out why it was like that or spent more time fixing it, but would be interested to know if anyone has found a solution.

For me, I kinda just got 3x 100g Mellanox Connect 2 port cards and called it a day. However, the card is PCIe Gen3 x16 and the MS-01 is PCIe Gen4 x8. I did lose 20 Gbps and am only getting 80 instead of 100. So if anyone's willing to spend the money, it works, but at that point it would probably be wiser to get the 50 Gb cards or something.

@uvalleza


Will share the config once I set this up with a friend. He just happened to buy my last UM790 Pro to complete his 3 node UM790 Pro cluster and I will be setting this up for him. I don't really expect many issues on a 3 node cluster, but could kinda see why you might have issues on a 5 node cluster.

No expert on this but here's my two cents. You have 5 nodes.

  1. How would the nodes communicate with each other? In a 3 node cluster, you have 2 ports on each node, allowing each one to connect to every other one directly. In a 5 node cluster, you still only have 2 ports, unless you're getting 2 more ports. I don't see the above really working unless you're able to jump from one box to another.

Example:
Node 1 is connected to Node 2 and 3.
Node 2 is connected to 4 and 5.

how do you tell Node 1 to go through Node 2 or 3 to get to Node 4/5?

seems like it'll be a routing config.

Anyhow, I'm not sure what exactly your issue is from the above, but since I was prepping to use the guide with a friend, I figured I'd chime in in the comments :). Will report the results and will share the configs if you still need them.
