

@scyto
Last active October 13, 2025 15:26
Thunderbolt Networking Setup

Thunderbolt Networking

This gist is part of this series.

You will need Proxmox kernel 6.2.16-14-pve or higher.

Load Kernel Modules

  • add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes, I know it can sometimes work without them, but thunderbolt-net has interesting behaviour, so do as I say and add both ;-)
    1. run nano /etc/modules and add the modules (thunderbolt and thunderbolt-net) at the bottom of the file, one per line
    2. save using Ctrl+X, then Y, then Enter
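If you prefer to script this step (for example, to run the same thing on all 3 nodes), here is a minimal sketch that appends the two module names only when they are not already listed. The helper name ensure_modules is hypothetical and not part of Proxmox; it assumes the standard one-module-per-line /etc/modules format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append kernel module names to a modules file
# (normally /etc/modules), one per line, skipping names already present.
ensure_modules() {
    local file="$1"; shift
    local mod
    touch "$file"
    # If the file does not end with a newline, add one so the first
    # module is not glued onto the existing last line.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        echo >> "$file"
    fi
    for mod in "$@"; do
        # -x: whole-line match, -F: literal string (no regex)
        if ! grep -qxF "$mod" "$file"; then
            echo "$mod" >> "$file"
        fi
    done
}
```

Usage (as root): source the snippet, then run ensure_modules /etc/modules thunderbolt thunderbolt-net. Running it a second time changes nothing.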

Prepare /etc/network/interfaces

Doing this means we don't have to give each Thunderbolt interface a manual IPv6 address, and these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces.

If you see any sections called thunderbolt0 or thunderbolt1, delete them at this point.

Create entries to prepopulate GUI with reminder

Doing this means we don't have to give each Thunderbolt interface a manual IPv6 or IPv4 address, and these addresses stay constant no matter what.

Add the following to /etc/network/interfaces on each node (using nano /etc/network/interfaces); the comments remind you not to edit en05 and en06 in the GUI.

This fragment should go between the existing auto lo section and the adapter sections.

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

If you see any thunderbolt sections, delete them from the file before you save it.

DO NOT DELETE the source /etc/network/interfaces.d/* line. It will always exist on the latest versions and should be the last or next-to-last line in the /etc/network/interfaces file.
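To double-check the result on each node, here is a small sketch that looks for the two manual en05/en06 stanzas, any leftover thunderbolt0/thunderbolt1 sections, and the source /etc/network/interfaces.d/* line. check_tb_interfaces is a hypothetical helper, and the grep patterns are assumptions about how your file is laid out, so treat a warning as a prompt to look rather than proof of a problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an interfaces file for the Thunderbolt setup.
# Prints one line per problem found; returns 0 if nothing looked wrong.
check_tb_interfaces() {
    local file="${1:-/etc/network/interfaces}"
    local rc=0 ifc
    # The en05/en06 reminder stanzas should be present.
    for ifc in en05 en06; do
        if ! grep -qE "^iface[[:space:]]+${ifc}[[:space:]]+inet[[:space:]]+manual" "$file"; then
            echo "missing: iface $ifc inet manual"
            rc=1
        fi
    done
    # Old thunderbolt0/thunderbolt1 sections should have been deleted.
    if grep -qE '^[[:space:]]*(auto|allow-hotplug|iface)[[:space:]]+thunderbolt[0-9]' "$file"; then
        echo "leftover thunderbolt section found"
        rc=1
    fi
    # The source line for interfaces.d must stay.
    if ! grep -qE '^source[[:space:]]+/etc/network/interfaces\.d/\*' "$file"; then
        echo "missing: source /etc/network/interfaces.d/*"
        rc=1
    fi
    return "$rc"
}
```

Usage: check_tb_interfaces /etc/network/interfaces prints nothing and returns 0 when the file looks as described above.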

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence in which you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does. The command proved more useful to me than syslog or lspci for troubleshooting Thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3; if you bought the same hardware, these will be the same on all 3 units. Don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05

  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:

[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
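Because the two .link files differ only in the PCI path and the name, you could also write them with a small script. This is a sketch: write_tb_links, TB_PATH_EN05 and TB_PATH_EN06 are hypothetical names, and the defaults are my paths from above, which will likely differ on your hardware. systemd matches Path= against the device's ID_PATH udev property, so udevadm info /sys/class/net/thunderbolt0 (while the interface still has its original name) should show the value to use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one systemd .link file that renames the
# thunderbolt-net interface at a given PCI path to a fixed name.
write_tb_link() {
    local dir="$1" file="$2" path="$3" name="$4"
    mkdir -p "$dir"
    cat > "$dir/$file" <<EOF
[Match]
Path=$path
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=$name
EOF
}

# Write both files; override the PCI paths via TB_PATH_EN05 / TB_PATH_EN06.
write_tb_links() {
    local dir="${1:-/etc/systemd/network}"
    write_tb_link "$dir" 00-thunderbolt0.link "${TB_PATH_EN05:-pci-0000:00:0d.2}" en05
    write_tb_link "$dir" 00-thunderbolt1.link "${TB_PATH_EN06:-pci-0000:00:0d.3}" en06
}
```

Usage: TB_PATH_EN05=pci-... TB_PATH_EN06=pci-... write_tb_links, then run update-initramfs -u -k all and reboot as in the steps further down.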

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or on cable insertion with whatever settings are in /etc/network/interfaces. This shouldn't need to be done; it seems like a bug in the way Thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

Huge thanks to @corvy for figuring out a script that should make this much, much more reliable for most people.

  1. create a udev rule to detect cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. save the file

  3. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

save the file and then

  4. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en06"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

and save the file

  5. make both scripts executable with chmod +x /usr/local/bin/*.sh
  6. run update-initramfs -u -k all to propagate the new link files into initramfs
  7. reboot (restarting networking, init 1 and init 3 are not good enough, so reboot)
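After the reboot, the two scripts above log every attempt to /tmp/udev-debug.log. A quick way to see whether each interface was brought up is to look for the last success line per interface. tb_log_status is a hypothetical helper, and it relies on the exact message text that pve-en05.sh and pve-en06.sh write.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for en05 and en06, print the last
# "Successfully brought up <IF> on attempt <n>" line from the udev debug log.
# Returns 1 if the log is unreadable or either interface has no success logged.
tb_log_status() {
    local log="${1:-/tmp/udev-debug.log}" ifc last rc=0
    [ -r "$log" ] || { echo "cannot read $log"; return 1; }
    for ifc in en05 en06; do
        # index() does a literal substring match; keep the last match only.
        last=$(awk -v pat="Successfully brought up $ifc on attempt" \
            'index($0, pat) { l = $0 } END { print l }' "$log")
        if [ -n "$last" ]; then
            echo "$ifc: OK -> $last"
        else
            echo "$ifc: no successful ifup logged"
            rc=1
        fi
    done
    return "$rc"
}
```

If an interface shows no success, the Attempt lines in the same log (and ifup's own output, which the scripts append there) are the next place to look.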

Enabling IP Connectivity

proceed to the next gist

Slow Thunderbolt Performance? Too Many Retries? No traffic? Try this!

verify neighbors can see each other (connectivity troubleshooting)

Install LLDP - this is great for seeing which nodes can see each other.

  • install lldpctl with apt install lldpd on all 3 nodes
  • run lldpctl; you should see info about the neighbours each node can see
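For a compact per-interface summary, lldpctl can also print key=value output with lldpctl -f keyvalue. The sketch below assumes keys of the form lldp.<interface>.chassis.name=<hostname>; lldp_neighbors is a hypothetical helper, so compare its output with plain lldpctl the first time you use it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lldpctl -f keyvalue` output on stdin and print
# "<local interface> -> <neighbour hostname>" for each neighbour seen.
# Assumes keys of the form lldp.<interface>.chassis.name=<hostname>.
lldp_neighbors() {
    awk -F= '$1 ~ /^lldp\.[^.]+\.chassis\.name$/ {
        split($1, key, ".")
        print key[2] " -> " $2
    }'
}
```

Usage: lldpctl -f keyvalue | lldp_neighbors. On a healthy 3-node ring you would expect en05 and en06 on each node to point at the two other nodes.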

make sure iommu is enabled (speed troubleshooting)

If you are having speed issues, make sure intel_iommu=on iommu=pt is set on the kernel command line in the /etc/default/grub file. Once set, be sure to run update-grub and reboot.

Everyone's grub command line is different. Mine also has entries for i915 virtualization; if you are not doing that you don't need them, and they are not shown in the example below. If you get this wrong you can break your machine.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" (note: if you have more things in your command line, DO NOT REMOVE them, just add the two iommu settings; it doesn't matter where.)
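If you'd rather not hand-edit the line, this sketch appends parameters to GRUB_CMDLINE_LINUX_DEFAULT while keeping everything already there, and skips parameters that are already present. add_grub_params is a hypothetical helper; it assumes the value is in double quotes as in the example above and that the parameters contain no |, & or backslash characters (which would confuse sed). Back up /etc/default/grub first and review the result before running update-grub.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT
# without removing anything already on the line. Parameters already present
# are skipped. Assumes a double-quoted value and parameters without | & or \.
add_grub_params() {
    local file="$1"; shift
    local cur new p
    if ! grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file"; then
        echo "no GRUB_CMDLINE_LINUX_DEFAULT line in $file" >&2
        return 1
    fi
    # Current value between the quotes (last definition wins, as in a shell file).
    cur=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file" | tail -n 1)
    new="$cur"
    for p in "$@"; do
        case " $new " in
            *" $p "*) ;;                # already present, leave it alone
            *) new="${new:+$new }$p" ;; # append with a single space separator
        esac
    done
    # Rewrite the line(s) in place; '|' is used as the sed delimiter.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\".*\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"$new\"|" "$file"
}
```

Usage: cp /etc/default/grub /etc/default/grub.bak, then add_grub_params /etc/default/grub intel_iommu=on iommu=pt, check the file, run update-grub and reboot.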

Pinning the Thunderbolt Driver (speed and retries troubleshooting)

Identify your P and E cores by running the following:

cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus

You should get two lines on an Intel system with P and E cores: the first line should be your P cores, the second line your E cores.

for example on mine:

root@pve1:/etc/pve# cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus
0-7
8-15

Create a script to apply affinity settings every time a Thunderbolt interface comes up:

  1. make a file at /etc/network/if-up.d/thunderbolt-affinity
  2. add the following to it - make sure to replace echo X-Y with whatever the report told you were your performance cores - e.g. echo 0-7
#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
    # Set Thunderbolt IRQ affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo X-Y | tee "/proc/irq/{}/smp_affinity_list"'
fi
  3. save the file and make it executable with chmod +x /etc/network/if-up.d/thunderbolt-affinity - done
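Here is a variant of the hook above that reads the P-core list from sysfs instead of hard-coding X-Y. This is a sketch, not the script I use: it assumes a hybrid Intel CPU that exposes /sys/devices/cpu_core/cpus, and the helper names tb_irqs, pcore_list and pin_tb_irqs are hypothetical. Like the original, it must be executable and only acts when IFACE is en05 or en06.

```shell
#!/bin/bash
# Hypothetical variant of /etc/network/if-up.d/thunderbolt-affinity.

# Print the IRQ numbers of interrupt lines that mention thunderbolt.
tb_irqs() {
    awk -F: '/thunderbolt/ { gsub(/[[:space:]]/, "", $1); print $1 }' "${1:-/proc/interrupts}"
}

# Print the P-core CPU list, e.g. "0-7" (fails on non-hybrid CPUs).
pcore_list() {
    cat "${1:-/sys/devices/cpu_core/cpus}"
}

# Pin every Thunderbolt IRQ to the P-cores.
pin_tb_irqs() {
    local cpus irq
    cpus=$(pcore_list) || return 1
    for irq in $(tb_irqs); do
        echo "$cpus" > "/proc/irq/$irq/smp_affinity_list"
    done
}

# ifupdown sets IFACE when running if-up.d hooks.
if [ "${IFACE:-}" = "en05" ] || [ "${IFACE:-}" = "en06" ]; then
    pin_tb_irqs
fi
```

If /sys/devices/cpu_core/cpus does not exist, pcore_list fails and pin_tb_irqs returns without touching any IRQ.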

Extra Debugging for Thunderbolt

dynamic kernel tracing - adds more info to dmesg, doesn't overwhelm dmesg

I have only tried this on 6.8 kernels, so YMMV. If you want more TB messages in dmesg to see why a connection might be failing, here is how to turn on dynamic tracing.

For boot time you will need to add it to the kernel command line by adding thunderbolt.dyndbg=+p to your /etc/default/grub file, running update-grub and rebooting.

To expand the example above:

`GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt thunderbolt.dyndbg=+p"`  

Don't forget to run update-grub after saving the change to the grub file.
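Once you have rebooted, you can confirm the running kernel actually received the parameters by checking /proc/cmdline. has_kernel_params and CMDLINE_FILE are hypothetical names used for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each given parameter appears as a
# whole word on the running kernel's command line. Returns 1 if any is missing.
has_kernel_params() {
    local cmdline p missing=0
    # Pad with spaces so every parameter is surrounded by spaces.
    cmdline=" $(cat "${CMDLINE_FILE:-/proc/cmdline}") "
    for p in "$@"; do
        case "$cmdline" in
            *" $p "*) echo "present: $p" ;;
            *) echo "missing: $p"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

Usage: has_kernel_params intel_iommu=on iommu=pt thunderbolt.dyndbg=+p. A "missing" line usually means update-grub was not run or the node has not been rebooted since.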

For runtime debug you can run the following command (it will revert on next boot, so this can't be used to capture what happens at boot time).

`echo -n 'module thunderbolt =p' > /sys/kernel/debug/dynamic_debug/control`
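To see whether the thunderbolt debug messages are currently switched on, you can count the thunderbolt entries in the dynamic debug control file whose flags include p. This sketch assumes the usual control file line layout (filename:lineno [module]function flags format) and that debugfs is mounted at /sys/kernel/debug; dyndbg_status is a hypothetical helper and needs root to read the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count thunderbolt dynamic debug call sites and how
# many of them currently have the "p" (print) flag set.
dyndbg_status() {
    local ctl="${1:-/sys/kernel/debug/dynamic_debug/control}"
    # Field 2 is "[module]function", field 3 is the flags, e.g. "=p" or "=_".
    awk '$2 ~ /^\[thunderbolt\]/ { total++; if ($3 ~ /p/) on++ }
         END { printf "%d of %d thunderbolt debug sites enabled\n", on, total }' "$ctl"
}
```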

install tbtools

These tools can be used to inspect your Thunderbolt system. Note they rely on Rust being installed; you must use the rustup script below and not install Rust via the package manager at this time (9/15/24).

apt install pkg-config libudev-dev git curl
curl https://sh.rustup.rs -sSf | sh
git clone https://github.com/intel/tbtools
restart your ssh session (so the PATH changes made by rustup take effect)
cd tbtools
cargo install --path .
@corvy

corvy commented Oct 5, 2025

I was specifically thinking about the udev-debug.log.

Without the en0x scripts the thunderbolt ring will not be enabled. If you have issues during boot, I would check that you are in fact using the latest version of the script and double-check permissions etc.

@archiebug

archiebug commented Oct 5, 2025

I'll have to move those scripts back in and reboot a node. As of now, everything is back up and running on pve9.

I think the issue is the frr.service restart in the lo and en0x scripts. ref: @scyto here

It looks like sergio added a 10 second timeout in the lo and en0x scripts here

Seems like the 2 scripts were created to solve issues with MS-01 minis. I'm using ASUS NUCs that are the same as or similar to the ones in the gist guide here.

@corvy

corvy commented Oct 5, 2025

I am also using Asus NUCs.

Have you remembered to comment out this?

/etc/systemd/system/frr.service.d/dependencies.conf

#[Unit]
#Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
#After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

@archiebug

I never had the conf file before. I don't think they were created when I followed the gist back in June?

@cswaas

cswaas commented Oct 9, 2025

With the help of all these great guides (thanks) I successfully migrated my Intel NUC 12 Pro 3-node Proxmox Ceph Cluster from pve 8.2.7 -> 8.4.14 -> 9.0.10. Everything went smoothly by following this post by @corvy.

Just the part for the following two files confused me first, as they were/are not present in my setup:

  • /etc/systemd/system/frr.service.d/dependencies.conf
  • /etc/network/interfaces.d/thunderbolt
    I disregarded the advice for these files accordingly. I will need to figure out whether I am missing something due to the absence of these files. Well, the system seems to work without them.

There is one small anomaly in the performance now. As my Intel NUCs are not the latest and greatest, I was previously happy with 15-17Gbits/sec in iperf3 performance on the TB links. Now after the upgrade, one of the TB links is up to the predicted 25-26Gbit/sec, but not the other two TB links (see screenshot). I would like to investigate this difference. Any advice where to start?


@Allistah

Allistah commented Oct 9, 2025

I'm currently running a three node cluster using NUC 13 Pros following this guide and it's been running great for some time. I get a full 26 Gb/s throughput to all nodes using iperf3. I'd like to upgrade to Proxmox 9 but I'm hesitant on doing so because I'm not really sure what I need to do to get there given that I already have everything working.

I ran the pve8to9 script and made a few adjustments to make that script happy. Outside of that, what else do I need to do to get to Proxmox 9 and ensure that my Thunderbolt network continues to work?

@cswaas

cswaas commented Oct 10, 2025

@Allistah I was in much the same situation, hesitant to change a running system (pve 8.2.7 in my case). However, at some point it will be too late to upgrade without complications. I observed the following things in my upgrade:

  • I upgraded first from pve 8.2.7 to 8.4.14 and ceph 18 to 19, before 8 to 9
  • read carefully the whole thread of messages above and try to understand if they apply to your setup
  • read carefully the Proxmox upgrade guide
  • read carefully the post by @corvy above
  • execute diligently all steps in the upgrade guide, don't rush

Also, I used ChatGPT to clarify questions during the upgrade process when I was confused.
I hope this helps in a high-level way.

@corvy

corvy commented Oct 10, 2025

I never had the conf file before. I don't think they were created when I followed the gist back in June?

These were important to get routing working when running dual IPv4 and IPv6 with frr. If you ran IPv6 only, they were not needed. Now they are not needed in any case, due to changes in the way network and use works on PVE9.

I ran the pve8to9 script and made a few adjustments to make that script happy. Outside of that, what else do I need to do to get to Proxmox 9 and ensure that my Thunderbolt network continues to work?

Just make sure the /etc/udev/rules.d/10-tb-en.rules is correct. In my experience this is critical.

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"

And again, make sure the referenced scripts in /usr/local/bin/pve-en0X.sh are set up correctly.

There is one small anomaly in the performance now. As my Intel NUCs are not the latest and greatest, I was previously happy with 15-17Gbits/sec in iperf3 performance on the TB links. Now after the upgrade, one of the TB links is up to the predicted 25-26Gbit/sec, but not the other two TB links (see screenshot). I would like to investigate this difference. Any advice where to start?

I would look at the use of efficiency and performance cores, and at the IOMMU settings in the grub kernel boot parameters. Some have also had success limiting the speed. Read here:
https://gist.github.com/scyto/67fdc9a517faefa68f730f82d7fa3570#slow-thunderbolt-performance-too-many-retries-no-traffic-try-this

Double check every node. And then check again.

@Allistah


I did the upgrade, but when it rebooted, it didn't come back online. I'll have to plug a monitor in when I get home and see what's going on with the boot process.

@Allistah

This is what I get on the console after the upgrade and it rebooted. Not sure what to do from here. Any idea how I can fix this?

@Allistah

Figured it out. Had to open the file /etc/network/interfaces.d/thunderbolt
...and comment out the line "post-up /usr/bin/systemctl restart frr.service"
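For anyone hitting the same thing, the edit described above can be done with sed. This is a sketch: the path and the post-up line are taken from the comment above, comment_out_frr_restart is a hypothetical helper, and sed -i.bak leaves a backup copy next to the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the exact line
# "post-up /usr/bin/systemctl restart frr.service", keeping its indentation.
# Already-commented lines no longer match, so re-running is harmless.
comment_out_frr_restart() {
    local file="${1:-/etc/network/interfaces.d/thunderbolt}"
    [ -f "$file" ] || { echo "$file not found (nothing to do)"; return 0; }
    sed -i.bak 's|^\([[:space:]]*\)post-up /usr/bin/systemctl restart frr\.service|\1#post-up /usr/bin/systemctl restart frr.service|' "$file"
}
```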

@cswaas

cswaas commented Oct 11, 2025

@Allistah I am glad you figured it out. That is exactly the part that did not apply to my setup, but was mentioned in the thread above and confused me first.

@Allistah

Allistah commented Oct 11, 2025

Now that I've upgraded to 9, I'm getting this crash on one of the nodes for some reason:
2 mgr modules have recently crashed
mgr module diskprediction_local crashed in daemon mgr.pve-node03 on host pve-node03 at 2025-10-11T04:57:32.116408Z

I didn't have anything like that with v8. I know this is out of scope for this gist, just mentioning since many of us have the same configs.

Also, the only thing I have not done yet is move to the new UI for the fabric stuff. It's still using the config from this gist. Getting to PVE9 was enough for now with the struggles I had. I'll move the fabric stuff to the new UI after a while when I have some more time.

Update: The crashes have not happened again since I upgraded, maybe it was just a fluke that happened during/right after the upgrade. Everything seems stable now.
