
@scyto
Last active February 9, 2026 03:03
Thunderbolt Networking Setup

Thunderbolt Networking

this gist is part of this series

you will need proxmox kernel 6.2.16-14-pve or higher.

Load Kernel Modules

  • add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes i know it can sometimes work without them, but the thunderbolt-net one has interesting behaviour so do as i say - add both ;-)
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line
    2. save using Ctrl-X, then y, then enter
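For reference, after the edit the bottom of /etc/modules would look like this (a sketch - keep whatever lines are already in the file):

```
# /etc/modules - kernel modules to load at boot, one per line
thunderbolt
thunderbolt-net
```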

Prepare /etc/network/interfaces

doing this means we don't have to give each thunderbolt interface a manual IPv6 address and that these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces

If you see any sections called thunderbolt0 or thunderbolt1 delete them at this point.

Create entries to prepopulate the GUI with a reminder

Doing this means we don't have to give each thunderbolt interface a manual IPv6 or IPv4 address and that these addresses stay constant no matter what.

Add the following to each node using nano /etc/network/interfaces - this reminds you not to edit en05 and en06 in the GUI

This fragment should go between the existing auto lo section and the adapter sections.

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

If you see any thunderbolt sections delete them from the file before you save it.

DO NOT DELETE the source /etc/network/interfaces.d/* line - it will always exist on the latest versions and should be the last or next-to-last line in the /etc/network/interfaces file
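Put together, a minimal /etc/network/interfaces on a node would then look something like this (a sketch - the physical NIC and bridge sections are placeholders for whatever you already have):

```
auto lo
iface lo inet loopback

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

# ...your existing physical NIC and vmbr0 bridge sections stay here...

source /etc/network/interfaces.d/*
```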

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; i recommend this one as it is a great way to understand what udev does - the command proved more useful to me than syslog or lspci for troubleshooting thunderbolt issues and behaviours. In my case my two pci paths are 0000:00:0d.2 and 0000:00:0d.3 - if you bought the same hardware this will be the same on all 3 units. Don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or on cable insertion with whatever settings are in /etc/network/interfaces - this shouldn't need to be done; it seems like a bug in the way thunderbolt networking is handled (i assume this is debian-wide but haven't checked).

Huge thanks to @corvy for figuring out a script that should make this much, much more reliable for most.

  1. create a udev rule to detect for cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. save the file

  3. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

save the file and then

  4. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en06"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

and save the file

  5. make both scripts executable with chmod +x /usr/local/bin/*.sh
  6. run update-initramfs -u -k all to propagate the new link files into the initramfs
  7. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot)
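After the reboot, a quick way to sanity-check that the renames took is the sketch below (en05/en06 per this guide; it prints present or missing for each, guarded so it reports cleanly instead of erroring):

```shell
#!/bin/sh
# Check that the systemd .link renames produced the expected interfaces
for ifname in en05 en06; do
    if ip link show "$ifname" >/dev/null 2>&1; then
        echo "$ifname present"
    else
        echo "$ifname missing"
    fi
done
```

On a correctly configured node both lines should say present; a missing usually means the link file's Path= or Driver= match didn't apply.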

Enabling IP Connectivity

proceed to the next gist

Slow Thunderbolt Performance? Too Many Retries? No traffic? Try this!

verify neighbors can see each other (connectivity troubleshooting)

Install LLDP - this is a great way to see which nodes can see each other.

  • install lldpd (which provides lldpctl) with apt install lldpd on all 3 nodes
  • execute lldpctl and you should see neighbor info

make sure iommu is enabled (speed troubleshooting)

if you are having speed issues make sure the following is set on the kernel command line in the /etc/default/grub file: intel_iommu=on iommu=pt - once set, be sure to run update-grub and reboot

everyone's grub command line is different - this is mine because i also have i915 virtualization; if you get this wrong you can break your machine. If you are not doing that, you don't need any i915 entries - just the two iommu ones below

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" (note: if you have more things in your command line, DO NOT REMOVE them - just add the two intel ones; it doesn't matter where)

Pinning the Thunderbolt Driver (speed and retries troubleshooting)

identify your P and E cores by running the following

cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus

you should get two lines on an intel system with P and E cores. The first line should be your P cores, the second line your E cores.

for example on mine:

root@pve1:/etc/pve# cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus
0-7
8-15

create a script to apply affinity settings every time a thunderbolt interface comes up

  1. make a file at /etc/network/if-up.d/thunderbolt-affinity
  2. add the following to it - make sure to replace echo X-Y with whatever the above reported as your performance cores - e.g. echo 0-7
#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
# Set Thunderbolt affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo X-Y | tee "/proc/irq/{}/smp_affinity_list"'
fi
  3. save the file and make it executable with chmod +x /etc/network/if-up.d/thunderbolt-affinity - done (if the file isn't executable the script will never run)
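To see what the pipeline in that script actually does, you can dry-run the same grep/cut/xargs chain against a mocked-up /proc layout (all paths and IRQ numbers below are made up for the demo, so nothing real is touched):

```shell
#!/bin/sh
# Mock two thunderbolt IRQ lines plus an unrelated one
mkdir -p /tmp/mockirq/128 /tmp/mockirq/129
cat > /tmp/mock-interrupts <<'EOF'
 128:   0   IR-PCI-MSI   thunderbolt
 129:   0   IR-PCI-MSI   thunderbolt
  14:   0   IR-IO-APIC   acpi
EOF

# Same chain as the affinity script, pointed at the mock files:
# extract the IRQ numbers for thunderbolt lines, then write the core range
grep thunderbolt /tmp/mock-interrupts | cut -d ":" -f1 \
  | xargs -I {} sh -c 'echo 0-7 | tee "/tmp/mockirq/{}/smp_affinity_list"' >/dev/null

cat /tmp/mockirq/128/smp_affinity_list   # -> 0-7
```

The real script does exactly this against /proc/interrupts and /proc/irq, pinning every thunderbolt IRQ to the core range you echo.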

Extra Debugging for Thunderbolt

dynamic kernel tracing - adds more info to dmesg, doesn't overwhelm dmesg

I have only tried this on 6.8 kernels, so YMMV. If you want more TB messages in dmesg to see why a connection might be failing, here is how to turn on dynamic tracing.

For boot time you will need to add it to the kernel command line by adding thunderbolt.dyndbg=+p to your /etc/default/grub file, running update-grub and rebooting.

To expand the example above:

`GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt thunderbolt.dyndbg=+p"`  

Don't forget to run update-grub after saving the change to the grub file.

For runtime debug you can run the following command (it will revert on next boot), so this can't be used to capture what happens at boot time.

`echo -n 'module thunderbolt =p' > /sys/kernel/debug/dynamic_debug/control`
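To confirm the flag actually took, you can count how many thunderbolt debug sites are currently enabled (a guarded sketch - debugfs needs root and may not be mounted, and the grep pattern is a loose match on the control-file format):

```shell
#!/bin/sh
# Enabled dynamic-debug sites show an "=p" flag in the control file
CTRL=/sys/kernel/debug/dynamic_debug/control
if [ -r "$CTRL" ]; then
    grep -c 'thunderbolt.*=p' "$CTRL" || true
else
    echo "debugfs not available"
fi
```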

install tbtools

these tools can be used to inspect your thunderbolt system. Note they rely on rust being installed; you must use the rustup script below and not install rust via your package manager at this time (9/15/24)

apt install pkg-config libudev-dev git curl
curl https://sh.rustup.rs -sSf | sh
git clone https://github.com/intel/tbtools
restart your ssh session
cd tbtools
cargo install --path .
@mattyjew

Managed to fix the random no affinity file directory and running lldpctl now on each node shows all neighbor nodes correctly. I needed to add auto-hotplug en05 and auto-hotplug en06 to my interfaces file on all three nodes. Now all three are consistently coming up, just need to get them all stable at 26G and minimum retries. It looks like it works on some nodes some of the time, but not all 3. I'm running NUC12 Pros (1 intel 2 asus versions).

@mattyjew

And got the P core script to work. Needed to run chmod +x /etc/network/if-up.d/thunderbolt-affinity after setting up the affinity script. Once done, when I run cat /proc/irq/129/smp_affinity I get 00ff for the 0-7 cores instead of ffff indicating all cores. Thanks Gemma27b local AI!

#noob-to-linux

@michaeleberhardt

Hey Folks,
I followed the guide (thanks a lot!!) completely and my 3-node MS-01 Cluster ran fine for weeks..
Today I discovered that thunderbolt networking got incredibly slow. Just between 2-10 Mbps, no matter between which nodes..
Affinity etc is all fine.. did anybody face that problem before? I am on the latest proxmox kernel..

thanks a lot & best regards,
Michael

@Allistah

@michaeleberhardt - Roll back to kernel 6.8.12-1-pve and see if the issue goes away. I've had problems with later kernels so I've stuck with this one. I just checked and my cluster has been up for 125 days and is still rockin' 26Gb/s to all nodes.

@michaeleberhardt

@Allistah - Thanks, I rolled back to 6.8.12-1-pve, unfortunately no change:

root@node1:~# uname -r
6.8.12-1-pve

root@node1:~# iperf3 -c 172.16.0.2
Connecting to host 172.16.0.2, port 5201
[  5] local 172.16.0.1 port 56376 connected to 172.16.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   608 KBytes  4.98 Mbits/sec   44   2.83 KBytes       
[  5]   1.00-2.00   sec   956 KBytes  7.83 Mbits/sec   44   2.83 KBytes       
[  5]   2.00-3.00   sec   157 KBytes  1.29 Mbits/sec   24   2.83 KBytes       
[  5]   3.00-4.00   sec   472 KBytes  3.87 Mbits/sec   40   2.83 KBytes       
[  5]   4.00-5.00   sec   160 KBytes  1.31 Mbits/sec   28   2.83 KBytes       
[  5]   5.00-6.00   sec   481 KBytes  3.94 Mbits/sec   28   2.83 KBytes       
[  5]   6.00-7.00   sec   478 KBytes  3.92 Mbits/sec   34   2.83 KBytes       
[  5]   7.00-8.00   sec   479 KBytes  3.93 Mbits/sec   34   2.83 KBytes       
[  5]   8.00-9.00   sec  1.32 MBytes  11.1 Mbits/sec   39   7.07 KBytes       
[  5]   9.00-10.00  sec   474 KBytes  3.88 Mbits/sec   38   2.83 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.49 MBytes  4.60 Mbits/sec  353             sender
[  5]   0.00-10.00  sec  5.37 MBytes  4.50 Mbits/sec                  receiver

iperf Done.
root@node1:~# 

Any help is very appreciated :-)
Best regards!
Michael

@michaeleberhardt

Okay, I found a solution..
Don't ask me why, but from the start it worked without setting an MTU explicitly.
Now I set MTU to 65520 and it works at about 24-26Gbps..
So if anybody faces a similar issue, check MTU.
btw: it works on Kernel 6.8.12-12-pve.

Best regards!
Michael

@persil

persil commented Jul 26, 2025

Hello friends.

Has anyone tried to set it up on a PC with an AMD Ryzen AI 9 HX 370? I am now putting together my proxmox homelab based on a PC with this processor (Acemagic F3A), but when I connect the two units together through a 40Gb-enabled usb-c cable, there is no reaction in the output of udevadm monitor. The cable seems to be ok and the port (one of the two marked as usb4, which matches the processor's listed capabilities → https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-9-hx-370.html ) is working - when I connect my phone, udevadm registers it - so I was able to grab its device ID.

Best regards,
Marcin

@ssavkar

ssavkar commented Jul 26, 2025

Hey Folks, I followed the guide (thanks a lo!!) completely and my 3-node MS-01 Cluster ran fine for weeks.. Today I discovered, that thunderbold networking got incredibly slow. Just between 2-10mbps, no matter between which nodes.. Affinity etc is all fine.. did anybody face that problem before? I am on the latest proxmox kernel..

thanks a lot & best regards, Michael

Curious what MS-01 are you using and how much memory do you have installed in each? I have a 12600h cluster (3 nodes) that I set up and originally thought it was also working fine and fast but has slowed down significantly. I am also trying to debug (haven't toyed with MTU yet) but am using 32GB of RAM in each of the three machines and wonder if memory could be an issue? I had seen that referenced one or two places.

I actually have a separate cluster running in another location with one 13900h and two 12900h machines, and that one works perfectly. Those three machines have 96GB of memory though, so I am just starting to really wonder.

Using same thunderbolt cables though I thought maybe to swap out to higher quality ones like OWC.

@Randymartin1991

I also have the 12900H ms-01, with 64 gb, and the speed of one node, i noticed went down yesterday, after a reboot the speed was up again. Still trying to find where the issue is.

Hey Folks, I followed the guide (thanks a lo!!) completely and my 3-node MS-01 Cluster ran fine for weeks.. Today I discovered, that thunderbold networking got incredibly slow. Just between 2-10mbps, no matter between which nodes.. Affinity etc is all fine.. did anybody face that problem before? I am on the latest proxmox kernel..
thanks a lot & best regards, Michael

Curious what MS-01 are you using and how much memory do you have installed in each? I have a 12600h cluster (3 nodes) that I set up and originally thought it was also working fine and fast but has slowed down significantly. I am also trying to debug (haven't toyed with MTU yet) but am using 32GB of RAM in each of the three machines and wonder if memory could be an issue? I had seen that referenced one or two places.

I actually have a separate cluster running in another location with one 13900h and two 12900h machines, and that one works perfectly. Those three machines have 96GB of memory though, so I am just starting to really wonder.

Using same thunderbolt cables though I thought maybe to swap out to higher quality ones like OWC.

@jasonmako

6.8.12-13-pve completely broke my ceph cluster today. Has anyone applied this update and run into an issue?

@Allistah

Allistah commented Jul 27, 2025 via email

@jasonmako

I foolishly updated all three nodes instead of just doing one at a time and checking ceph. It appears something happened to Thunderbolt and en05/06 are no longer coming up. The three node NUC 12 cluster has been up and running for well over a year without issues. The only time I had to jump in and take action was when the frr package was updated to 10.2.

Updates applied:
proxmox-kernel-6.8.12-13
proxmox-kernel-6.8.12-13-pve-signed
proxmox-kernel-helper 8.1.4
pve-container 5.3.0

In DMESG I see a number of items related to Thunderbolt.

[    9.262321] thunderbolt 1-0:1.1: retimer disconnected
[    9.460377] ucsi_acpi USBC000:00: error -ETIMEDOUT: PPM init failed
[   10.453784] thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
[   15.793042] thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
[   15.793047] thunderbolt 1-1: Intel Corp. pve01
[   15.797574] thunderbolt-net 1-1.0 en06: renamed from thunderbolt

and

[   10.534335] ucsi_acpi USBC000:00: failed to reset PPM!
[   10.534340] ucsi_acpi USBC000:00: error -ETIMEDOUT: PPM init failed

I have a feeling it's the proxmox-kernel-6.8.12-13 backport fix for passthrough of devices without proper PCI power
management doing something with USB-C/Thunderbolt connections.

@Randymartin1991

Hey Folks, I followed the guide (thanks a lo!!) completely and my 3-node MS-01 Cluster ran fine for weeks.. Today I discovered, that thunderbold networking got incredibly slow. Just between 2-10mbps, no matter between which nodes.. Affinity etc is all fine.. did anybody face that problem before? I am on the latest proxmox kernel..
thanks a lot & best regards, Michael

Curious what MS-01 are you using and how much memory do you have installed in each? I have a 12600h cluster (3 nodes) that I set up and originally thought it was also working fine and fast but has slowed down significantly. I am also trying to debug (haven't toyed with MTU yet) but am using 32GB of RAM in each of the three machines and wonder if memory could be an issue? I had seen that referenced one or two places.

I actually have a separate cluster running in another location with one 13900h and two 12900h machines, and that one works perfectly. Those three machines have 96GB of memory though, so I am just starting to really wonder.

Using same thunderbolt cables though I thought maybe to swap out to higher quality ones like OWC.

I changed one thunderbolt cable today to a better one; unfortunately no speed improvement.

@michaeleberhardt

Hey Folks, I followed the guide (thanks a lo!!) completely and my 3-node MS-01 Cluster ran fine for weeks.. Today I discovered, that thunderbold networking got incredibly slow. Just between 2-10mbps, no matter between which nodes.. Affinity etc is all fine.. did anybody face that problem before? I am on the latest proxmox kernel..
thanks a lot & best regards, Michael

Curious what MS-01 are you using and how much memory do you have installed in each? I have a 12600h cluster (3 nodes) that I set up and originally thought it was also working fine and fast but has slowed down significantly. I am also trying to debug (haven't toyed with MTU yet) but am using 32GB of RAM in each of the three machines and wonder if memory could be an issue? I had seen that referenced one or two places.

I actually have a separate cluster running in another location with one 13900h and two 12900h machines, and that one works perfectly. Those three machines have 96GB of memory though, so I am just starting to really wonder.

Using same thunderbolt cables though I thought maybe to swap out to higher quality ones like OWC.

Hi, I am using i9-13900H with 32GB RAM. Set the MTU, it will probably solve your issues.

@ssavkar

ssavkar commented Jul 28, 2025

Hi, I am using i9-13900H with 32GB RAM. Set the MTU, it will probably solve your issues.

Actually I just debugged it and i can't believe how dumb the answer was. Nothing to do with cables or setting the MTU (I had it set to 65520). As I had noted, I already have a 3-node cluster running perfectly at my main house, and when I finally carefully compared all the files I set up to the one that was experiencing slower throughput, the issue was i forgot to make the thunderbolt-affinity file executable.

That was literally it! I have rebooted all three nodes after making that fix and everything is perfect, including going down to very few if any retries.

In fact @scyto, you may want to just tweak the section around the thunderbolt-affinity fix to make it clear people have to change the file to executable. I already should have realized it, but probably not bad to make this explicit in the instructions.

@theeshadow

Hi all,

I'm troubleshooting a strange IPv6 issue using Thunderbolt networking in a Proxmox cluster of Minisforum MS-01 machines. Hoping someone here can help shed light.

Hardware
proxmox-01: MS-01 (i9-13900)

proxmox-02: MS-01 (i9-13900)

proxmox-03: MS-01 (i9-12900)

All nodes are directly connected using Thunderbolt peer-to-peer (no switch). Each node has 2 Thunderbolt NICs: en05 and en06.

Network Setup
IPv6 static addressing in fc00::/64 ULA space

fc00::81 = proxmox-01

fc00::82 = proxmox-02

fc00::83 = proxmox-03

Interfaces brought up manually, addresses statically assigned

No IPv4 on these interfaces

Issue Summary
proxmox-01 <--> proxmox-02 work perfectly over Thunderbolt

proxmox-03 only works if one Thunderbolt interface (en05) is physically unplugged

With both TB ports connected on proxmox-03, no ping6 to/from it succeeds

Link-local (fe80::) pings often work between nodes, but not always consistently

Neighbor entries show up, but global pings (fc00::) silently fail

No ip6tables/nftables firewall present, forwarding and accept_ra are enabled

Traffic shows up in tcpdump but never receives a reply

Debugging Done
Verified L2 visibility using lldpctl: each TB interface sees its peer, MACs match

Tried using ndisc6, iperf3, manual ip -6 neigh add, static routes

Reordered cables and tested each interface independently

Confirmed MTU, interface state, kernel modules, sysctl settings

Removing static fc00::83 and falling back to link-local often restores partial connectivity

Observation
It seems proxmox-03 can't route/respond properly when both TB interfaces are active — but works once one is removed. It's almost like dual NIC paths are interfering with neighbor discovery or routing.

Could this be an oddity in the way the Thunderbolt NICs behave on the 12900 model? Or something Linux (Bookworm) needs configured differently for multi-homed IPv6 on these interfaces?

Any ideas or known quirks here?

Thanks in advance.

@theeshadow

interestingly enough... i was able to get it working... what i did? i have no clue... That being said, I get health error issues in proxmox related to using ipv6. Is everyone else having those same issues? I saw that it was an open issue but i haven't seen anyone in here speak on it...

@Allistah

Allistah commented Jul 30, 2025 via email

@ilbarone87

ilbarone87 commented Aug 5, 2025

Has anyone tried the beta for proxmox 9.0? Is it safe to update? Will the new kernel 6.14.8-2 have the same issue as 6.8.12?
9.0 stable has been released today, though. Especially with support for fabrics for SDN stacks, will that clash with our configuration if I do a major upgrade?

@taslabs-net

Has anyone tried the beta for proxmox 9.0. Is it safe to update? Will the new kernel 6.14.8-2 have the same issue as 6.8.12? 9.0 stable has been released today, though. Especially with support for fabrics for SDN stacks, will that clash with our configuration if I do major upgrade?

yeah..I did

https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

@ilbarone87

Has anyone tried the beta for proxmox 9.0. Is it safe to update? Will the new kernel 6.14.8-2 have the same issue as 6.8.12? 9.0 stable has been released today, though. Especially with support for fabrics for SDN stacks, will that clash with our configuration if I do major upgrade?

yeah..I did

https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

Thank you!!!

@theeshadow

theeshadow commented Aug 6, 2025

Thanks for posting that!

@theeshadow

theeshadow commented Aug 6, 2025

The affinity script doesn't seem to work for me... i keep getting fffff when checking smp_affinity...

My file is as follows:

#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
# Set Thunderbot affinity to Pcores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-11 | tee "/proc/irq/{}/smp_affinity_list"'
fi

I am running on 3 MS-01s and the file is executable...

thoughts?

@ssavkar

ssavkar commented Aug 6, 2025

Has anyone tried the beta for proxmox 9.0. Is it safe to update? Will the new kernel 6.14.8-2 have the same issue as 6.8.12? 9.0 stable has been released today, though. Especially with support for fabrics for SDN stacks, will that clash with our configuration if I do major upgrade?

yeah..I did
https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

Thank you!!!

I already have two separate clusters running "perfectly" on MS-01s with thunderbolt networking at 26Gb/s. I want to upgrade to Proxmox 9 but was hoping that things would transfer over cleanly without my having to redo everything from scratch. Curious if anyone else has attempted this and what, if any, problems they encountered.

I don't really want to recreate the whole setup from scratch if I can avoid it!

@archiebug

It appears that 9.0 was released today. Might be wise to wait a bit before upgrading, so any missed bugs get fixed.

@Randymartin1991

Yes, I can confirm, the update breaks the proxmox node running this config. Not sure what exactly happened, because I also pass through my GPU so i could not do much, since I had no screen. But the node did not come back online after the update. Did a reinstall of the node, and everything is working HA again 👍

@Rgamer84

Rgamer84 commented Aug 7, 2025

I can confirm as well... do NOT upgrade from 8->9 if you are running this configuration. Your proxmox node will break once you reboot. I'm in the process of trying to sort out what part went sideways. As far as I can tell, it gets stuck in a bringing up network interfaces state and I haven't yet sorted how to get past that. I'll likely start ripping out bits and pieces to try to see what the offending culprit is and cross compare what I have vs the taslabs-net link that was posted above as that appears to be working for others.

@Allistah

Allistah commented Aug 7, 2025 via email

@jacoburgin

Can also confirm upgrading breaks it. I have however done a fresh install of PVE9 on my intel NUCs using a TB ring network. I have followed Scyto's guide through until you create your vtysh config.

What you do from there is create an SDN with the routing information and hey presto...

In my case however migration works manually, but I get a timeout when a service wants to migrate back to a restarted node, which I HAD fixed previously.

@Rgamer84

Rgamer84 commented Aug 7, 2025

Well, I got the borked node back up. It took quite a few hours and it's not a permanent fix, but it's late so I don't want to mess with it any longer than I already have tonight. This is for anyone that doesn't want to rebuild from scratch but wants to get the node to a half-operational state. Sorry if my formatting is crap; I wanted to just get this out there for now.

If you get stuck at a /dev/mapper/pve-root: clean xxxxxxx screen, it's because networking is failing to start and the service itself is also set to never timeout. These are the steps that I took to get it working again.

Boot to Advanced Options for Proxmox VE GNU/Linux
Proxmox VE GNU/Linux, with Linux 6.14.8-2-pve (recovery mode)

nano /etc/systemd/system/network-online.target.wants/networking.service/systemd-networkd-wait-online.service

  • Add "TimeoutStartSec=30s" under [Service]

nano /etc/systemd/system/network-online.target.wants/networking.service

  • Add "TimeoutStartSec=30s" under [Service]

Comment out both en05 and en06 interfaces
nano /etc/network/interfaces

#auto en05
#iface en05 inet manual
#Do not edit in GUI

#auto en06
#iface en06 inet manual
#Do not edit in GUI

nano /etc/network/interfaces.d/thunderbolt

auto lo:0
iface lo:0 inet static
        address 10.0.0.83/32

auto lo:6
iface lo:6 inet static
        address fc00::83/128

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

I also noticed that /etc/sysctl.conf was missing completely. I did NOT re-add it yet as things are working for now and I'm not sure why it was nuked to begin with.

The systemd-networkd-wait-online.service and networking.service will still time out, but you should get basic network connectivity to the node as well as get ceph operational again. YMMV on this one, but hopefully it sparks a few ideas as to what went wrong. I can say that I did notice some redundancy between the interfaces file and the thunderbolt file, which I suspect is leading to some of the problem.
