Thunderbolt Networking Setup

Thunderbolt Networking

This gist is part of this series.

You will need Proxmox kernel 6.2.16-14-pve or higher.

Load Kernel Modules

  • Add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes, I know it can sometimes work without them, but the thunderbolt-net one has interesting behaviour, so do as I say and add both ;-) - see the sketch after this list
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line
    2. save using Ctrl+X, then Y, then Enter
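A minimal sketch of what the bottom of /etc/modules should contain after this step - just the two module names described above:

# /etc/modules - extra kernel modules to load at boot time
thunderbolt
thunderbolt-net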

Prepare /etc/network/interfaces

Doing this means we don't have to give each Thunderbolt interface a manual IPv6 or IPv4 address, and these addresses stay constant no matter what.

If you see any sections called thunderbolt0 or thunderbolt1, delete them at this point.

Create entries to prepopulate the GUI with a reminder

Add the following to each node using nano /etc/network/interfaces; it reminds you not to edit en05 and en06 in the GUI.

This fragment should go between the existing auto lo section and the adapter sections.

iface en05 inet manual
#do not edit in GUI

iface en06 inet manual
#do not edit in GUI

If you see any thunderbolt sections, delete them from the file before you save it.

DO NOT DELETE the source /etc/network/interfaces.d/* line - it will always exist on the latest versions and should be the last or next-to-last line in the /etc/network/interfaces file.

Rename Thunderbolt Connections

This is needed because Proxmox doesn't recognize the default Thunderbolt interface names. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence in which you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. Use the udevadm monitor command to find your device IDs as you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does - the command proved more useful to me than syslog or lspci for troubleshooting Thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3; if you bought the same hardware this will be the same on all 3 units. Don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. Create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
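Once the link files are in place (and after the update-initramfs and reboot steps below), a quick way to confirm the rename took effect - a sketch, assuming systemd-udevd picked up the files above:

# both interfaces should now be listed under their new names
ip -br link show en05 en06
# show which .link file udev matched for an interface
udevadm test-builtin net_setup_link /sys/class/net/en05 2>&1 | grep -i '\.link'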

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or on cable insertion with whatever settings are in /etc/network/interfaces. This shouldn't need to be done; it seems like a bug in the way Thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

Huge thanks to @corvy for figuring out a script that should make this much, much more reliable for most people.

  1. Create a udev rule to detect cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. Save the file.

  3. Create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

Save the file, then:

  4. Create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en06"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done

Save the file.

  5. Make both scripts executable with chmod +x /usr/local/bin/*.sh
  6. Run update-initramfs -u -k all to propagate the new link files into the initramfs.
  7. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot). A quick sanity-check sketch follows below.
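After the reboot, a quick sanity check - a sketch; the interfaces only get addresses once you configure them in the next gist:

# both renamed interfaces should exist (and come UP once a cable is connected)
ip -br link show en05 en06
# the udev scripts above log here, which confirms they fired on cable insertion
cat /tmp/udev-debug.log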

Enabling IP Connectivity

Proceed to the next gist.

Slow Thunderbolt Performance? Too Many Retries? No traffic? Try this!

verify neighbors can see each other (connectivity troubleshooting)

Install LLDP - this is great for seeing which nodes can see which.

  • install lldpd with apt install lldpd on all 3 nodes
  • execute lldpctl and you should see neighbor info (see the sketch after this list)
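A minimal sketch of the LLDP check on one node (output formatting varies slightly between lldpd versions):

apt install lldpd
lldpctl    # each node should list its Thunderbolt neighbours on en05/en06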

make sure iommu is enabled (speed troubleshooting)

If you are having speed issues, make sure the following is set on the kernel command line in the /etc/default/grub file: intel_iommu=on iommu=pt. Once set, be sure to run update-grub and reboot.

Everyone's GRUB command line is different; this is mine because I also have i915 virtualization. If you get this wrong you can break your machine. If you are not doing that, you don't need any i915 entries.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

(Note: if you have more things in your command line, DO NOT REMOVE them - just add the two IOMMU options; it doesn't matter where.)
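Once the file is saved, apply and verify - a hedged sketch (the exact dmesg wording varies by kernel version):

update-grub
reboot
# after the reboot, confirm the IOMMU actually came up
dmesg | grep -i -e dmar -e iommu | head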

Pinning the Thunderbolt Driver (speed and retries troubleshooting)

Identify your P and E cores by running the following:

cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus

You should get two lines on an Intel system with P and E cores: the first line should be your P cores and the second line your E cores.

For example, on mine:

root@pve1:/etc/pve# cat /sys/devices/cpu_core/cpus && cat /sys/devices/cpu_atom/cpus
0-7
8-15

Create a script to apply affinity settings every time a Thunderbolt interface comes up.

  1. Make a file at /etc/network/if-up.d/thunderbolt-affinity
  2. Add the following to it - make sure to replace echo X-Y with whatever the output above showed as your performance cores, e.g. echo 0-7
#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
    # Set Thunderbolt affinity to P-cores
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo X-Y | tee "/proc/irq/{}/smp_affinity_list"'
fi
  3. Save the file - done. (A quick verification sketch follows below.)
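To confirm the pinning applied after an interface comes up, you can reuse the same grep as the script above - a sketch:

# list the thunderbolt IRQs and the cores they are currently pinned to
for irq in $(grep thunderbolt /proc/interrupts | cut -d ":" -f1); do
    echo -n "IRQ $irq: "; cat "/proc/irq/$irq/smp_affinity_list"
done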

Extra Debugging for Thunderbolt

dynamic kernel tracing - adds more info to dmesg, doesn't overwhelm dmesg

I have only tried this on 6.8 kernels, so YMMV. If you want more TB messages in dmesg to see why a connection might be failing, here is how to turn on dynamic tracing.

For boot time you will need to add it to the kernel command line by adding thunderbolt.dyndbg=+p to your /etc/default/grub file, running update-grub and rebooting.

To expand the example above:

`GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt thunderbolt.dyndbg=+p"`  

Don't forget to run update-grub after saving the change to the grub file.

For runtime debugging you can run the following command (it will revert on the next boot, so this can't be used to capture what happens at boot time).

`echo -n 'module thunderbolt =p' > /sys/kernel/debug/dynamic_debug/control`
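To check the runtime toggle took effect, you can inspect the dynamic debug control file and watch dmesg - a sketch, assuming debugfs is mounted (it normally is on Proxmox):

grep thunderbolt /sys/kernel/debug/dynamic_debug/control | head -n 5   # enabled sites show the p flag
dmesg -w | grep -i thunderbolt                                         # follow the extra messages live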

install tbtools

These tools can be used to inspect your Thunderbolt system. Note they rely on Rust being installed; you must use the rustup script below and not install Rust via the package manager at this time (9/15/24).

apt install pkg-config libudev-dev git curl
curl https://sh.rustup.rs -sSf | sh
git clone https://github.com/intel/tbtools
restart your ssh session
cd tbtools
cargo install --path .
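cargo install --path . puts the binaries in ~/.cargo/bin. As a quick smoke test you can list the Thunderbolt devices the kernel sees - I'm assuming the tblist tool here; check the tbtools README for the full set of commands:

# hypothetical example - substitute whichever tbtools binary you want to try
~/.cargo/bin/tblist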
@Randymartin1991

Different issue now: after my failed install of the update, I managed to get it fixed. However, I now have a strange problem with one of the nodes that did not break - the speed drops from 25Gb/s to 8Gb/s 10 to 30 minutes after a reboot of the node. The only way to get the speed back up is to reboot the thing. I have 3 MS-01s. Any ideas? ChatGPT told me it is probably some throttling of the PCI device, since restarting frr and bringing en05/en06 down and up does not solve the issue.

@scyto (Author) commented Aug 8, 2025

Have you tried to kill the frr setup and move to the SDN way as described here? https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

I am closely monitoring this progress before I will upgrade 🫣

I haven't looked at the gist. Last time I looked at SDN it used frr under the covers, so it isn't moving away from frr, but maybe Proxmox integrating it will mean it's better aligned with ifup events. The key is whether the ifupdown IPv6 patch made it in or not; without that, issues will occur.

@scyto (Author) commented Aug 8, 2025

Different issue now: after my failed install of the update, I managed to get it fixed. However, I now have a strange problem with one of the nodes that did not break - the speed drops from 25Gb/s to 8Gb/s 10 to 30 minutes after a reboot of the node. The only way to get the speed back up is to reboot the thing. I have 3 MS-01s. Any ideas? ChatGPT told me it is probably some throttling of the PCI device, since restarting frr and bringing en05/en06 down and up does not solve the issue.

No idea, but unplug the TB cables and plug them back in - this will tear down the TB and tb-net stack and bring it back up. If you see the problem resolved, that will help us narrow the search. You can also turn on enhanced TB debugging to see via dmesg what is happening at the TB layer physically in terms of negotiation; there are some sliding windows on TB negotiation that can impact perf. I do have a note from the TB developer on how to modify that (allocate more fixed TB bandwidth to one domain).

@scyto (Author) commented Aug 8, 2025

I've just upgraded PVE 8 -> 9 and see no issues whatsoever. None of my network devices got renamed, still getting 20-26Gbit/s throughput, Ceph works. However I need to say that I followed the previous guide with some customizations.
HW: 3x ASUS NUC 13 Pro NUC13ANHI5

Curious - did you make sure, once Ceph was first updated to Squid, to set the noout flag and then do the upgrade? I could see things really getting messed up otherwise if you don't do this. Was also thinking to move all my running VMs and LXCs off the node being upgraded first, then upgrading, and then, if all goes well, moving everything back. So if I mess something up on one node, I'm only dealing with that single node initially to get back to happiness.

My ceph was already upgraded to squid (19) prior to upgrade and actually the PVE 8 -> 9 upgrade guide (https://pve.proxmox.com/wiki/Upgrade_from_8_to_9) explicitly says so. Sure, I had set noout flag (ceph osd set noout) and had migrated all VMs to other cluster nodes before upgrade; then I unset noout flag (ceph osd unset noout) as mentioned here - https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid

Same here - I have been on Squid for ages on 8.x, and before I upgraded to 9 I made sure I was on the latest of everything on every node, then ran the pve8to9 script repeatedly and resolved any issues it noticed. I forgot to set the noout flag on my first node (oops); it didn't seem to cause an issue, but I will do it on the next node I upgrade.

@scyto (Author) commented Aug 8, 2025

Have you tried to kill the frr setup and move to the SDN way as described here? https://gist.github.com/taslabs-net/9da77d302adb9fc3f10942d81f700a05

I am closely monitoring this progress before I will upgrade 🫣

OK, just looked at it before starting work. Love the use of EOF in it in general. Not sure why they are enabling systemd networking - if that's an official Proxmox prereq, great; if not, I have had experience of using traditional networking (interfaces file) / systemd / network manager, and one never wants both of them enabled at the same time - it will end in tears. I never had to explicitly do that. And the big one - no, I won't be using it: it is an IPv4-only solution. I run full dual stack and use IPv6 only for Ceph (and I have no plans to change that because of how IPv6 solves certain IPv4 issues in how machines find each other).

I will be moving to SDN (which is FRR when the more advanced modes are used) when it can support IPv6, and that is dependent on changes in the upstream ifupdown2 package, or Proxmox just choosing to permanently fork it (the patch has been ready for 9 months).

@scyto (Author) commented Aug 8, 2025

All I have time to do is watch for now unfortunately and likely a proper solution will be in place by the time I do have time freed up next week.

Welcome. I am staying away from SDN until I see the needed patch. I am confident the network hang at boot is because interface-up is causing something to run that restarts frr, or does some other action that is in my gists or that someone suggested in the comments and you implemented.

As such it should be incredibly easy to figure out what - you have the service timing out; when it times out it is hard-killed and should (may?) write the stack trace of the scheduler.py script at that point. This should give an indication of what hung, if it is not the same thing as I am seeing.

It's also possible we don't even need the frr restart scripts - remember this is 9.x, and we know all the network services have been rebuilt and use higher-level versions. It could be that we no longer need to keep restarting frr.service each time we take interfaces up and down... it might be that we can use the pve network command line to do that instead (instead of restarting frr.service, just reapply Proxmox networking).

@scyto (Author) commented Aug 8, 2025

I added a Reddit PSA here if any of you are redditors and want to help others who hit the issue (not everyone looks at these gist comments):

https://www.reddit.com/r/Proxmox/comments/1mkz0jg/psa_upgrade_to_9_and_thunderbolt_mesh_issues/

@Randymartin1991

You can also turn on enhanced TB debugging to see via dmesg what is happening at the TB layer physically in terms of negotiation; there are some sliding windows on TB negotiation that can impact perf. I do have a note from the TB developer on how to modify that (allocate more fixed TB bandwidth to one domain).

Did a TB cable pull, however the issue is still the same. I am running an IPv4 setup; maybe it's better to go for IPv6. I never got this stable for a long time - max a few weeks, then the speed drops. But after my failed update, it fails on one specific node within 30 minutes.

@ratzofftyoya

Very much looking forward to the official IPv6-capable @scyto guide! I am probably one of the first people to try doing this for the first time on a PVE9 install...probably just shouldn't have upgraded before attempting....So on step 1 I was like "/etc/modules is obsolete....hmmmm" :)

@Randymartin1991

You can also turn on enhanced TB debugging to see via dmesg what is happening at the TB layer physically in terms of negotiation; there are some sliding windows on TB negotiation that can impact perf. I do have a note from the TB developer on how to modify that (allocate more fixed TB bandwidth to one domain).

Did a TB cable pull, however the issue is still the same. I am running an IPv4 setup; maybe it's better to go for IPv6. I never got this stable for a long time - max a few weeks, then the speed drops. But after my failed update, it fails on one specific node within 30 minutes.

Alright, sorry for spamming this thread with my speed issues. I think I may have found the issue. I moved a heavy VM to a different node, and now the speed is constant at 25Gb/s and doesn't drop anymore. I think it has something to do with heat: because the VM creates more CPU stress, it generates more heat and perhaps throttling comes into play. Will keep an eye on it today. Thanks again.

@ssavkar commented Aug 9, 2025

When you upgrade to Squid, I can do that on each node, and it'll connect to the other nodes running the previous version without any problems? Then once all are running on Squid, then do the upgrade?

Hi, just to follow up on the Squid update: I am now in the process of updating my other 3-node Proxmox cluster to Squid from Quincy. Just went from Quincy to Reef and am about to swap up to Squid. Super easy. If you open three shells, one for each of your three machines, and just follow the directions in the links I gave you, you should be perfect!

@Allistah

Thanks @ssavkar - I've got my three nodes all upgraded to Squid. Now I need to update to v9. Who has done that successfully so far?

@Randymartin1991

I found the solution to the unreliable speed issues with my Minisforum MS-01. It turns out it was some BIOS settings. Make sure ASPM is disabled for all PCI devices and force the PCI speed to Gen4.
I also disabled CPU C-states. Now it is running consistently at 25Gb/s instead of randomly dropping to 7Gb/s.
This took a few weeks of my life, but hey, we're here now. :D

@ssavkar commented Aug 17, 2025

Thanks @ssavkar - I've got my three nodes all upgraded to Squid. Now I need to update to v9. Who has done that successfully so far?

Very mixed on the need to rush things. I have to think Proxmox 8 is rock solid and working great. Other than updating from PBS 3 to 4, not sure I am going to attempt more at this point. Or if I did, would do it with some test machines, not my working home setup. At least not yet.

@scyto (Author) commented Aug 17, 2025

@Allistah last weekend I upgraded my 3 cluster nodes, my NAS virtualization server node, and my PBS from 3 to 4. No issues, besides the "don't call frr.service restarts via anything that is linked to interfaces coming up" one - it causes boot hangs. Also frr seems much better integrated due to SDN now, so those scripts may not be needed any more.... someone with an MS-01 needs to test that :-). Or buy me one, rofl.

@scyto (Author) commented Aug 17, 2025

I found the solution to the unreliable speed issues with my Minisforum MS-01. It turns out it was some BIOS settings. Make sure ASPM is disabled for all PCI devices and force the PCI speed to Gen4. I also disabled CPU C-states. Now it is running consistently at 25Gb/s instead of randomly dropping to 7Gb/s. This took a few weeks of my life, but hey, we're here now. :D

that makes much more sense to me as to why some were seeing speed and pinning issues and others not, nice find.

@Allistah commented Aug 17, 2025

I found the solution to the unreliable speed issues with my Minisforum MS-01. It turns out it was some BIOS settings. Make sure ASPM is disabled for all PCI devices and force the PCI speed to Gen4. I also disabled CPU C-states. Now it is running consistently at 25Gb/s instead of randomly dropping to 7Gb/s. This took a few weeks of my life, but hey, we're here now. :D

I wonder if this is something that we should do (disable CPU C-States) on the NUC 13 Pros as well?

@corvy commented Aug 17, 2025

As long as this does not disable e-cores I think this is a good idea.

@Randymartin1991

I found the solution to the unreliable speed issues with my Minisforum MS-01. It turns out it was some BIOS settings. Make sure ASPM is disabled for all PCI devices and force the PCI speed to Gen4. I also disabled CPU C-states. Now it is running consistently at 25Gb/s instead of randomly dropping to 7Gb/s. This took a few weeks of my life, but hey, we're here now. :D

I wonder if this is something that we should do (disable CPU C-States) on the NUC 13 Pros as well?

It was recommended to me by our friend, and colleague, Chat AI :P


@corvy commented Aug 18, 2025

Can more people please share their experiences with the upgrade from 8 to 9? What steps to take, things to watch out for, and how to fix/remediate?

@scyto (Author) commented Aug 18, 2025

I wonder if this is something that we should do (disable CPU C-States) on the NUC 13 Pros as well?

i would only say do this if you are not getting the 26Gbps in iperf3 tests

@scyto (Author) commented Aug 18, 2025

Can more people please share their experiences with the upgrade from 8 to 9? What steps to take, things to watch out for, and how to fix/remediate?

Follow the instructions carefully.
Watch out for apt sources issues - the instructions will leave you with some bookworm entries and likely some duplicates; just remove them.
And disable any frr.service restart commands you have tied to interfaces coming up - which most people here have.....

@ssavkar commented Aug 18, 2025

Can more people please share their experiences with the upgrade from 8 to 9? What steps to take, things to watch out for, and how to fix/remediate?

Follow the instructions carefully. Watch out for apt sources issues - the instructions will leave you with some bookworm entries and likely some duplicates; just remove them. And disable any frr.service restart commands you have tied to interfaces coming up - which most people here have.....

I am sort of curious, rather than restart commands, I made some fixes on my MS-01 based on something nimro27 commented on back in 11/24, and had the following frr.service.d/dependencies.conf file added. Starting to wonder if this will also create a hang:

[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

@Randymartin1991

y frr.service re

Is this no longer needed in Proxmox 9? When disabling this under 8.4.11, Ceph does not come up automatically after a reboot. So disable the restart scripts, and then update?

@DomMintago

I'm getting crazy high retries on iperf3 regardless of what I tried:

  • Set smp_affinity
  • cpupower to performance
  • Disabled c-states and ASPM
  • Tried 3 different thunderbolt cables (OWC, CableMatters, Club3D)

I'm using 3x MS-01, any idea what else to try?

@Randymartin1991

I'm getting crazy high retries on iperf3 regardless of what I tried:

* Set smp_affinity

* cpupower to performance

* Disabled c-states and ASPM

* Tried 3 different thunderbolt cables (OWC, CableMatters, Club3D)

I'm using 3x MS-01, any idea what else to try?

Did you also force the pci speed to gen4?

@DomMintago

I'm getting crazy high retries on iperf3 regardless of what I tried:

* Set smp_affinity

* cpupower to performance

* Disabled c-states and ASPM

* Tried 3 different thunderbolt cables (OWC, CableMatters, Club3D)

I'm using 3x MS-01, any idea what else to try?

Did you also force the pci speed to gen4?

Yep, no difference

@ssavkar commented Aug 20, 2025

I'm getting crazy high retries on iperf3 regardless of what I tried:

* Set smp_affinity

* cpupower to performance

* Disabled c-states and ASPM

* Tried 3 different thunderbolt cables (OWC, CableMatters, Club3D)

I'm using 3x MS-01, any idea what else to try?

Did you also force the pci speed to gen4?

Yep, no difference

Did you make sure the affinity script is executable? You may see earlier that was my issue for one of my MS-01 meshes.

@Randymartin1991

These are my retries - also a bit high, but the speed is just fine.
The first test runs against the node's own local interface (10.10.10.1 to itself), which is why it is so high; should that number drop, I know the speed is off again. It has not happened anymore since I tweaked the BIOS settings.

iperf3 Network Speed Report

Test Timestamp: Wed Aug 20 12:17:33 PM CEST 2025

Running iperf3 test against 10.10.10.1...

Connecting to host 10.10.10.1, port 5201
[ 5] local 10.10.10.1 port 53986 connected to 10.10.10.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 9.02 GBytes 77.5 Gbits/sec 0 1.69 MBytes
[ 5] 1.00-2.00 sec 9.14 GBytes 78.5 Gbits/sec 0 1.87 MBytes
[ 5] 2.00-3.00 sec 9.27 GBytes 79.6 Gbits/sec 0 2.00 MBytes
[ 5] 3.00-4.00 sec 9.21 GBytes 79.1 Gbits/sec 0 2.31 MBytes
[ 5] 4.00-5.00 sec 8.92 GBytes 76.6 Gbits/sec 0 2.62 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 45.6 GBytes 78.3 Gbits/sec 0 sender
[ 5] 0.00-5.00 sec 45.6 GBytes 78.3 Gbits/sec receiver

iperf Done.

Host: 10.10.10.1
Speed: 78.3 Gbits/sec
Status: ✅ PASS

Running iperf3 test against 10.10.10.2...

Connecting to host 10.10.10.2, port 5201
[ 5] local 10.10.10.1 port 41306 connected to 10.10.10.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.16 GBytes 18.5 Gbits/sec 141 3.37 MBytes
[ 5] 1.00-2.00 sec 2.50 GBytes 21.5 Gbits/sec 247 3.56 MBytes
[ 5] 2.00-3.00 sec 2.10 GBytes 18.0 Gbits/sec 116 3.31 MBytes
[ 5] 3.00-4.00 sec 2.64 GBytes 22.6 Gbits/sec 273 2.37 MBytes
[ 5] 4.00-5.00 sec 2.34 GBytes 20.1 Gbits/sec 151 3.62 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 11.7 GBytes 20.2 Gbits/sec 928 sender
[ 5] 0.00-5.00 sec 11.7 GBytes 20.1 Gbits/sec receiver

iperf Done.

Host: 10.10.10.2
Speed: 20.2 Gbits/sec
Status: ✅ PASS

Running iperf3 test against 10.10.10.3...

Connecting to host 10.10.10.3, port 5201
[ 5] local 10.10.10.1 port 53048 connected to 10.10.10.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.95 GBytes 25.3 Gbits/sec 82 3.62 MBytes
[ 5] 1.00-2.00 sec 2.93 GBytes 25.2 Gbits/sec 35 4.37 MBytes
[ 5] 2.00-3.00 sec 2.99 GBytes 25.6 Gbits/sec 96 2.31 MBytes
[ 5] 3.00-4.00 sec 2.92 GBytes 25.0 Gbits/sec 181 1.56 MBytes
[ 5] 4.00-5.00 sec 2.24 GBytes 19.2 Gbits/sec 203 3.31 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 14.0 GBytes 24.1 Gbits/sec 597 sender
[ 5] 0.00-5.00 sec 14.0 GBytes 24.1 Gbits/sec receiver

iperf Done.

Host: 10.10.10.3
Speed: 24.1 Gbits/sec
Status: ✅ PASS

@DamianRyse

What I noticed with regard to transfer speed and retries is that Turbo Mode must be turned on in order to get decent results. Note, though, that Turbo Mode only affects the receiving device.

iperf3 test WITHOUT turbo mode

[  5] local 10.0.0.1 port 40056 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   682 MBytes  5.71 Gbits/sec  244   1.31 MBytes
[  5]   1.00-2.00   sec  1.09 GBytes  9.39 Gbits/sec  408   1.31 MBytes
[  5]   2.00-3.00   sec  1.85 GBytes  15.9 Gbits/sec  577   2.19 MBytes
[  5]   3.00-4.00   sec  1.17 GBytes  10.0 Gbits/sec  446   1.25 MBytes
[  5]   4.00-5.00   sec   985 MBytes  8.26 Gbits/sec  379   1.31 MBytes
[  5]   5.00-6.00   sec  1.03 GBytes  8.80 Gbits/sec  323   1.31 MBytes
[  5]   6.00-7.00   sec  1.08 GBytes  9.29 Gbits/sec  402   1.44 MBytes
[  5]   7.00-8.00   sec  1.23 GBytes  10.6 Gbits/sec  444   1.12 MBytes
[  5]   8.00-9.00   sec  1.65 GBytes  14.2 Gbits/sec  550   1.06 MBytes
[  5]   9.00-10.00  sec  1.46 GBytes  12.5 Gbits/sec  457    959 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.2 GBytes  10.5 Gbits/sec  4230            sender
[  5]   0.00-10.00  sec  12.2 GBytes  10.5 Gbits/sec                  receiver

iperf3 test WITH turbo mode enabled

[  5] local 10.0.0.1 port 39930 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.96 GBytes  25.4 Gbits/sec   78   3.31 MBytes
[  5]   1.00-2.00   sec  3.06 GBytes  26.3 Gbits/sec   15   3.31 MBytes
[  5]   2.00-3.00   sec  3.06 GBytes  26.3 Gbits/sec   16   3.31 MBytes
[  5]   3.00-4.00   sec  3.06 GBytes  26.3 Gbits/sec   14   3.31 MBytes
[  5]   4.00-5.00   sec  3.03 GBytes  26.1 Gbits/sec   20   3.31 MBytes
[  5]   5.00-6.00   sec  3.05 GBytes  26.2 Gbits/sec   13   3.31 MBytes
[  5]   6.00-7.00   sec  3.05 GBytes  26.2 Gbits/sec   19   3.81 MBytes
[  5]   7.00-8.00   sec  2.38 GBytes  20.4 Gbits/sec   20   3.50 MBytes
[  5]   8.00-9.00   sec  3.06 GBytes  26.3 Gbits/sec   28   3.50 MBytes
[  5]   9.00-10.00  sec  3.07 GBytes  26.3 Gbits/sec   14   3.50 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  29.8 GBytes  25.6 Gbits/sec  237            sender
[  5]   0.00-10.00  sec  29.8 GBytes  25.6 Gbits/sec                  receiver

The reason for this (as far as I figured out) is that the CPU cannot process the incoming data fast enough without Turbo Mode, and the ksoftirqd interrupts get higher and higher. In a process monitor like top we can see that the ksoftirqd process takes up to 99% CPU, which then results in massive packet drops/retries.

Unfortunately, in my setup the power consumption increases by up to 100W (worst case scenario) when Turbo Mode is enabled on both of my MS-01.
