@scyto
Last active April 22, 2025 17:30

THIS GIST IS NOW DEPRECATED. A NEW ONE IS AVAILABLE HERE. I WON'T BE UPDATING THIS ONE OR REPLYING TO COMMENTS ON THIS ONE (COMMENTS NOW DISABLED).

Enable Dual Stack (IPv4 and IPv6) OpenFabric Routing

This gist is part of this series.

This assumes you are running Proxmox 8.2 and that the line source /etc/network/interfaces.d/* is at the end of the interfaces file (this is automatically added to both new and upgraded installations of Proxmox 8.2).

This changes the previous file design. Thanks to @NRGNet for the suggestion to move the thunderbolt settings to a file in /etc/network/interfaces.d; it makes the system much more reliable in general and more maintainable, especially for folks using IPv4 on the private cluster network (I still recommend the IPv6 FC00 network you will see in these docs).

This will result in an IPv4 and IPv6 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node.

NOTES on Dual Stack

I have included this for completeness. I only run the FC00:: IPv6 network, as Ceph does not support dual stack, and I strongly recommend you consider using only IPv6. For Ceph in particular, do not dual stack: use either IPv4 or IPv6 addresses for all the monitors, MDS and daemons. Despite the docs implying dual stack is OK, my findings on Quincy are that it is flaky.

Defining thunderbolt network

Create a new file with nano /etc/network/interfaces.d/thunderbolt and populate it with the following. Remember X should match your node number, so for example 1, 2 or 3.

auto lo:0
iface lo:0 inet static
        address 10.0.0.8X/32
        
auto lo:6
iface lo:6 inet static
        address fc00::8X/128
        
allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

Save the file, then repeat on each node.
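
If you prefer to script the per-node file rather than type it, here is a minimal sketch assuming the 10.0.0.8X / fc00::8X addressing from this guide. OUT defaults to /tmp/thunderbolt as a stand-in; point it at /etc/network/interfaces.d/thunderbolt on a real node.

```shell
# Sketch: generate the thunderbolt interfaces file for node X.
# OUT is a stand-in path here; use /etc/network/interfaces.d/thunderbolt on a real node.
X=1                              # node number: 1, 2 or 3
OUT="${OUT:-/tmp/thunderbolt}"
cat > "$OUT" <<EOF
auto lo:0
iface lo:0 inet static
        address 10.0.0.8${X}/32

auto lo:6
iface lo:6 inet static
        address fc00::8${X}/128

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520
EOF
echo "wrote $OUT for node ${X}"
```

Run it once per node with the matching X, then check the result with cat before moving on.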

Enable IPv4 and IPv6 forwarding

  1. use nano /etc/sysctl.conf to open the file
  2. uncomment #net.ipv6.conf.all.forwarding=1 (remove the # symbol)
  3. uncomment #net.ipv4.ip_forward=1 (remove the # symbol)
  4. save the file
  5. issue reboot now for a complete reboot
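
The two uncomment edits can also be done non-interactively; a sketch below operates on a stand-in copy (point CONF at /etc/sysctl.conf on a real node - and note sysctl -p applies the keys without a reboot, though the full reboot remains the safest path).

```shell
# Sketch: uncomment the two forwarding keys in sysctl.conf.
# CONF is a stand-in copy here; use /etc/sysctl.conf on a real node.
CONF="${CONF:-/tmp/sysctl.conf}"
printf '#net.ipv4.ip_forward=1\n#net.ipv6.conf.all.forwarding=1\n' > "$CONF"  # stand-in content
sed -i \
  -e 's/^#net\.ipv4\.ip_forward=1/net.ipv4.ip_forward=1/' \
  -e 's/^#net\.ipv6\.conf\.all\.forwarding=1/net.ipv6.conf.all.forwarding=1/' \
  "$CONF"
cat "$CONF"
```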

FRR Setup

Install FRR

Install Free Range Routing (FRR) with apt install frr

Enable the fabricd daemon

  1. edit the frr daemons file (nano /etc/frr/daemons) to change fabricd=no to fabricd=yes
  2. save the file
  3. restart the service with systemctl restart frr
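
The edit in step 1 can be scripted too; a sketch against a stand-in copy of the daemons file (point DAEMONS at /etc/frr/daemons on a real node, then systemctl restart frr as in step 3).

```shell
# Sketch: flip fabricd=no to fabricd=yes in the FRR daemons file.
# DAEMONS is a stand-in copy here; use /etc/frr/daemons on a real node.
DAEMONS="${DAEMONS:-/tmp/daemons}"
printf 'fabricd=no\nisisd=no\n' > "$DAEMONS"   # stand-in content
sed -i 's/^fabricd=no/fabricd=yes/' "$DAEMONS"
grep '^fabricd=' "$DAEMONS"
```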

Mitigate FRR Timing Issues at Boot

Add post-up command to /etc/network/interfaces

sudo nano /etc/network/interfaces

Add post-up /usr/bin/systemctl restart frr.service as the last line in the file (it should go after the line that starts with source)

NOTE for Minisforum MS-01 users

Make the post-up line above read post-up sleep 5 && /usr/bin/systemctl restart frr.service instead. This has been verified as required due to timing issues seen on those units (exact cause unknown); it may be needed on other hardware too.
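
For reference, a sketch of how the end of /etc/network/interfaces might look after this edit (MS-01 variant shown; the stanzas above the source line will differ per node):

```
# ...existing bridge/interface stanzas above...

source /etc/network/interfaces.d/*
post-up sleep 5 && /usr/bin/systemctl restart frr.service
```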

Configure OpenFabric (perform on all nodes)

  1. enter the FRR shell with vtysh
  2. optionally show the current config with show running-config
  3. enter the configure mode with configure
  4. Apply the below configuration (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to submit the last !. Also check there were no errors in response to the pasted text.)

Note: the X should be the number of the node you are working on, as an example - 0.0.0.1, 0.0.0.2 or 0.0.0.3.

ip forwarding
ipv6 forwarding
!
interface en05
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface en06
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface lo
ip router openfabric 1
ipv6 router openfabric 1
openfabric passive
exit
!
router openfabric 1
net 49.0000.0000.000X.00
exit
!

  1. you may need to press return after the last ! to get to a new line - if so, do this

  2. exit the configure mode with the command end

  3. save the config with write memory

  4. verify the config applied correctly with show running-config - note the order of the items will be different from how you entered them, and that's OK. (If you made a mistake, I found the easiest fix was to edit /etc/frr/frr.conf - but be careful if you do that.)

  5. use the command exit to leave setup

  6. repeat the steps above on the other two nodes

  7. once you have configured all 3 nodes, issue the command vtysh -c "show openfabric topology"; if you did everything right you will see:

Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
10.0.0.81/32         IP internal  0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
10.0.0.82/32         IP TE        20     pve2                 en06      pve2(4)
10.0.0.83/32         IP TE        20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
fc00::81/128         IP6 internal 0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
fc00::82/128         IP6 internal 20     pve2                 en06      pve2(4)
fc00::83/128         IP6 internal 20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

Now you should be able to ping each node from every node across the thunderbolt mesh using IPv4 or IPv6 as you see fit.
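
A quick loop for that check; a sketch assuming the three-node 10.0.0.8X / fc00::8X addressing used throughout this guide:

```shell
# Sketch: ping every node's loopback address over IPv4 and IPv6.
# Assumes the 10.0.0.8X / fc00::8X scheme from this guide and nodes 1-3.
addrs=""
for i in 1 2 3; do
  addrs="$addrs 10.0.0.8${i} fc00::8${i}"
done
echo "checking:$addrs"
for a in $addrs; do
  ping -c1 -W1 "$a" >/dev/null 2>&1 && echo "$a ok" || echo "$a unreachable"
done
```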


@yet-an-other
Copy link

So it's possible to have a 2-node cluster (with QDevice) work with Ceph? I read in every guide that Ceph required at least 3 nodes.

Absolutely. Although this is strongly discouraged, it certainly works. The trick is that the UI doesn't allow you to create a monitor with a minimum of 1 node for read and write operations, but it's possible to change it later, after creation. You can create a monitor with default settings and immediately afterward adjust it to 2 nodes normal and 1 node minimum. It shows a warning but works.

Moreover, it's even possible to create a Ceph cluster with only one disk and add the second disk on the second node later. I did this because I had some data on one node that I wanted to move to the Ceph pool. So I created a pool with one disk first, then moved the data to the pool from the disk on the second node, and added the second disk afterward.

@yet-an-other
Copy link

For those of you that have this cluster working correctly, have you updated Proxmox to the latest version and if so, has there been any issues with it breaking the thunderbolt network?

8.3.2 works fine with both iommu and CPU affinity configured.

@alexdelprete
Copy link

The trick is that the UI doesn't allow you to create a monitor with a minimum of 1 node for read and write operations, but it's possible to change it later, after creation.

but if it doesn't allow you to create it, how do you create it first, in order to change it later?

So I created a pool with one disk first, then moved the data to the pool from the disk on the second node, and added the second disk afterward.

good to know, because I will have the same issue. How do you move data from the other node when you have the ceph volume only on the other?

@theTov
Copy link

theTov commented Jan 17, 2025

I had some frr boot timing issues with my Minisforum MS-01 as described in the gist. However, adding the following line to /etc/network/interfaces was not working reliably for me: sleep 5 && post-up /usr/bin/systemctl restart frr.service. Sometimes, according to dmesg, it took quite some time until the interfaces en05 and en06 were correctly set up. For some reason this led to only IPv6 routing being initialized but not IPv4 (not sure why).
So I added the following file /etc/systemd/system/frr.service.d/dependencies.conf with this content:

[Unit]
BindsTo=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

This ensures that frr only starts after en05 and en06 are fully initialized. This worked great for me and the post-up command is not needed anymore in the interfaces file. I thought I would share this if someone else had similar problems. I would also be happy to hear your thoughts on this solution.

@nimro27 Thank you so much! This solved it for me on my 3 node MS-01 cluster!

When I add this file the frr.service does not even start anymore, not on boot and not manually. Any idea?

@FrankSchoene @nimro27 Just a warning: when I attempted to use this approach (dependencies.conf) it caused a couple of problems. It would occasionally prevent en05/06 from coming up during boot (logging dependency errors), but more importantly it would also cause the frr service on all other nodes to shut down when the current node was rebooting. In my case, I had to remove the dependencies.conf and go back to using a post-up with sleep 10. One minor difference in my case: I used a script in /etc/network/if-up.d/ instead of explicitly adding post-up to the interfaces file. Everything seems to be working well now and survives reboots of any node. I'm running 3 MS-01's.

Thanks for the warning, great catch! I did some more tests and changed the BindsTo to Wants in the dependencies.conf; this solved the case where frr is shut down on other nodes if one goes down.

[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

However I could not replicate the case where en05/06 would not come up during boot. Could you explain the logging dependency error a bit more? Is it related to the shutdown issue with BindsTo?
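
For anyone trying this variant, the drop-in above could be installed with something like the following sketch. DROPDIR is a stand-in path here; on a real node use /etc/systemd/system/frr.service.d and run systemctl daemon-reload afterwards.

```shell
# Sketch: install the Wants/After drop-in for frr.service.
# DROPDIR is a stand-in path; use /etc/systemd/system/frr.service.d on a real node.
DROPDIR="${DROPDIR:-/tmp/frr.service.d}"
mkdir -p "$DROPDIR"
cat > "$DROPDIR/dependencies.conf" <<'EOF'
[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
EOF
cat "$DROPDIR/dependencies.conf"
```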

(quoting the dependencies.conf discussion above)

Another approach I found that worked to troubleshoot issues with FRRouting not starting at boot was to change the frr.service unit so FRR waits until the network is fully loaded.

Issue: I could find my devices when running vtysh -c "show openfabric topology", but when I checked my network logs I found "Node2 networking[1170509]: error: /etc/network/interfaces: line31: error processing line 'post-up sleep 5 && /usr/bin/systemctl restart frr.service'".

Solution: change the frr.service unit to wait until the network is fully loaded, using "network-online.target".

How:
nano /lib/systemd/system/frr.service and change Wants= and After= to use network-online.target instead of network.target; it would look something like this after the change:

[Unit]
Description=FRRouting
Documentation=https://frrouting.readthedocs.io/en/latest/setup.html
Wants=network-online.target
After=network-online.target systemd-sysctl.service
OnFailure=heartbeat-failed@%n

Save and exit the file. Remove or comment out the post-up sleep 5 && /usr/bin/systemctl restart frr.service line,
then restart network services with systemctl restart networking and run journalctl -xe | grep networking to see if it worked. I have done this on 2 of my nodes; the MS-01 is running the BindsTo solution above. I hope this helps someone - been at this for 3 days and finally got my mesh to work: 2x thunderbolt and 1 on SFP+.

@Carrrot
Copy link

Carrrot commented Jan 21, 2025

I managed to set up the 3 machines with Thunderbolt as the migration network - I think I managed to get about 8Gb/sec transfer when moving a VM between them. Not bad for machines that are over 10 years old! This in itself is VERY helpful!!

I did not get the ip forwarding to appear as visible in the vtysh 'running config'.

I tried unplugging my main ethernet cable to one of the servers and I was hoping the internet connection would automatically switch over to using the Thunderbolt connection - but this did not happen unfortunately.

Hi @ricksteruk,

I am trying to get this going between two Mac Minis A1347, so also Thunderbolt 2. Since you seemingly made it happen on Macs as well, you might have a clue what's going on here:

Both machines do see each other via LLDP as neighbours, but only if I force LLDPD to emit LLDP-MED frames by setting the class option -M 4, aka Network Connectivity Device. This is also reliable: if I disconnect the cable on either machine and reconnect, LLDP comes up and along with it the en05 network devices, with status 'up' (configured as per @scyto's guide).

But as soon as I start the frr service, LLDP loses connection. This also freezes the frr daemon on one machine until I physically interrupt the cable connection by pulling the cable or by issuing ifdown en05 on the opposite machine, on which the frr daemon is not frozen.

For reference i added the output of get-thunderbolt-config.sh of both machines below.

Host pve1

=========================================
Thunderbolt Mesh Network Config Info Tool
=========================================

-----------------------------------------
Kernel Version
-----------------------------------------
Linux pve 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux

-----------------------------------------
File: /etc/network/interfaces
-----------------------------------------
auto lo
iface en05 inet manual
iface lo inet loopback
iface enp2s0f0 inet manual
auto vmbr0
iface vmbr0 inet static
	address 10.1.1.121/24
	gateway 10.1.1.1
	bridge-ports enp2s0f0
	bridge-stp off
	bridge-fd 0
post-up /usr/bin/systemctl restart frr.service
source /etc/network/interfaces.d/*

-----------------------------------------
File: /etc/network/interfaces.d/thunderbolt
-----------------------------------------
auto lo
iface lo inet static
        address 10.0.0.81/32

auto lo:6
iface lo:6 inet static
        address fc00::81/128

allow-hotplug en05
iface en05 inet manual
        mtu 65520

----------------------------------------
File: /usr/local/bin/pve-en05.sh
-rwxr-xr-x 1 root root 154 Jan  9 22:51 /usr/local/bin/pve-en05.sh
-----------------------------------------
/usr/sbin/ifup en05

----------------------------------------
File: /usr/local/bin/pve-en06.sh
ls: cannot access '/usr/local/bin/pve-en06.sh': No such file or directory
-----------------------------------------
grep: /usr/local/bin/pve-en06.sh: No such file or directory

-----------------------------------------
File: /etc/modules
-----------------------------------------
thunderbolt

-----------------------------------------
File: /etc/systemd/network/00-thunderbolt0.link
-----------------------------------------
[Match]
Path=pci-0000:08:00.0
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05

-----------------------------------------
File: /etc/systemd/network/00-thunderbolt1.link
-----------------------------------------
grep: /etc/systemd/network/00-thunderbolt1.link: No such file or directory

-----------------------------------------
File: /etc/sysctl.conf
-----------------------------------------
et.ipv4.ip_forward=1
et.ipv6.conf.all.forwarding=1

-----------------------------------------
File: /etc/udev/rules.d/10-tb-en.rules
-----------------------------------------
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"

-----------------------------------------
File: /etc/frr/frr.conf
-----------------------------------------
frr version 10.2.1
frr defaults traditional
hostname pve
log syslog errors
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0001.00
 fabric-tier 0
exit
!

-----------------------------------------
File: /etc/frr/daemons
-----------------------------------------
bgpd=no
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
pim6d=no
ldpd=no
nhrpd=no
eigrpd=no
babeld=no
sharpd=no
pbrd=no
bfdd=no
fabricd=yes
vrrpd=no
pathd=no
vtysh_enable=yes
zebra_options="  -A 127.0.0.1 -s 90000000"
mgmtd_options="  -A 127.0.0.1"
bgpd_options="   -A 127.0.0.1"
ospfd_options="  -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options="   -A 127.0.0.1"
ripngd_options=" -A ::1"
isisd_options="  -A 127.0.0.1"
pimd_options="   -A 127.0.0.1"
pim6d_options="  -A ::1"
ldpd_options="   -A 127.0.0.1"
nhrpd_options="  -A 127.0.0.1"
eigrpd_options=" -A 127.0.0.1"
babeld_options=" -A 127.0.0.1"
sharpd_options=" -A 127.0.0.1"
pbrd_options="   -A 127.0.0.1"
staticd_options="-A 127.0.0.1"
bfdd_options="   -A 127.0.0.1"
fabricd_options="-A 127.0.0.1"
vrrpd_options="  -A 127.0.0.1"
pathd_options="  -A 127.0.0.1"

-----------------------------------------
Command: vtysh -c "show openfabric topology"
-----------------------------------------
Area 1:
IS-IS paths to level-2 routers that speak IP
 Vertex        Type         Metric  Next-Hop  Interface  Parent
 ----------------------------------------------------------------
 pve
 10.0.0.81/32  IP internal  0                            pve(4)


IS-IS paths to level-2 routers that speak IPv6
 Vertex        Type          Metric  Next-Hop  Interface  Parent
 -----------------------------------------------------------------
 pve
 fc00::81/128  IP6 internal  0                            pve(4)


IS-IS paths to level-2 routers with hop-by-hop metric
 Vertex  Type  Metric  Next-Hop  Interface  Parent




-----------------------------------------
Command: vtysh -c "show running-config"
-----------------------------------------
Building configuration...

Current configuration:
!
frr version 10.2.1
frr defaults traditional
hostname pve
log syslog errors
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0001.00
 fabric-tier 0
exit
!
end

----------------------------------------
File: /etc/network/if-up.d/thunderbolt-affinity
ls: cannot access '/etc/network/if-up.d/thunderbolt-affinity': No such file or directory
-----------------------------------------

Host pve2

=========================================
Thunderbolt Mesh Network Config Info Tool
=========================================

-----------------------------------------
Kernel Version
-----------------------------------------
Linux pve2 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux

-----------------------------------------
File: /etc/network/interfaces
-----------------------------------------
auto lo
iface en05 inet manual
iface lo inet loopback
iface enp1s0f0 inet manual
auto vmbr0
iface vmbr0 inet static
	address 10.1.1.122/24
	gateway 10.1.1.1
	bridge-ports enp1s0f0
	bridge-stp off
	bridge-fd 0
post-up /usr/bin/systemctl restart frr.service
source /etc/network/interfaces.d/*

-----------------------------------------
File: /etc/network/interfaces.d/thunderbolt
-----------------------------------------
auto lo
iface lo inet static
        address 10.0.0.82/32

auto lo:6
iface lo:6 inet static
        address fc00::82/128

allow-hotplug en05
iface en05 inet manual
        mtu 65520

----------------------------------------
File: /usr/local/bin/pve-en05.sh
-rwxr-xr-x 1 root root 154 Jan  9 22:51 /usr/local/bin/pve-en05.sh
-----------------------------------------
/usr/sbin/ifup en05

----------------------------------------
File: /usr/local/bin/pve-en06.sh
ls: cannot access '/usr/local/bin/pve-en06.sh': No such file or directory
-----------------------------------------
grep: /usr/local/bin/pve-en06.sh: No such file or directory

-----------------------------------------
File: /etc/modules
-----------------------------------------
thunderbolt

-----------------------------------------
File: /etc/systemd/network/00-thunderbolt0.link
-----------------------------------------
[Match]
Path=pci-0000:06:00.0
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05

-----------------------------------------
File: /etc/systemd/network/00-thunderbolt1.link
-----------------------------------------
grep: /etc/systemd/network/00-thunderbolt1.link: No such file or directory

-----------------------------------------
File: /etc/sysctl.conf
-----------------------------------------
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1

-----------------------------------------
File: /etc/udev/rules.d/10-tb-en.rules
-----------------------------------------
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"

-----------------------------------------
File: /etc/frr/frr.conf
-----------------------------------------
frr version 10.2.1
frr defaults traditional
hostname pve2
log syslog errors
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0002.00
 fabric-tier 0
exit
!

-----------------------------------------
File: /etc/frr/daemons
-----------------------------------------
bgpd=no
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
pim6d=no
ldpd=no
nhrpd=no
eigrpd=no
babeld=no
sharpd=no
pbrd=no
bfdd=no
fabricd=yes
vrrpd=no
pathd=no
vtysh_enable=yes
zebra_options="  -A 127.0.0.1 -s 90000000"
mgmtd_options="  -A 127.0.0.1"
bgpd_options="   -A 127.0.0.1"
ospfd_options="  -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options="   -A 127.0.0.1"
ripngd_options=" -A ::1"
isisd_options="  -A 127.0.0.1"
pimd_options="   -A 127.0.0.1"
pim6d_options="  -A ::1"
ldpd_options="   -A 127.0.0.1"
nhrpd_options="  -A 127.0.0.1"
eigrpd_options=" -A 127.0.0.1"
babeld_options=" -A 127.0.0.1"
sharpd_options=" -A 127.0.0.1"
pbrd_options="   -A 127.0.0.1"
staticd_options="-A 127.0.0.1"
bfdd_options="   -A 127.0.0.1"
fabricd_options="-A 127.0.0.1"
vrrpd_options="  -A 127.0.0.1"
pathd_options="  -A 127.0.0.1"

-----------------------------------------
Command: vtysh -c "show openfabric topology"
-----------------------------------------
Area 1:
IS-IS paths to level-2 routers that speak IP
 Vertex        Type         Metric  Next-Hop  Interface  Parent
 -----------------------------------------------------------------
 pve2
 10.0.0.82/32  IP internal  0                            pve2(4)


IS-IS paths to level-2 routers that speak IPv6
 Vertex        Type          Metric  Next-Hop  Interface  Parent
 ------------------------------------------------------------------
 pve2
 fc00::82/128  IP6 internal  0                            pve2(4)


IS-IS paths to level-2 routers with hop-by-hop metric
 Vertex  Type  Metric  Next-Hop  Interface  Parent




-----------------------------------------
Command: vtysh -c "show running-config"
-----------------------------------------
Building configuration...

Current configuration:
!
frr version 10.2.1
frr defaults traditional
hostname pve2
log syslog errors
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0002.00
 fabric-tier 0
exit
!
end

----------------------------------------
File: /etc/network/if-up.d/thunderbolt-affinity
ls: cannot access '/etc/network/if-up.d/thunderbolt-affinity': No such file or directory
-----------------------------------------

@corvy
Copy link

corvy commented Feb 10, 2025

Deleted - as this was not correct - see my updated and working config here

@vlad-lucian86
Copy link

Hi @scyto

Thank you very much for your work and to everyone here contributing in the comments. This helped me a lot.

I want to help out and give back, so I want to point out a few things for those trying this on MS-01 and maybe help others too.

You have a mistake in the post-up line for MS-01. It is not sleep 5 && post-up ..... as post-up is the marker to trigger the command while loading settings from the /etc/network/interfaces file. To make it work it should be:
post-up sleep 5 && /usr/bin/systemctl restart frr.service

Otherwise it errors out as it does not know what "sleep" is. That's why people are reporting it is not working on the MS-01.

IPv6 is working perfectly, but I gave up on IPv4, as frr does not start. It might work with the solutions in the comments, but I might not have implemented them correctly. Who knows.

Also, I just added the thunderbolt interfaces to the /etc/network/interfaces file directly, as for some reason my install of Proxmox 8.3 does not load the sources ..... at least it does not seem to. Too tired to see why.
Last but not least, a huge thank you to @Allistah for the thunderbolt-affinity solution, it worked wonders. For the MS-01 with the i9 13900H, as there are 6 cores and 12 threads, it should have 0-11 in it, but it works very well with 0-5, using half of the P core threads.

I think this was my first post on github, and it is 3 in the morning, so please excuse the rudimentary formatting. Just wanted to help out so I don't forget in the morning.

Cheers,
Vlad

@tisayama
Copy link

Hi @scyto

Based on my testing in my environment, it appears that specifying a sequence of shell commands with post-up is not possible.
It would be better to place a shell script in /etc/network/if-up.d/ and have it executed when en05 and en06 are brought up.

nano /etc/network/if-up.d/frr
#!/bin/bash

if [[ $IFACE == "en05" ]] || [[ $IFACE == "en06" ]]; then
 sleep 5 && /usr/bin/systemctl restart frr.service
fi
chmod +x /etc/network/if-up.d/frr

With this method, IPv4 can now be used stably.


Also, if anyone set FRR rules via the command line, when using Proxmox VE SDN, applying the SDN seems to overwrite the Thunderbolt Networking rules set in this article, effectively removing them.

If you write the FRR configuration for Thunderbolt Networking into the following file, it will merge with the SDN apply.

nano /etc/frr/frr.conf.local

@IndianaJoe1216
Copy link

Does anyone have a consistently functional FRR config using 3 MS-01 boxes? I have implemented several of the changes in the comments here and am still having issues getting en05 and en06 to come up after a reboot.

@eriklarsen-bidbax
Copy link

@scyto I love this :) I have 3 Intel NUCs myself (13th gen) and wanted to replicate your setup. I ordered 3 TB4 cables from Cable Matters and followed every step in the guide. I get node 1 to talk to 3, but none of them can talk to 2. I rechecked cables, and udevadm monitor gave zero response when inserting and removing cables (I checked all the configs and they are good across all 3).
I checked my ifupdown logs and came across something really weird - right after the thunderbolt file is processed it reports that it is deleting lo and vmbr as they are blacklisted (?) and then exits. I'm not sure, but this could either be the reason I get no reaction from udevadm monitor, or be caused by the TB ports being broken? Have you seen this before? I have no experience with blacklisted interfaces, and I know for sure that I never blacklisted anything other than the nvidia driver...

2025-03-13 20:18:28,428: MainThread: ifupdown.networkInterfaces: networkinterfaces.py:164:process_source(): debug: processing sourced line ..'source /etc/network/interfaces.d/*'
2025-03-13 20:18:28,428: MainThread: ifupdown.networkInterfaces: networkinterfaces.py:506:read_file(): info: processing interfaces file /etc/network/interfaces.d/thunderbolt
2025-03-13 20:18:28,428: MainThread: ifupdown.bridge: modulebase.py:257:parse_port_list(): debug: vmbr0: evaluating port expr '['enp86s0']'
2025-03-13 20:18:28,429: MainThread: ifupdown: ifupdownmain.py:859:populate_dependency_info(): debug: populate_dependency_info: deleting blacklisted interface lo
2025-03-13 20:18:28,429: MainThread: ifupdown: ifupdownmain.py:859:populate_dependency_info(): debug: populate_dependency_info: deleting blacklisted interface vmbr0
2025-03-13 20:18:28,437: MainThread: ifupdown2: log.py:373:write(): info: exit status 0

@scyto
Copy link
Author

scyto commented Apr 18, 2025

(quoting @tisayama's if-up.d script and frr.conf.local note above)

nice, I implemented it on one node, issued an ifup -a, and it seems to work. Do you know why this works while the old method now has issues for some?

also for that last part on SDN: are you saying to copy /etc/frr/frr.conf to /etc/frr/frr.conf.local, or to use that file in the first place?

ok, I found a proxmox forum post that explains the .local; it seems to be a proxmox-specific thing. For now I copied my frr.conf to frr.conf.local on each node. Do you know if the .local applies even if one never touches SDN (i.e. should I update the instructions to always say write it to that file)?

(also I see that proxmox SDN can support FRR - maybe there is a way to do all of this in the SDN UI now)

@scyto
Author

scyto commented Apr 18, 2025

@eriklarsen-bidbax no idea, i don't actually even know where to look at those logs, lol
did you ever get it working?

@scyto
Author

scyto commented Apr 18, 2025

@here has anyone managed to get a ceph client to connect to the ceph cluster when just using the IPv6 methodology?

@scyto
Author

scyto commented Apr 18, 2025

With this method, IPv4 can now be used stably.

only thing i noticed is that if i do an ifup -a the script doesn't fire (i assume because the interface is up?)

@scyto
Author

scyto commented Apr 18, 2025

You have a mistake in the post-up line for MS-01. It is not sleep 5 && post-up ..... as post-up is the marker to trigger the command while loading settings from the /etc/network/interfaces file. To make it work it should be:
post-up sleep 5 && /usr/bin/systemctl restart frr.service

@vlad-lucian86 thanks, nice spot, bad typo on my part, corrected - have you tried @tisayama's approach? seems much more elegant and might help

@vlad-lucian86

vlad-lucian86 commented Apr 18, 2025

Hi @scyto,

I am glad I could help.
For my setup I stuck with IPv6 for the ring network and removed IPv4 completely.
After a lot of fiddling and problems with ceph versions and the ceph dashboard, i got my cluster working on 3 MS-01s a week after my comment above. Thanks to you and your wonderful piece of work.
I made a ceph replicated storage pool with 3x1TB, 1x SSD on each node and used it for some VMs and, more importantly, for the second ceph erasure coded pool with 2x4TB SSDs on each node. The erasure coded pool can hold data but not metadata, so it uses the replicated pool for metadata.
All works great but that is where I stopped. Had a lot of life things to take care of and no time and energy to tinker.
I do need to get new TB cables as the ones I have seem to limit the data transfer at around 2GB/s ... aka ~16Gbps, which is far less than what it should do. But fast enough at first.
Thanks again for the instructions. Will check back when I have more time and post the config files that worked for me. In case we have others that got stuck.

@scyto
Author

scyto commented Apr 19, 2025

@vlad-lucian86

thanks for letting me know, i understand taking time off from tinkering for other things - that's why i have been absent for nearly 6 mo :-)

I am also trying some new things this weekend as i have been trying to find a way for ceph clients to connect to the ceph cluster from the LAN and coming up short

as my unifi router has FRR under the covers i will try and get that participating by expanding the frr on the cluster to the vmbr0 interfaces (had some partial success but seeing weird stuff)

also might take a look at the SDN EVPN stuff - it feels like we might be able to replace the weird FRR setup we have here with something we can do through the UI - but at this point i understand so very very little

I have a new idea on IPv4 reliability (in fact reliability overall) based on the reading i was doing last night on the FRR docs.
I am in the middle (like, as i type) of moving the IP configuration out of /etc/network/interfaces.d/thunderbolt directly into the FRR config itself so it manages assigning the IPs to the interfaces..... seems to be working on mine.....

@here i will probably update the docs this weekend if it works, would be great if someone could try on an MS-01 and see if it makes the IPv4 more reliable or not....

@jacoburgin

Thanks for all your work @scyto and every who has contributed.

I have a Nuc 12 setup and UniFi, keep in touch on discord if you want me to test anything with you :)

foR / Jacob

@corvy

corvy commented Apr 20, 2025

Hey @scyto and thanks for this gist. I am just wondering why you are not revisiting getting IPv4 to work? I have had it running for a few months now. The main problem with IPv4 and IPv6 is that frr runs before IPv4 is up and functioning. My solution to this was to implement the following:

/etc/network/interfaces.d/thunderbolt (.5 on node 1, .6 on node 2, .7 on node 3 etc. / fc00::85/128 on node 1, fc00::86/128 on node 2, fc00::87/128 on node 3)

auto lo:0
iface lo:0 inet static
        address 10.255.255.5/32

auto lo:6
iface lo:6 inet static
        address fc00::85/128

allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

/etc/udev/rules.d/10-tb-en.rules

ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-n05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-n06.sh"

pve-n05.sh + pve-n06.sh (amend the IF="en0x" part for each script)

#!/bin/bash

LOGFILE="/tmp/udev-debug.log"
VERBOSE="" # Set this to "-v" for verbose logging
IF="en05"

echo "$(date): pve-$IF.sh triggered by udev" >> "$LOGFILE"

# If multiple interfaces go up at the same time, 
# retry 10 times and break the retry when successful
for i in {1..10}; do
    echo "$(date): Attempt $i to bring up $IF" >> "$LOGFILE"
    /usr/sbin/ifup $VERBOSE $IF >> "$LOGFILE" 2>&1 && {
        echo "$(date): Successfully brought up $IF on attempt $i" >> "$LOGFILE"
        break
    }
  
    echo "$(date): Attempt $i failed, retrying in 3 seconds..." >> "$LOGFILE"
    sleep 3
done
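Easy to miss (an assumption based on how udev RUN+= handlers work, not something stated above): the helper scripts need to be executable, and udev needs to re-read the rules file:

```shell
# make the udev-triggered helpers executable and reload the rules
chmod +x /usr/local/bin/pve-n05.sh /usr/local/bin/pve-n06.sh
udevadm control --reload-rules
```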

For frr (this is the important bit). This ensures that frr starts once the en05/en06 network is up. Wants= makes sure it just waits, and does not go down if one of the interfaces restarts.

/etc/systemd/system/frr.service.d/dependencies.conf

[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
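Installing that drop-in is roughly this (a sketch, assuming default paths - the frr.service.d directory does not exist by default, so it needs creating):

```shell
# create the drop-in directory and dependencies.conf, then reload systemd
mkdir -p /etc/systemd/system/frr.service.d
cat > /etc/systemd/system/frr.service.d/dependencies.conf <<'EOF'
[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
EOF
systemctl daemon-reload
```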

vtysh (frr) config (rename the following per host):

  • net 49.0000.0000.0005.00 / net 49.0000.0000.0006.00 / net 49.0000.0000.0007.00
  • hostname pcX-rv
!

frr version 8.5.2
frr defaults traditional
hostname pcX-rv
log syslog informational
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0007.00
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
 fabric-tier 0
exit
!
end
!

On top of all this I also set up thunderbolt affinity and enabled IOMMU. My setup is a 3 node cluster running on Asus NUC 14 Pro. How I found this was that I ran the frr (vtysh) show running-config command and could see that it was missing the ip part, and if I manually added it then IPv4 worked. This works rock-solid for me at least.
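The checks described above can be run like this (a sketch; the openfabric show command assumes FRR's fabricd as used in this gist series, and the grepped prefix is just the 10.255.255.x example from this comment):

```shell
# confirm frr picked up the config and both address families are routed
vtysh -c 'show running-config'       # should list "ip router openfabric 1" under en05/en06
vtysh -c 'show openfabric topology'  # neighbours should appear over both thunderbolt links
ip route | grep 10.255.255           # expect /32 routes to the other nodes via en05/en06
```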

@scyto
Author

scyto commented Apr 20, 2025

@corvy funny you should ask, i did a bunch of testing yesterday and i am moving to this to replace this old gist (this has no third-party validation yet but is working on my system, with the caveat it's been up like this for just 24 hours)

https://gist.github.com/scyto/58b5cd9a18e1f5846048aabd4b152564 - you will see one of the things i have done is move the addressing to frr and implemented tisayama approach to restarting frr when the thunderbolt interfaces bounce

The affinity seems to be only needed on some kernels on some devices - it's weird, remember i made this to document my system. That said if someone gives me the latest affinity instructions that work for all i will attach them as a file to the new gist.

Why did you tweak your openfabric intervals - was that to solve some issue?
Why passive on interface lo - and would that still be needed now we have moved the ip addresses to frr.conf?

At some point in the coming weeks i plan to see:

  • can i replace this config in some way with SDN
  • how to expose the routes to my lan so other devices can use the ceph file system

@scyto
Author

scyto commented Apr 20, 2025

@corvy @tisayama

--edit-- - well i am a dumbass, i see you had this in your post, i didn't read your post, sorry; glad i arrived at the right solution with copilot, not glad i missed that you added this. After this do you still need the script for when both come up at the same time - do you think it's a race condition on the MS-01 (i have never seen failures on my nucs)? what do you think of my suggestion of putting the bindsto in the ceph.target.wants (or maybe a ceph dependencies location) instead of frr..... seems wiser to me?

--original reply---

so i did some serious 'copilot' investigations of systemd and making service start only when a network is available and up

it suggested adding the following in the frr.service file in the [unit] section, do you think this would work?

BindsTo=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

or i guess better yet it could be added to ceph.target to ensure ceph never starts unless one of the interfaces is present and up? this all assumes copilot didn't hallucinate....

these are the notes it gave me with the suggestions

Wants=network-online.target and After=network-online.target ensure that the service waits for the network to be up.

BindsTo=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device and After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device ensure that the service is bound to both interfaces and starts if either one is up.

@corvy

corvy commented Apr 20, 2025

Yes it is still needed as the interfaces cannot come up at the same time. At least in my setup this fails 1 in 4 times if I do not run the for loop.

@corvy

corvy commented Apr 20, 2025

Also; make sure you use Wants, not BindsTo (see my example) or else things go down if one IF goes down or you restart the network service. This will cause craziness if you restart one node for maintenance, for instance.

@scyto
Author

scyto commented Apr 20, 2025

@corvy more thoughts

  1. i want the frr service to start even if en05 and en06 are not up, because the new SDN functionality of proxmox uses frr - so blocking any other SDN functionality while waiting for en05 and en06 to come up seems like a bad long term approach
  2. i see there is a difference between requires / bindsto and wants - as such i think revising this so ceph doesn't start until frr is up and either en05 or en06 is up is the way to go - i agree it shouldn't be bindsto as that would stop ceph / frr i think - so i concur your wants and after is the right approach - i am just thinking it is ceph that should be dependent on that. I think moving the IPv4 and IPv6 addresses from interfaces into the frr config might really help here (it works, but as i never really had many failures it's really hard for me to test).

let me know what you think, i am going to try and add your wants/after to the ceph service and see what happens....

@scyto
Author

scyto commented Apr 20, 2025

@corvy this path /etc/systemd/system/frr.service.d/ does not exist on my proxmox, did you create the frr.service.d sub-directory? can i do the same for ceph in some way? or am i ok continuing to edit the ceph.target file instead?

@corvy

corvy commented Apr 20, 2025

Yes I did. I think you can do this for any systemd service.

@scyto
Author

scyto commented Apr 20, 2025

@corvy

I am just wondering why you are not revisiting getting IPv4 to work?

well 'life': a) i didn't need ipv4 to work properly as i dropped it 12mo ago as i was using IPv6, b) brain surgery means i haven't had the energy for any of this for 6mo+, and c) what energy i did have went into testing truenas on a zimacube pro and then deciding to build a rackmount epyc 9115 NAS (still not in production lol)

@corvy

corvy commented Apr 20, 2025

On your other questions I think we just need to do some testing. The important thing is that frr requires the ip stack to be up before starting the first time. After that it seems more robust to things restarting or changing. No expert here - just my experience.

@scyto
Author

scyto commented Apr 20, 2025

Yes I did. I think you can do this for any systemd service.

thanks for helping me understand systemd - something i have avoided to now :-)

well i like that, better than modifying files proxmox may overwrite on upgrades..... I assume the dir name will be ceph.service.d - i am a little concerned that might not work as proxmox seems to define each of the ceph sub services specifically....

root@pve1:/etc/systemd/system# find / -name ceph*service
/var/lib/systemd/deb-systemd-helper-enabled/ceph.target.wants/ceph-crash.service
/usr/lib/systemd/system/ceph-crash.service
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]

so there is no 'ceph service' and there multiple named instances of the services running

ceph-crash.service            loaded active running Ceph crash dump collector
[email protected]       loaded active running Ceph metadata server daemon
[email protected]         loaded active running Ceph metadata server daemon
[email protected]         loaded active running Ceph cluster manager daemon
[email protected]    loaded active running Ceph cluster monitor daemon
[email protected]            loaded active running Ceph object storage daemon osd.0
[email protected]            loaded active running Ceph object storage daemon osd.1

so i am stumped on how to create a dependencies.conf file for these that would work easily for different installs for different numbers of services and names....

..sometime later...

ok copilot says i should create these, i guess i could create one file and symlink all of these........

/etc/systemd/system/[email protected]/dependencies.conf
/etc/systemd/system/[email protected]/dependencies.conf
/etc/systemd/system/[email protected]/dependencies.conf
/etc/systemd/system/[email protected]/dependencies.conf
/etc/systemd/system/[email protected]/dependencies.conf

..some more time later....

i think copilot helped me yet again, i did the following

# Create directories for drop-in configuration files
sudo mkdir -p /etc/systemd/system/[email protected]
sudo mkdir -p /etc/systemd/system/[email protected]
sudo mkdir -p /etc/systemd/system/[email protected]
sudo mkdir -p /etc/systemd/system/[email protected]
sudo mkdir -p /etc/systemd/system/[email protected]

# Create a single dependencies configuration file with the updated content
echo -e "[Unit]\nWants=frr.service sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device\nAfter=frr.service sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device" | sudo tee /etc/systemd/ceph-dependencies.conf

# Create symlinks for the drop-in configuration files
sudo ln -s /etc/systemd/ceph-dependencies.conf /etc/systemd/system/[email protected]/dependencies.conf
sudo ln -s /etc/systemd/ceph-dependencies.conf /etc/systemd/system/[email protected]/dependencies.conf
sudo ln -s /etc/systemd/ceph-dependencies.conf /etc/systemd/system/[email protected]/dependencies.conf
sudo ln -s /etc/systemd/ceph-dependencies.conf /etc/systemd/system/[email protected]/dependencies.conf
sudo ln -s /etc/systemd/ceph-dependencies.conf /etc/systemd/system/[email protected]/dependencies.conf

then i did systemctl daemon-reload - seems to have worked.

[email protected]
● ├─-.mount
● ├─frr.service
● ├─sys-subsystem-net-devices-en05.device
● ├─sys-subsystem-net-devices-en06.device

i will keep on this one node for a while and see what its like over reboots etc

--edit-- Apr 20 6:52pm PDT

ok so this does seem to help (or at least do no harm, still not sure if it is essential)

@scyto
Author

scyto commented Apr 20, 2025

On your othe questions I think we just need to do some testing. The important thing is that frrr requires the ip stack to be up before starting the first time. After this it seems more robust to things restarting or changing. No expert here - just my experience.

oh i agree in principle, i just am trying to understand the exact sequencing and why it varies from machine to machine; without knowing that, the fix is a little hard... also the interfaces coming up doesn't mean IP is up.....

my issue is i can't test for the failures y'all see - they just don't happen on my machine in general. if I run IPv4, yes, eventually it has issues, but that happens inconsistently - when i test bouncing 3 nodes in a row, IPv4 generally comes up every time....
