This gist is part of this series.
This assumes you are running Proxmox 8.4 and that the line `source /etc/network/interfaces.d/*`
is at the end of the interfaces file (this is automatically added to both new and upgraded installations of Proxmox 8.2).
This changes the previous file design, thanks to @NRGNet and @tisayama, to make the system much more reliable in general and more maintainable, especially for folks using IPv4 on the private cluster network (I still recommend the use of the IPv6 FC00 network you will see in these docs).
Notable changes from the original version here:
- moved IP address configuration from `interfaces.d/thunderbolt` to the FRR configuration
- new approach to remove the dependency on post-up: a new script in `if-up.d` that logs to the system log
- reminder to copy `frr.conf` to `frr.conf.local` to prevent breakage if you enable Proxmox SDN
- depends on the changes to the udev link scripts here
This will result in an IPv4 and IPv6 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node.
I have included IPv4 for completeness; I only run the FC00:: IPv6 network, as Ceph does not support dual stack, and I strongly recommend you consider using only IPv6. For example, for Ceph do not dual stack: use either IPv4 or IPv6 addresses for all the monitors, MDS and daemons. Despite the docs implying dual stack is OK, my findings on Quincy are that it is funky...
With all the scripts and changes folks have contributed, IPv4 should now be stable. I am recommending new folks use IPv4 for Ceph as documented in the gists in this series, to avoid ongoing issues with SDN and IPv6. I have yet to decide if I will migrate my Ceph back to IPv4 so I can play with SDN, or just wait for the SDN issues to be solved.
Create a new file with `nano /etc/network/interfaces.d/thunderbolt`
and populate it with the following.
There should no longer be any IP addresses in this file for lo and lo:6.
```
allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520
```
Save file, repeat on each node.
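Since this file must no longer carry any IP addresses, a quick sanity check can confirm nothing was left behind. This is a sketch with a hypothetical helper (`check_no_addresses` is not part of the guide); the demo runs against a sample copy of the file contents above:

```shell
#!/bin/bash
# Hypothetical helper: warn if an interfaces file still carries "address"
# lines (the IPs now live in the FRR configuration instead).
check_no_addresses() {
    if grep -q '^[[:space:]]*address' "$1"; then
        echo "WARNING: $1 still contains address lines"
    else
        echo "OK: no address lines in $1"
    fi
}

# Demo against a sample copy of the thunderbolt file contents:
cat > /tmp/thunderbolt.sample <<'EOF'
allow-hotplug en05
iface en05 inet manual
        mtu 65520
allow-hotplug en06
iface en06 inet manual
        mtu 65520
EOF
check_no_addresses /tmp/thunderbolt.sample
```

On a real node you would point it at `/etc/network/interfaces.d/thunderbolt` instead of the sample file.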
- use `nano /etc/sysctl.conf` to open the file
- uncomment `#net.ipv6.conf.all.forwarding=1` (remove the # symbol)
- uncomment `#net.ipv4.ip_forward=1` (remove the # symbol)
- save the file
- issue `reboot now` for a complete reboot
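If you prefer not to edit the file by hand, the two uncomment steps can be scripted with `sed`. This is a sketch that assumes the stock comment form of those lines; the demo runs on a sample file, but you could point `SYSCTL` at `/etc/sysctl.conf` on a real node:

```shell
#!/bin/bash
# Uncomment the two forwarding keys in a sysctl.conf-style file.
# Demo uses a sample file; set SYSCTL=/etc/sysctl.conf for real use.
SYSCTL=/tmp/sysctl.sample
cat > "$SYSCTL" <<'EOF'
#net.ipv6.conf.all.forwarding=1
#net.ipv4.ip_forward=1
EOF

# Strip the leading '#' from exactly these two keys, leaving others alone.
sed -i -e 's/^#\(net\.ipv6\.conf\.all\.forwarding=1\)/\1/' \
       -e 's/^#\(net\.ipv4\.ip_forward=1\)/\1/' "$SYSCTL"
cat "$SYSCTL"
```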
Install Free Range Routing (FRR) with `apt install frr`
Enable frr with `systemctl enable frr`
- edit the frr daemons file (`nano /etc/frr/daemons`) to change `fabricd=no` to `fabricd=yes`
- save the file
- restart the service with `systemctl restart frr`
Mitigate FRR Timing Issues (I need someone with an MS-101 to confirm if this helps solve their IPv4 issues)
This should make IPv4 more stable for all users (I ended up seeing IPv4 issues too, just less commonly than MS-101 users).
- create a new file with `nano /etc/network/if-up.d/en0x`
- add the following to the file:
```bash
#!/bin/bash
# note the logger entries log to the system journal in the pve UI etc

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ]; then
    logger "Checking if frr.service is running for $INTERFACE"
    if ! systemctl is-active --quiet frr.service; then
        logger -t SCYTO " [SCYTO SCRIPT ] frr.service not running. Starting service."
        if systemctl start frr.service; then
            logger -t SCYTO " [SCYTO SCRIPT ] Successfully started frr.service"
        else
            logger -t SCYTO " [SCYTO SCRIPT ] Failed to start frr.service"
        fi
        exit 0
    fi

    logger "Attempting to reload frr.service for $INTERFACE"
    if systemctl reload frr.service; then
        logger -t SCYTO " [SCYTO SCRIPT ] Successfully reloaded frr.service for $INTERFACE"
    else
        logger -t SCYTO " [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
    fi
fi
```
- make it executable with `chmod +x /etc/network/if-up.d/en0x`
Create a script that is automatically run when lo is reprocessed by ifreload, ifupdown2, pve set, etc.
- create a new file with `nano /etc/network/if-up.d/lo`
- add the following to the file:
```bash
#!/bin/bash
INTERFACE=$IFACE

if [ "$INTERFACE" = "lo" ]; then
    logger "Attempting to restart frr.service for $INTERFACE"
    if systemctl restart frr.service; then
        logger -t SCYTO " [SCYTO SCRIPT ] Successfully restarted frr.service for $INTERFACE"
    else
        logger -t SCYTO " [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
    fi
fi
```
- make it executable with `chmod +x /etc/network/if-up.d/lo`
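Taken together, the two hook scripts implement a simple policy: reload frr when a thunderbolt interface comes up (starting it first if it is not running), and fully restart frr when lo is reprocessed. That policy can be sketched as a small pure function (`frr_action_for` is a hypothetical helper for illustration, not part of the scripts above):

```shell
#!/bin/bash
# Sketch of the dispatch policy implemented by the two if-up.d hooks:
# en05/en06 -> reload frr, lo -> restart frr, anything else -> no action.
frr_action_for() {
    case "$1" in
        en05|en06) echo reload ;;
        lo)        echo restart ;;
        *)         echo none ;;
    esac
}

frr_action_for en05   # reload
frr_action_for lo     # restart
frr_action_for vmbr0  # none
```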
**note:** if (and only if) you have already configured SDN, you should make these settings in /etc/frr/frr.conf.local and reapply your SDN configuration to have SDN propagate these into frr.conf (you can also make the edits to both files if you prefer). If you make these edits only to frr.conf with SDN active and then reapply the settings, it will lose these settings.
- enter the FRR shell with `vtysh`
- optionally show the current config with `show running-config`
- enter configure mode with `configure`
- Apply the below configuration (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to enter the last `!`. Also check there were no errors in response to the pasted text.)
Note: the x should be the number of the node you are working on. For example, node 1 would use 1 in place of x.
```
ip forwarding
ipv6 forwarding
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip address 10.0.0.8x/32
 ipv6 address fc00::8x/128
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.000x.00
exit
!
exit
```
- you may need to press return after the last `exit` to get to a new line; if so, do this
- save the config with `write memory`
- show the config applied correctly with `show running-config`; note the order of the items will be different from how you entered them, and that's OK (if you made a mistake, I found the easiest way was to edit /etc/frr/frr.conf - but be careful if you do that)
- use the command `exit` to leave setup
- repeat steps 1 to 9 on the other nodes
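Since only the node number changes between nodes, the node-specific part of the configuration can be generated rather than retyped. This is a sketch assuming the 10.0.0.8x / fc00::8x addressing scheme used above and single-digit node numbers; `node_frr_config` is a hypothetical helper, not part of the guide:

```shell
#!/bin/bash
# Emit the node-specific FRR stanzas for node number x (1..9),
# following the 10.0.0.8x / fc00::8x scheme used in this guide.
node_frr_config() {
    local x="$1"
    cat <<EOF
interface lo
 ip address 10.0.0.8${x}/32
 ipv6 address fc00::8${x}/128
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.000${x}.00
exit
EOF
}

# Example: print node 2's stanzas for pasting into vtysh
node_frr_config 2
```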
- once you have configured all 3 nodes, issue the command `vtysh -c "show openfabric topology"`
If you did everything right you will see output like the following (note it may take up to 45 seconds for all routes to show if you just restarted frr for any reason):
```
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex         Type          Metric  Next-Hop  Interface  Parent
pve1
10.0.0.81/32   IP internal   0                            pve1(4)
pve2           TE-IS         10      pve2      en06       pve1(4)
pve3           TE-IS         10      pve3      en05       pve1(4)
10.0.0.82/32   IP TE         20      pve2      en06       pve2(4)
10.0.0.83/32   IP TE         20      pve3      en05       pve3(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex         Type          Metric  Next-Hop  Interface  Parent
pve1
fc00::81/128   IP6 internal  0                            pve1(4)
pve2           TE-IS         10      pve2      en06       pve1(4)
pve3           TE-IS         10      pve3      en05       pve1(4)
fc00::82/128   IP6 internal  20      pve2      en06       pve2(4)
fc00::83/128   IP6 internal  20      pve3      en05       pve3(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex         Type          Metric  Next-Hop  Interface  Parent
```
Now you should be in a place to ping each node from every node across the thunderbolt mesh, using IPv4 or IPv6 as you see fit.
If all is working, issue `cp /etc/frr/frr.conf /etc/frr/frr.conf.local`
This is because when enabling Proxmox SDN, Proxmox will overwrite frr.conf; however, it will read the .local file and apply that.
**note:** if you already have SDN configured, do not do the step above, as you will mess up both your SDN and this openfabric topology (see note at start of frr instructions)
Based on this response https://forum.proxmox.com/threads/relationship-of-frr-conf-and-frr-conf-local.165465/ : if you have SDN, all local (non-SDN) configuration changes should be made in .local, which should be read the next time SDN apply is used. Do not copy frr.conf to frr.conf.local after doing anything with SDN, or when you tear down SDN the settings will not be removed from frr.conf.
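Given how easy it is to clobber frr.conf.local after SDN is in play, the copy step could be guarded. This is a sketch under the assumption that SDN zone/vnet config lives under /etc/pve/sdn (zones.cfg, vnets.cfg); `seed_local` is a hypothetical helper and the demo uses temp paths rather than the real files:

```shell
#!/bin/bash
# Guarded copy: seed frr.conf.local from frr.conf only when no SDN
# configuration exists yet (assumed to live under /etc/pve/sdn).
seed_local() {
    local src="$1" dst="$2" sdn_dir="$3"
    if [ -e "$sdn_dir/zones.cfg" ] || [ -e "$sdn_dir/vnets.cfg" ]; then
        echo "SDN config detected - edit $dst directly instead of copying"
    else
        cp "$src" "$dst" && echo "copied $src -> $dst"
    fi
}

# Demo with temporary files (real use: /etc/frr/frr.conf,
# /etc/frr/frr.conf.local, /etc/pve/sdn):
echo "router openfabric 1" > /tmp/frr.conf
seed_local /tmp/frr.conf /tmp/frr.conf.local /tmp/no-sdn-here
```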
I am currently only using IPv4 as it seems to work stably for me. But I am also not using Ceph at the moment - I mainly use the Thunderbolt ring for HA migration and ZFS replication. Next step for me is allowing the VMs to access this network, then I'll move on to Ceph.
Yeah, that's also where I'm bashing my head against a wall right now.
Maybe it helps with troubleshooting, I found a "solution" (more a hack) to at least automatically fix the frr.conf file after applying SDN settings. I got tired of having to fix it manually every time for every node and this way I only lose a maximum of 10 pings and don't have to touch anything. Ideally this won't be needed once I have a working setup, but while testing and constantly adjusting settings, this really helps me.
Basically this addresses what is written in section "2.6 - Fixing up FRR config".
It rewrites the bgp router entries, which get set to the management address instead of the lo address, and it also removes the node's own lo address from the neighbor list (not sure if that causes any issues, just following the forum tutorial here). Afterwards it restarts the frr service. This is working fine for me, but may need tweaking for other setups, especially the hostname and IP handling, though those could just be hardcoded for each node instead.
The script:
The config watcher:
```
[Unit]
Description=Watch frr.conf for changes

[Path]
PathChanged=/etc/frr/frr.conf

[Install]
WantedBy=multi-user.target
```
The service:
Enable and start it:
```
systemctl daemon-reexec
systemctl daemon-reload
systemctl enable --now fix-frr.path
```
Ah I see, didn't run into that issue myself yet. Then maybe this would be a safer option:
Edit:
I also noticed that in the "/etc/network/interfaces.d/sdn" file, the IP addresses for "vxlan-local-tunnelip" in the "vrfvx_evpnPRD" and "vxlan_vxnet1" sections are (seemingly) randomly switching after every SDN apply between my node management IP addresses and lo addresses. Not sure why that happens, or which one would be correct for our intended use case.