@scyto
Last active June 2, 2025 10:49
New version of my mesh network using openfabric

Enable Dual Stack (IPv4 and IPv6) OpenFabric Routing

Version 2.5 (2025.04.27)

This gist is part of this series.

This assumes you are running Proxmox 8.4 and that the line source /etc/network/interfaces.d/* is at the end of /etc/network/interfaces (this line is added automatically to both new and upgraded installations from Proxmox 8.2 onwards).
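
If you want to double-check, the last line of /etc/network/interfaces should look like this (just a sanity check, not an extra step):

source /etc/network/interfaces.d/*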

This changes the previous file design, thanks to @NRGNet and @tisayama, to make the system more reliable in general and more maintainable, especially for folks using IPv4 on the private cluster network (I still recommend the IPv6 FC00 network you will see throughout these docs).

Notable changes from the original version:

  • move IP address configuration from interfaces.d/thunderbolt to the frr configuration; I reverted this on 2025.04.27 and improved the settings in interfaces.d/thunderbolt based on recommendations from ChatGPT to solve issues I hit in my routed network setup (coming soon)
  • new approach that removes the dependency on post-up, using new scripts in if-up.d that log to the system journal
  • reminder to copy frr.conf > frr.conf.local to prevent breakage if you enable Proxmox SDN
  • dependent on the changes to the udev link scripts here

This will result in an IPv4 and IPv6 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node.

Notes on Dual Stack

Having spent 3 days hammering my network and playing with various routed topologies, my current opinion is:

  • I still prefer IPv6 for my mesh, but if you set up for IPv4 it should now be fine; my gists will continue to assume you used IPv6 for Ceph
  • I have no opinion on Squid and dual stack yet - should be doable... we will see
  • if you use ONLY IPv6, for the love-of-god(tm) make sure that ms_bind_ipv4 = false is set in ceph.conf or really bad things will eventually happen (a minimal ceph.conf sketch follows this list)
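
For reference, a minimal sketch of what the relevant [global] lines in /etc/ceph/ceph.conf might look like on an IPv6-only cluster (ms_bind_ipv6 here is my assumption about a sensible companion setting, not something covered elsewhere in this gist; adapt to your own deployment):

[global]
    # do not bind the Ceph messenger to IPv4 on an IPv6-only cluster
    ms_bind_ipv4 = false
    # explicitly allow IPv6 binding (assumed companion setting)
    ms_bind_ipv6 = true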

Defining the thunderbolt network

This was revised on 2025.04.27 to move loopback IP addressing back from frr.conf to here (along with some reliability changes recommended by ChatGPT). Having the loopback IPs in frr.conf was a stupid idea, as they should be up irrespective of the state of the mesh so that Ceph processes can start and bind to them.

Create a new file using nano /etc/network/interfaces.d/thunderbolt and populate it with the following:

# Thunderbolt interfaces for pve1 (Node 81)

auto en05
iface en05 inet6 static
    pre-up ip link set $IFACE up
    mtu 65520

auto en06
iface en06 inet6 static
    pre-up ip link set $IFACE up
    mtu 65520

# Loopback for Ceph MON
auto lo
iface lo inet loopback
    up ip -6 addr add fc00::81/128 dev lo
    up ip addr add 10.0.0.81/32 dev lo
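
To apply the new file without a reboot and confirm the loopback addresses came up, something like the following should work (assuming the node 81 addressing in the example above):

ifreload -a
ip addr show lo | grep -E '10\.0\.0\.81|fc00::81'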

Notes:

  • setting the loopback IPs in the interfaces file is more reliable than doing it in frr.conf; the addresses will always be available for the mon, mgr, and mds processes of Ceph to bind to, irrespective of frr service status
  • the MTUs are super important, otherwise BGP and OpenFabric seem to have node-to-node negotiation issues
  • the pre-up and up directives were recommended by ChatGPT to ensure the interfaces are up before applying the IP address and MTU - this should make things more reliable

Enable IPv4 and IPv6 forwarding

  1. use nano /etc/sysctl.conf to open the file
  2. uncomment #net.ipv6.conf.all.forwarding=1 (remove the # symbol)
  3. uncomment #net.ipv4.ip_forward=1 (remove the # symbol)
  4. save the file
  5. issue reboot now for a complete reboot
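
After the reboot, a quick way to confirm both settings took effect (each should report 1):

sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding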

FRR Setup

Install & enable FRR (not needed on Proxmox 8.4+)

  1. Install Free Range Routing (FRR) apt install frr
  2. Enable frr systemctl enable frr

Enable the fabricd daemon

  1. edit the frr daemons file (nano /etc/frr/daemons) to change fabricd=no to fabricd=yes
  2. save the file
  3. restart the service with systemctl restart frr
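
If you prefer to script this step, a one-liner along these lines should make the same edit (a sketch; check the daemons file before restarting):

sed -i 's/^fabricd=no/fabricd=yes/' /etc/frr/daemons
systemctl restart frr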

Mitigate FRR Timing Issues (I need someone with an MS-01 to confirm if this helps solve their IPv4 issues)

Create a script that is automatically processed when en05/en06 are brought up to restart frr.

notes

  • this should make IPv4 more stable for all users (I ended up seeing IPv4 issues too, just less commonly than MS-01 users)
  • I found the changes I introduced in version 2.5 of this gist make this less needed; occasionally ifreload / ifupdown2 may cause enough changes that frr gets restarted too often and the service will need to be unblocked with systemctl.
  1. create a new file with nano /etc/network/if-up.d/en0x
  2. add the following to the file
#!/bin/bash
# note the logger entries log to the system journal in the pve UI etc

INTERFACE=$IFACE

if [ "$INTERFACE" = "en05" ] || [ "$INTERFACE" = "en06" ]; then
    logger "Checking if frr.service is running for $INTERFACE"
    
    if ! systemctl is-active --quiet frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] frr.service not running. Starting service."
        if systemctl start frr.service; then
            logger -t SCYTO "   [SCYTO SCRIPT ] Successfully started frr.service"
        else
            logger -t SCYTO "   [SCYTO SCRIPT ] Failed to start frr.service"
        fi
        exit 0
    fi

    logger "Attempting to reload frr.service for $INTERFACE"
    if systemctl reload frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully reloaded frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to reload frr.service for $INTERFACE"
    fi
fi
  1. make it executable with chmod +x /etc/network/if-up.d/en0x
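
To confirm the hook fires, you can bounce one of the interfaces and look for the SCYTO-tagged entries the script logs (a quick sanity check; note this will briefly drop that leg of the mesh):

ifdown en05 && ifup en05
journalctl -t SCYTO -n 20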

Mitigate issues caused by things that reset the loopback

Create a script that is automatically processed when lo is reprocessed by ifreload, ifupdown2, pve set, etc.

  1. create a new file with nano /etc/network/if-up.d/lo
  2. add the following to the file
#!/bin/bash

INTERFACE=$IFACE

if [ "$INTERFACE" = "lo" ]  ; then
    logger "Attempting to restart frr.service for $INTERFACE"
    if systemctl restart frr.service; then
        logger -t SCYTO "   [SCYTO SCRIPT ] Successfully restarted frr.service for $INTERFACE"
    else
        logger -t SCYTO "   [SCYTO SCRIPT ] Failed to restart frr.service for $INTERFACE"
    fi
fi

make it executable with chmod +x /etc/network/if-up.d/lo

Configure OpenFabric (perform on all nodes)

Note: if (and only if) you have already configured SDN, you should make these settings in /etc/frr/frr.conf.local and reapply your SDN configuration to have SDN propagate them into frr.conf (you can also make the edits to both files if you prefer). If you make these edits only to frr.conf with SDN active and then reapply SDN, it will lose these settings.

  1. enter the FRR shell with vtysh
  2. optionally show the current config with show running-config
  3. enter the configure mode with configure
  4. Apply the below configuration (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to commit the last !. Also check there were no errors in response to the pasted text).

Note: the x should be the number of the node you are working on; for example, node 1 would use net 49.0000.0000.0001.00.

ip forwarding
ipv6 forwarding

interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric hello-interval 1
 openfabric hello-multiplier 3
 openfabric csnp-interval 5
 openfabric psnp-interval 2
exit

interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric hello-interval 1
 openfabric hello-multiplier 3
 openfabric csnp-interval 5
 openfabric psnp-interval 2
exit

interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric hello-interval 1
 openfabric hello-multiplier 3
 openfabric csnp-interval 5
 openfabric psnp-interval 2
 openfabric passive
exit

router openfabric 1
net 49.0000.0000.000x.00
lsp-gen-interval 5
exit
!
exit

  1. you may need to press return after the last exit to get to a new line - if so, do this
  2. save the config with write memory
  3. check the configuration applied correctly with show running-config - note the order of the items will be different from how you entered them and that's ok. (If you made a mistake, I found the easiest way to fix it was to edit /etc/frr/frr.conf - but be careful if you do that.)
  4. use the command exit to leave setup
  5. repeat these steps on the other two nodes
  6. once you have configured all 3 nodes, issue the command vtysh -c "show openfabric topology" - if you did everything right you will see the following (note it may take 45 seconds for all routes to show if you just restarted frr for any reason):
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
10.0.0.81/32         IP internal  0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
10.0.0.82/32         IP TE        20     pve2                 en06      pve2(4)
10.0.0.83/32         IP TE        20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
fc00::81/128         IP6 internal 0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
fc00::82/128         IP6 internal 20     pve2                 en06      pve2(4)
fc00::83/128         IP6 internal 20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

Now you should be in a place to ping each node from every node across the thunderbolt mesh, using IPv4 or IPv6 as you see fit.
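
For example, from pve1 (using the addressing above):

ping -c 3 10.0.0.82
ping -c 3 fc00::83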

IMPORTANT - you need to do this to stop SDN breaking you in the future

If all is working, issue cp /etc/frr/frr.conf /etc/frr/frr.conf.local. This is because when enabling Proxmox SDN, Proxmox will overwrite frr.conf; however, it will read the .local file and apply that.

Note: if you already have SDN configured, do not do the step above as you will mess up both your SDN and this OpenFabric topology (see the note at the start of the frr instructions).

Based on this response https://forum.proxmox.com/threads/relationship-of-frr-conf-and-frr-conf-local.165465/: if you have SDN, all local (non-SDN) configuration changes should be made in .local, which should be read the next time SDN apply is used. Do not copy frr.conf > frr.conf.local after doing anything with SDN, or when you tear down SDN the settings will not be removed from frr.conf.


scyto commented May 2, 2025

I also made a script that wraps FIO to make benchmarking easier and prevent me trashing a disk with FIO accidentally :-)

https://github.com/scyto/fio-test-script

Let me know if it's something interesting / whether I should keep working on it.


scyto commented May 2, 2025

And lastly, this is my first draft of how to mount Ceph across the network into a VM with a routed network, or onto any device on the LAN if you have implemented that gist.

https://gist.github.com/scyto/61b38c47cb2c79db279ee1cbb6f31772

Personally, based on benchmarks, I will stay with virtioFS; I guess I should write up my approach / I need hookscripts to make sure the VM only starts if the CephFS mount is there.


zejar commented May 5, 2025

Great write-up! I do have a question about the experience of others regarding MTU size between the nodes.
From my experience utilising an MTU of 65520 results in rather unstable iperf3 performance (haven't gotten to setting up Ceph yet) over thunderbolt between the nodes (3x MS-01). Even an MTU of 9000 isn't as stable as an MTU of 1500.

IOMMU has been enabled in the Grub config and the thunderbolt affinity script has been used to select the P-cores for the processing of traffic over the thunderbolt interfaces.

Below some iperf3 results (IPv4 and IPv6 are similar, using point-to-point addresses instead of loopback addresses to rule out as much as possible):

MTU 1500 (upload & download)
Connecting to host fd00::2, port 5201
[  5] local fd00::1 port 48008 connected to fd00::2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.37 GBytes  20.3 Gbits/sec  305    945 KBytes
[  5]   1.00-2.00   sec  2.46 GBytes  21.1 Gbits/sec  728    883 KBytes
[  5]   2.00-3.00   sec  2.47 GBytes  21.2 Gbits/sec  315   1.15 MBytes
[  5]   3.00-4.00   sec  2.45 GBytes  21.0 Gbits/sec  495   1.10 MBytes
[  5]   4.00-5.00   sec  2.50 GBytes  21.5 Gbits/sec  364   1.14 MBytes
[  5]   5.00-6.00   sec  2.42 GBytes  20.8 Gbits/sec  495    866 KBytes
[  5]   6.00-7.00   sec  2.44 GBytes  21.0 Gbits/sec  360   1.09 MBytes
[  5]   7.00-8.00   sec  2.43 GBytes  20.9 Gbits/sec  495    890 KBytes
[  5]   8.00-9.00   sec  2.45 GBytes  21.0 Gbits/sec  405   1.07 MBytes
[  5]   9.00-10.00  sec  2.44 GBytes  21.0 Gbits/sec  405   1.23 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  24.4 GBytes  21.0 Gbits/sec  4367             sender
[  5]   0.00-10.00  sec  24.4 GBytes  21.0 Gbits/sec                  receiver

iperf Done.


Connecting to host fd00::2, port 5201
Reverse mode, remote host fd00::2 is sending
[  5] local fd00::1 port 49726 connected to fd00::2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  2.89 GBytes  24.8 Gbits/sec
[  5]   1.00-2.00   sec  2.83 GBytes  24.3 Gbits/sec
[  5]   2.00-3.00   sec  2.73 GBytes  23.5 Gbits/sec
[  5]   3.00-4.00   sec  2.76 GBytes  23.8 Gbits/sec
[  5]   4.00-5.00   sec  2.80 GBytes  24.0 Gbits/sec
[  5]   5.00-6.00   sec  2.77 GBytes  23.8 Gbits/sec
[  5]   6.00-7.00   sec  2.73 GBytes  23.4 Gbits/sec
[  5]   7.00-8.00   sec  2.75 GBytes  23.6 Gbits/sec
[  5]   8.00-9.00   sec  2.70 GBytes  23.2 Gbits/sec
[  5]   9.00-10.00  sec  2.72 GBytes  23.4 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  27.7 GBytes  23.8 Gbits/sec  5370             sender
[  5]   0.00-10.00  sec  27.7 GBytes  23.8 Gbits/sec                  receiver

iperf Done.
MTU 9000 (upload & download)
Connecting to host fd00::2, port 5201
[  5] local fd00::1 port 52748 connected to fd00::2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.45 GBytes  21.0 Gbits/sec  3577   1003 KBytes
[  5]   1.00-2.00   sec  1.90 GBytes  16.3 Gbits/sec  2321   1.12 MBytes
[  5]   2.00-3.00   sec  1.43 GBytes  12.3 Gbits/sec  1700    968 KBytes
[  5]   3.00-4.00   sec  1.88 GBytes  16.1 Gbits/sec  2575   1.01 MBytes
[  5]   4.00-5.00   sec  2.36 GBytes  20.3 Gbits/sec  3282   1.04 MBytes
[  5]   5.00-6.00   sec  2.34 GBytes  20.1 Gbits/sec  3125    994 KBytes
[  5]   6.00-7.00   sec  2.31 GBytes  19.9 Gbits/sec  2463   1.16 MBytes
[  5]   7.00-8.00   sec  2.36 GBytes  20.3 Gbits/sec  3084   1020 KBytes
[  5]   8.00-9.00   sec  2.27 GBytes  19.5 Gbits/sec  2386    619 KBytes
[  5]   9.00-10.00  sec  1.89 GBytes  16.2 Gbits/sec  2545    872 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  21.2 GBytes  18.2 Gbits/sec  27058             sender
[  5]   0.00-10.00  sec  21.2 GBytes  18.2 Gbits/sec                  receiver

iperf Done.


Connecting to host fd00::2, port 5201
Reverse mode, remote host fd00::2 is sending
[  5] local fd00::1 port 38058 connected to fd00::2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  2.19 GBytes  18.8 Gbits/sec
[  5]   1.00-2.00   sec  2.70 GBytes  23.2 Gbits/sec
[  5]   2.00-3.00   sec  2.65 GBytes  22.8 Gbits/sec
[  5]   3.00-4.00   sec  2.15 GBytes  18.5 Gbits/sec
[  5]   4.00-5.00   sec  2.13 GBytes  18.3 Gbits/sec
[  5]   5.00-6.00   sec  2.09 GBytes  18.0 Gbits/sec
[  5]   6.00-7.00   sec  2.14 GBytes  18.4 Gbits/sec
[  5]   7.00-8.00   sec  1.56 GBytes  13.4 Gbits/sec
[  5]   8.00-9.00   sec  2.64 GBytes  22.7 Gbits/sec
[  5]   9.00-10.00  sec  2.67 GBytes  22.9 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  22.9 GBytes  19.7 Gbits/sec  23803             sender
[  5]   0.00-10.00  sec  22.9 GBytes  19.7 Gbits/sec                  receiver

iperf Done.
MTU 65520 (upload & download)
Connecting to host fd00::2, port 5201
[  5] local fd00::1 port 35406 connected to fd00::2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.58 GBytes  13.5 Gbits/sec  735   63.9 KBytes
[  5]   1.00-2.00   sec  1.75 GBytes  15.0 Gbits/sec  837   1.50 MBytes
[  5]   2.00-3.00   sec  2.28 GBytes  19.6 Gbits/sec  955   1.19 MBytes
[  5]   3.00-4.00   sec  1.90 GBytes  16.3 Gbits/sec  784   2.18 MBytes
[  5]   4.00-5.00   sec  2.21 GBytes  19.0 Gbits/sec  839    831 KBytes
[  5]   5.00-6.00   sec  1.51 GBytes  13.0 Gbits/sec  680   2.18 MBytes
[  5]   6.00-7.00   sec  2.20 GBytes  18.9 Gbits/sec  920   2.37 MBytes
[  5]   7.00-8.00   sec  2.22 GBytes  19.1 Gbits/sec  897   1.19 MBytes
[  5]   8.00-9.00   sec   909 MBytes  7.62 Gbits/sec  403   2.75 MBytes
[  5]   9.00-10.00  sec  1.94 GBytes  16.7 Gbits/sec  1048   1.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  18.5 GBytes  15.9 Gbits/sec  8098             sender
[  5]   0.00-10.00  sec  18.5 GBytes  15.9 Gbits/sec                  receiver

iperf Done.

Connecting to host fd00::2, port 5201
Reverse mode, remote host fd00::2 is sending
[  5] local fd00::1 port 50834 connected to fd00::2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  2.70 GBytes  23.2 Gbits/sec
[  5]   1.00-2.00   sec  2.05 GBytes  17.6 Gbits/sec
[  5]   2.00-3.00   sec  2.57 GBytes  22.1 Gbits/sec
[  5]   3.00-4.00   sec  2.06 GBytes  17.7 Gbits/sec
[  5]   4.00-5.00   sec  2.12 GBytes  18.2 Gbits/sec
[  5]   5.00-6.00   sec  2.16 GBytes  18.6 Gbits/sec
[  5]   6.00-7.00   sec  2.51 GBytes  21.6 Gbits/sec
[  5]   7.00-8.00   sec  2.11 GBytes  18.1 Gbits/sec
[  5]   8.00-9.00   sec  2.13 GBytes  18.3 Gbits/sec
[  5]   9.00-10.00  sec  1.59 GBytes  13.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  22.0 GBytes  18.9 Gbits/sec  8358             sender
[  5]   0.00-10.00  sec  22.0 GBytes  18.9 Gbits/sec                  receiver

iperf Done.

What is the experience (and results) of others?

@DarkPhyber-hg

@zejar which CPU do you have on your MS-01s? I've got 3x 13900H and I've been trying to get my retries down to near zero. I've made a lot of progress, but my initial results were nowhere near that bad, and mine are really only bad in bi-directional tests.


zejar commented May 19, 2025

@DarkPhyber-hg Hmm that is strange, I also have the i9-13900H in my three MS-01's.
Which microcode are you using on your MS-01's? I am running "microcode : 0x4124" (grep microcode /proc/cpuinfo | uniq) and BIOS version 1.26.
Also, which Thunderbolt cables are you using? I'm using the Cable Matters TB4 cables (80cm).


DarkPhyber-hg commented May 19, 2025

@zejar to answer your questions:
On all 3 nodes I'm running microcode : 0x4124.
I'm running firmware 1.27 on all 3 nodes.
I'm currently running the opt-in kernel 6.14.0-2-pve.
I'm using 30cm OWC cables; they're a little tight but I wanted to have cables as short as possible.

You can see everything I did on another page in this gist. I spammed like 5 posts in a row. https://gist.github.com/scyto/67fdc9a517faefa68f730f82d7fa3570?permalink_comment_id=5579176#gistcomment-5579176

If i have time before i leave for vacation i'm going to turn off the traffic shaping and try different MTU sizes.


Randymartin1991 commented May 28, 2025

I got everything working and the ping as well; however, I do not see the loopback interfaces in the GUI, therefore I cannot use it as a Ceph cluster network or do anything with it. I am doing only an IPv4 version, but this should not be an issue.

I run proxmox 8.4 and I have created the new thunderbolt file: /etc/network/interfaces.d/thunderbolt
With the content:
auto en05
iface en05 inet static
pre-up ip link set $IFACE up
mtu 65520

auto en06
iface en06 inet static
pre-up ip link set $IFACE up
mtu 65520

#Loopback for Ceph MON
auto lo
iface lo inet loopback
up ip addr add 10.10.10.1/32 dev lo

I do have the interfaces en05 and en06 in the GUI but not the lo.
Here is the fabric:

IS-IS paths to level-2 routers that speak IP
Vertex Type Metric Next-Hop Interface Parent

node1
10.10.10.1/32 IP internal 0 node1(4)
node2 TE-IS 10 node2 en05 node1(4)
node3 TE-IS 10 node3 en06 node1(4)
10.10.10.2/32 IP TE 20 node2 en05 node2(4)
10.10.10.3/32 IP TE 20 node3 en06 node3(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex Type Metric Next-Hop Interface Parent

What am I missing?


silverjerk commented Jun 1, 2025

Edit: I'm a buffoon and missed a critical step. Remember kids, RTFM. If anyone gets to this point due to user error, do not miss the point in the process where you need to manually update the datacenter.cfg file with the proper settings. Migrations now work between all 3 nodes. See below for context.

Also running 3x MS-01s in a cluster (PVE-01, PVE-02, PVE-03)

Followed the revised guide, but seemingly hit a wall and went back and forth between the new and deprecated guide.

After setup, I can successfully migrate VMs from nodes 2 (10.0.0.82) and 3 (10.0.0.83) to node 1 (10.0.0.81).

Topology looks similar to the one represented in the gist. I can ping each one of the IPs above from the adjacent machines.

However, I cannot migrate from any other nodes to nodes 2 and 3, with some error similar to the below.

could not get migration ip: no IP address configured on local node for network 'fc00::81/128'
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-03' -o 'UserKnownHostsFile=/etc/pve/nodes/pve-03/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' [email protected] pvecm mtunnel -migration_network fc00::81/128 -get_migration_ip' failed: exit code 255

Have gone through the entire process again, with the same result. Something tells me there is something simple amiss in the setup process. I've double-checked all IPv4/IPv6 syntax and rebooted the machines as necessary.

Secondarily, I see that lo:0 and lo:6 are set to autostart=yes, but both are set to active=no in the UI on all three nodes.

Last detail, when in datacenter, I can change the migration network and it is always directly linked to the node within which I'm creating the setting (in node 3, I see 10.0.0.83), and then when setting it to this new setting, I can correctly migrate to node 3, and then if I do the same in node 2, again, I can migrate to node 2. It's as if it's only allowing migrations to the IP represented in the migration settings, and not the entire ring network.
