@scyto
Last active February 22, 2024 18:30
IPv4 ospf mesh network for ceph

Enable OSPF Routing on Thunderbolt Mesh

This has been deprecated

It is now superseded by OpenFabric routing, see here

Continue at your own peril; for reference only now.

Old Gist

This will result in an IPv4 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node.

Please note the main section of this gist describes IPv4 on mesh. Lower down you will find additional files that capture:

  1. differences if you want dual stack IPv4 / IPv6 routing (this is now what I run since writing the original gist)
  2. OpenFabric instead of OSPF (to do)

this gist is part of this series

Key Parameters

Note: I used the 10.x IPv4 space as this is not used anywhere else on my network. YMMV.

lo = loopback
en05/en06 = these are the thunderbolt ports

Node 1:

  • lo:0 = 10.0.0.81/32
  • en05 = 10.0.0.5/30
  • en06 = 10.0.0.9/30
  • ospf router-id = 0.0.0.1

Node 2:

  • lo:0 = 10.0.0.82/32
  • en05 = 10.0.0.10/30
  • en06 = 10.0.0.13/30
  • ospf router-id = 0.0.0.2

Node 3:

  • lo:0 = 10.0.0.83/32
  • en05 = 10.0.0.14/30
  • en06 = 10.0.0.6/30
  • ospf router-id = 0.0.0.3
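
Each thunderbolt cable is its own /30 subnet, so the two ends of a cable must land in the same /30. The snippet below is a minimal sketch for sanity-checking the table above before typing the addresses into the GUI. The helper names `net30` and `same_link` are made up for this illustration, and the cable pairing (node 1 en05 to node 3 en06, and so on) is read off the addresses in the table.

```shell
# Print the /30 network base for a dotted-quad IPv4 address.
# (net30 and same_link are hypothetical helper names for this sketch.)
net30() {
    prefix=${1%.*}     # first three octets
    last=${1##*.}      # last octet
    # A /30 groups addresses in blocks of four: round down to a multiple of 4.
    echo "$prefix.$(( last - last % 4 ))"
}

# Succeed only if both ends of a cable fall in the same /30.
same_link() {
    [ "$(net30 "$1")" = "$(net30 "$2")" ]
}

# Cable pairs taken from the table above.
same_link 10.0.0.5  10.0.0.6  && echo "node1 en05 <-> node3 en06 ok"
same_link 10.0.0.9  10.0.0.10 && echo "node1 en06 <-> node2 en05 ok"
same_link 10.0.0.13 10.0.0.14 && echo "node2 en06 <-> node3 en05 ok"
```

If a pair prints nothing, the two ends are in different /30s and OSPF will not form an adjacency over that cable.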

Enable IPv4 forwarding

Using IPv4 to take advantage of not needing to use addresses - does make things simpler

  • uncomment #net.ipv4.ip_forward=1 using nano /etc/sysctl.conf (remove the # symbol and save the file)
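
If you would rather not edit the file by hand, the same change can be sketched with sed. This assumes GNU sed (as shipped on Debian/Proxmox); the function name `enable_ipv4_forwarding` is made up for this example, and it takes the file path as an argument so you can try it on a copy first.

```shell
# Uncomment the IPv4 forwarding line in a sysctl config file.
# $1 = path to the file (hypothetical helper, GNU sed assumed for -i)
enable_ipv4_forwarding() {
    sed -i 's/^#[[:space:]]*net\.ipv4\.ip_forward=1/net.ipv4.ip_forward=1/' "$1"
}

# Usage on a node (as root), then load the setting without rebooting:
#   enable_ipv4_forwarding /etc/sysctl.conf
#   sysctl -p
#   sysctl net.ipv4.ip_forward    # should report 1
```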

Create Loopback interface

Doing this means we don't have to give each thunderbolt port a manual IPv6 address, and these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces

This should go under the auto lo section, and for each node the X should be 1, 2 or 3 depending on the node.

auto lo:0
iface lo:0 inet static
        address 10.0.0.8X/32

So on the first node it would look something like this:

...
auto lo
iface lo inet loopback
 
auto lo:0
iface lo:0 inet static
        address 10.0.0.81/32
...

Save file.

Assign IP address to en05 and en06 using the GUI

  1. use the table further up and assign addresses
  2. after applying both addresses remember to hit the apply configuration button

Install OSPF (perform on all nodes)

  1. Install Free Range Routing (FRR) apt install frr
  2. Edit the FRR config file: nano /etc/frr/daemons
  3. Adjust ospfd=no to ospfd=yes
  4. save the file
  5. restart the service with systemctl restart frr
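
The steps above can also be scripted. This is a rough sketch, assuming GNU sed and the stock daemons file layout where each daemon sits on its own `name=no` line; `enable_frr_daemon` is a made-up helper name, not something FRR ships.

```shell
# Flip "<daemon>=no" to "<daemon>=yes" in an FRR daemons file.
# $1 = path to the daemons file, $2 = daemon name (e.g. ospfd)
# (hypothetical helper; the anchored pattern avoids touching ospf6d
#  when asked for ospfd)
enable_frr_daemon() {
    sed -i "s/^$2=no\$/$2=yes/" "$1"
}

# Usage on a node (as root):
#   apt install frr
#   enable_frr_daemon /etc/frr/daemons ospfd
#   systemctl restart frr
```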

Configure OSPF (perform on all nodes)

  1. enter the FRR shell with vtysh
  2. optionally show the current config with show running-config
  3. enter the configure mode with configure
  4. Apply the configuration below (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to set the last !. Also check there were no errors in response to the pasted text). Note: the X should be the number of the node you are working on, so for my setup this would be 0.0.0.1, 0.0.0.2 or 0.0.0.3.
ip forwarding
!
router ospf
 ospf router-id 0.0.0.X
 log-adjacency-changes
 exit
!
interface lo
 ip ospf area 0
 exit
!
interface en05
 ip ospf area 0
 ip ospf network broadcast
 exit
!
interface en06
 ip ospf area 0
 ip ospf network broadcast
 exit
!

  5. you may need to press return after the last ! to get to a new line - if so do this
  6. exit the configure mode with the command end
  7. save the config with write memory
  8. show the config applied correctly with show running-config - note the order of the items will be different to how you entered them and that's ok. (If you made a mistake, I found the easiest way was to edit /etc/frr/frr.conf - but be careful if you do that.)
  9. use the command exit to leave setup
  10. repeat steps 1 to 9 on the other 2 nodes
  11. once you have configured all 3 nodes issue the command vtysh -c "show ip ospf neighbor" and you will see:
root@pve1:~# vtysh -c "show ip ospf neighbor"

Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
0.0.0.2           1 Full/DROther    52m26s            33.951s 10.0.0.10       en06:10.0.0.9                        0     0     0
0.0.0.3           1 Full/DROther    51m56s            33.444s 10.0.0.6        en05:10.0.0.5                        0     0     0

  12. now issue the command vtysh -c "show ip route" and you will see:
root@pve1:~# vtysh -c "show ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

C>* 10.0.0.4/30 is directly connected, en05, 00:53:16
O>* 10.0.0.5/32 [110/0] is directly connected, en05, weight 1, 00:53:16
O   10.0.0.6/32 [110/10] via 10.0.0.6, en05 inactive, weight 1, 00:53:11
C>* 10.0.0.8/30 is directly connected, en06, 00:53:46
O>* 10.0.0.9/32 [110/0] is directly connected, en06, weight 1, 00:53:46
O   10.0.0.10/32 [110/10] via 10.0.0.10, en06 inactive, weight 1, 00:53:41
O>* 10.0.0.13/32 [110/10] via 10.0.0.10, en06, weight 1, 00:53:32
O>* 10.0.0.14/32 [110/10] via 10.0.0.6, en05, weight 1, 00:53:11
O   10.0.0.81/32 [110/0] is directly connected, lo, weight 1, 12:15:09
C>* 10.0.0.81/32 is directly connected, lo, 12:15:09
O>* 10.0.0.82/32 [110/10] via 10.0.0.10, en06, weight 1, 00:53:41
O>* 10.0.0.83/32 [110/10] via 10.0.0.6, en05, weight 1, 00:53:11
C>* 192.168.1.0/24 is directly connected, vmbr0, 12:15:06

and lastly ip route

root@pve1:~# ip route
default via 192.168.1.1 dev vmbr0 proto kernel onlink 
10.0.0.4/30 dev en05 proto kernel scope link src 10.0.0.5 
10.0.0.8/30 dev en06 proto kernel scope link src 10.0.0.9 
10.0.0.12/30 nhid 53 proto ospf metric 20 
        nexthop via 10.0.0.6 dev en05 weight 1 
        nexthop via 10.0.0.10 dev en06 weight 1 
10.0.0.82 nhid 54 via 10.0.0.10 dev en06 proto ospf metric 20 
10.0.0.83 nhid 33 via 10.0.0.6 dev en05 proto ospf metric 20 
192.168.1.0/24 dev vmbr0 proto kernel scope link src 192.168.1.81 

Testing Example

You can now test the network by pinging the IPv4 loopback addresses of the other nodes. For example (using my IPs defined earlier):

  • ping 10.0.0.81
  • ping 10.0.0.82
  • ping 10.0.0.83

Now pull one of the TB cables and repeat the test.

You should still be able to ping all nodes!!
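
To repeat the test quickly before and after pulling a cable, the pings can be wrapped in a small loop. This is only a sketch: `check_loopbacks` and the `PING_CMD` variable are names invented here, and the default assumes the Linux iputils ping (`-c` count, `-W` timeout in seconds).

```shell
# Probe command is overridable; the default sends two pings with a
# two-second timeout each (iputils ping assumed).
PING_CMD="${PING_CMD:-ping -c 2 -W 2}"

# Ping every address given as an argument and report the result.
# Returns non-zero if any address did not answer.
check_loopbacks() {
    failed=0
    for ip in "$@"; do
        if $PING_CMD "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}

# Usage with the loopbacks defined earlier:
#   check_loopbacks 10.0.0.81 10.0.0.82 10.0.0.83
```

Run it once with all cables connected, pull a cable, give OSPF time to reconverge (up to the 40s dead interval with the default timers) and run it again.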

This supplement is if you want dual stack IPv4 and IPv6

Note: if you are doing Ceph, it should be either IPv4 or IPv6 for all the monitors, MDS and daemons. Do not try to dual stack it; despite the docs implying it is OK, my findings on Quincy are that it is funky, so stick to IPv4 or IPv6. It is possible to switch Ceph back and forth, but be very careful... it will be scary (tl;dr pick one).

Create an IPv6 loopback

In /etc/network/interfaces you will want an IPv6 loopback, separate from the IPv4 one. My best practice is to use the same number in the last hextet as the last octet from IPv4 (makes things easy to remember).

So PVE1 would look like this; increment the last digit of the IP for each subsequent node.

...
auto lo:6
iface lo:6 inet static
        address fc00::81/128

...

Enable IPv4 and IPv6 forwarding

  1. use nano /etc/sysctl.conf to open the file
  2. uncomment #net.ipv6.conf.all.forwarding=1 (remove the # symbol)
  3. uncomment #net.ipv4.ip_forward=1 (remove the # symbol)
  4. save the file

FRR Setup

This is the content for FRR - remember to increment the router-ids on each node you use this on where you see X.

Edit the FRR daemons file to change ospf6d=no to ospf6d=yes and ospfd=no to ospfd=yes.

This is the config to issue in vtysh. Note: if you are moving from a pure IPv4 config you might want to stop the service and delete the FRR config file before doing this to reset the config.

ip forwarding
ipv6 forwarding
!
router ospf
 ospf router-id 0.0.0.X
 log-adjacency-changes
 exit
!
router ospf6
 ospf6 router-id 0.0.0.X
 log-adjacency-changes
 timers throttle spf 100 200 5000
 exit
!
interface lo
 ip ospf area 0
 ipv6 ospf6 area 0
 exit
!
interface en05
 ip ospf area 0
 ip ospf network broadcast
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
 exit
!
interface en06
 ip ospf area 0
 ip ospf network broadcast
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
 exit
!

To do: speed up failover by playing with dead time, hello time etc.

@MeshedAlmond

uncomment #3net.ipv4.ip_forward=1 (remove the # symbol)

you have a 3 there that I think is an error

@scyto
Author

scyto commented Sep 23, 2023

you have a 3 there that I think is an error

yup shift 3 = # i fixed it in the upper half missed that one, thanks.

@nihr43

nihr43 commented Sep 23, 2023

You should be able to use the 'unnumbered' approach and forget the /30 addresses if you want. As in eno5 and eno6 are just 'up' with no address. Then in frr it should be something like ip ospf network point-to-point on each. Here's an example with bgp.

@scyto
Author

scyto commented Sep 24, 2023

should

yes should is the operative issue :-)

What I found is that with OSPF (aka IPv4), doing it that way and using point-to-point resulted in a network that routed when all links were up, but as soon as I did something like pull a cable or pull a node, not all surviving IPs could be reached.

I found moving to point-to-multipoint and using individual IPs worked (though can't say which of the two changes was the fix).

In a dual stack one seems to need to have broadcast set on both ospf types or it breaks because IPv6 ospf doesn't support point-to-multipoint....

I also found that what you said worked with IPv6 on OSPFv3 using un-numbered and relying on the IPv6 loopbacks.
But the IPv4 flavour did not.

I can retry what you suggest for IPv4 now i know much more, not sure when though.

I am also tempted to use fabricd - but not sure why, lol.

@scyto
Author

scyto commented Sep 24, 2023

@nihr43 i just removed the explicit IP addresses on pve1 en05 and en06 and routing to the loopback on pve1 breaks ... i don't know why...

as i said it doesn't work, if you can suggest a fix i would love it to work as it should, i don't have good mental models of how OSPF works, last time i did serious routing for work was in the 90's with RIPv2 ;-)

(I should also point out that the packet pushers blog also says it doesn't work)

(is it something weird like the frr ospfd daemon has to bind to an IPv4 address to get the announcement packets?)

@scyto
Author

scyto commented Sep 24, 2023

(is it something weird like the frr ospfd daemon has to bind to an IPv4 address to get the announcement packets?)

nope not that (though odd how the ss -l shows the proto as ???)

???   UNCONN 0      0         0.0.0.0:ospf                    0.0.0.0:*          
???   UNCONN 0      0         *:ospf                          *:*          
tcp   LISTEN 0      3         127.0.0.1:ospfd                   0.0.0.0:*          
tcp   LISTEN 0      3         [::1]:ospf6d                     [::]:*   

@scyto
Author

scyto commented Sep 24, 2023

Yeah, the issue is lo doesn't see any neighbors, i don't know why....

pve3# show ip ospf interface
en05 is up
  ifindex 6, MTU 65520 bytes, BW 0 Mbit <UP,BROADCAST,RUNNING,MULTICAST>
  Internet Address 10.0.0.14/30, Broadcast 10.0.0.15, Area 0.0.0.0
  MTU mismatch detection: enabled
  Router ID 0.0.0.3, Network Type BROADCAST, Cost: 10
  Transmit Delay is 1 sec, State Backup, Priority 1
  Designated Router (ID) 0.0.0.2 Interface Address 10.0.0.13/30
  Backup Designated Router (ID) 0.0.0.3, Interface Address 10.0.0.14
  Multicast group memberships: OSPFAllRouters OSPFDesignatedRouters
  Timer intervals configured, Hello 10s, Dead 40s, Wait 40s, Retransmit 5
    Hello due in 1.486s
  Neighbor Count is 1, Adjacent neighbor count is 1
en06 is up
  ifindex 7, MTU 65520 bytes, BW 0 Mbit <UP,BROADCAST,RUNNING,MULTICAST>
  Internet Address 10.0.0.6/30, Broadcast 10.0.0.7, Area 0.0.0.0
  MTU mismatch detection: enabled
  Router ID 0.0.0.3, Network Type BROADCAST, Cost: 10
  Transmit Delay is 1 sec, State DR, Priority 1
  Designated Router (ID) 0.0.0.3 Interface Address 10.0.0.6/30
  Backup Designated Router (ID) 0.0.0.1, Interface Address 10.0.0.5
  Saved Network-LSA sequence number 0x80000002
  Multicast group memberships: OSPFAllRouters OSPFDesignatedRouters
  Timer intervals configured, Hello 10s, Dead 40s, Wait 40s, Retransmit 5
    Hello due in 1.486s
  Neighbor Count is 1, Adjacent neighbor count is 1
lo is up
  ifindex 1, MTU 65536 bytes, BW 0 Mbit <UP,LOOPBACK,RUNNING>
  Internet Address 10.0.0.83/32, Broadcast 10.0.0.83, Area 0.0.0.0
  MTU mismatch detection: enabled
  Router ID 0.0.0.3, Network Type LOOPBACK, Cost: 0
  Transmit Delay is 1 sec, State Loopback, Priority 1
  No backup designated router on this network
  Multicast group memberships: <None>
  Timer intervals configured, Hello 10s, Dead 40s, Wait 40s, Retransmit 5
    Hello due in inactive
  Neighbor Count is 0, Adjacent neighbor count is 0

@scyto
Author

scyto commented Sep 24, 2023

oh i see why IPv6 works - the interface gives itself a link local, so it also has an explicit interface address!

pve3# show ipv6 ospf6 interface
en05 is up, type BROADCAST
  Interface ID: 6
  Internet Address:
    inet : 10.0.0.14/30
    inet6: fe80::a6:88ff:febf:3456/64
  Instance ID 0, Interface MTU 65520 (autodetect: 65520)
  MTU mismatch detection: enabled
  Area ID 0.0.0.0, Cost 10
  State BDR, Transmit Delay 1 sec, Priority 1
  Timer intervals configured:
   Hello 10(2.869), Dead 40, Retransmit 5
  DR: 0.0.0.2 BDR: 0.0.0.3
  Number of I/F scoped LSAs is 2
    0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
    0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
  Authentication Trailer is disabled

en06 is up, type BROADCAST
  Interface ID: 7
  Internet Address:
    inet : 10.0.0.6/30
    inet6: fe80::46:c2ff:feba:38ec/64
  Instance ID 0, Interface MTU 65520 (autodetect: 65520)
  MTU mismatch detection: enabled
  Area ID 0.0.0.0, Cost 10
  State DR, Transmit Delay 1 sec, Priority 1
  Timer intervals configured:
   Hello 10(2.869), Dead 40, Retransmit 5
  DR: 0.0.0.3 BDR: 0.0.0.1
  Number of I/F scoped LSAs is 2
    0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
    0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
  Authentication Trailer is disabled
enp86s0 is up, type BROADCAST


lo is up, type LOOPBACK
  Interface ID: 1
  Internet Address:
    inet : 10.0.0.83/32
    inet6: fc00::83/128
  Instance ID 0, Interface MTU 65536 (autodetect: 65536)
  MTU mismatch detection: enabled
  Area ID 0.0.0.0, Cost 10
  State Loopback, Transmit Delay 1 sec, Priority 1
  Timer intervals configured:
   Hello 10(-), Dead 40, Retransmit 5
  DR: 0.0.0.0 BDR: 0.0.0.0
  Number of I/F scoped LSAs is 0
    0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
    0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
  Authentication Trailer is disabled

so both OSPFv2 and OSPFv3 are NOT unnumbered. Are you used to Cisco? Maybe this is a difference in implementation between FRR and Cisco IOS?

@scyto
Author

scyto commented Sep 24, 2023

@nihr43

nope it doesn't work as numberless, using this config and no IPs on the en05 and en06 interfaces breaks ospfd routing completely

more than happy to try it another way if you have any suggestions on how to change the FRR ospf config?

root@pve3:~# vtysh

Hello, this is FRRouting (version 8.5.2).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

pve3# show running-config 
Building configuration...

Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname pve3
service integrated-vtysh-config
!
interface en05
 ip ospf area 0
 ip ospf network point-to-point
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
exit
!
interface en06
 ip ospf area 0
 ip ospf network point-to-point
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
exit
!
interface lo
 ip ospf area 0
 ipv6 ospf6 area 0
exit
!
router ospf
 ospf router-id 0.0.0.3
 log-adjacency-changes
exit
!
router ospf6
 ospf6 router-id 0.0.0.3
 log-adjacency-changes
 timers throttle spf 100 200 5000
exit
!
end
pve3# 

@nihr43

nihr43 commented Sep 24, 2023

Oh my bad, apparently frr does it a bit weird - you don't remove the address altogether, you duplicate the /32 lo address.

Cumulus Linux describes this using FRR.
..and frr says:

'''
When configuring a point-to-point network on an interface and the interface has a /32 address associated with then OSPF will treat the interface as being unnumbered. If you are doing this you must set the net.ipv4.conf.<interface name>.rp_filter value to 0.
'''

@scyto
Author

scyto commented Sep 24, 2023

Oh my bad

NP. you got me all excited, because it was a royal PITA to get the thunderbolt interfaces and IPs to remain on the same physical interface, lol :-)

Do you think i would have better luck going numberless if I switch to FRR OpenFabric (fabricd)?

@scyto
Author

scyto commented Sep 24, 2023

like would this work?

!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
!
interface eth0
 ip router openfabric 1
 ipv6 router openfabric 1
!
interface eth1
 ip router openfabric 1
 ipv6 router openfabric 1
!
router openfabric 1
 net 49.0000.0000.0001.00

or does it need the IP addresses like shown in the examples...

@scyto
Author

scyto commented Sep 24, 2023

i will try it just for IPv4 and see....

@scyto
Author

scyto commented Sep 24, 2023

yup that works, but the openfabric discovery time is soo slow

@scyto
Author

scyto commented Sep 24, 2023

however switchover when pulling a link was really fast; when the link came back it took maybe 20 to 30 seconds to switch back

@scyto
Author

scyto commented Sep 24, 2023

tl;dr

I now have IPv6 over OSPFv3 and IPv4 over OpenFabric (fabricd)

tomorrow i will play with switching over the IPv6 too....

pve3# show running-config 
Building configuration...

Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname pve3
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
exit
!
interface en06
 ip router openfabric 1
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
exit
!
interface lo
 ip router openfabric 1
 ipv6 ospf6 area 0
 openfabric passive
exit
!
router ospf6
 ospf6 router-id 0.0.0.3
 log-adjacency-changes
 timers throttle spf 100 200 5000
exit
!
router openfabric 1
 net 49.0000.0000.0003.00
exit
!

and

pve3# show openfabric topology 
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve3                                                                  
10.0.0.83/32         IP internal  0                                     pve3(4)
pve2                 TE-IS        10     pve2                 en05      pve3(4)
pve1                 TE-IS        10     pve1                 en06      pve3(4)
10.0.0.82/32         IP TE        20     pve2                 en05      pve2(4)
10.0.0.81/32         IP TE        20     pve1                 en06      pve1(4)

and

root@pve3:~# cat /etc/network/interfaces
...
auto lo
iface lo inet loopback

auto lo:0
iface lo:0 inet static
        address 10.0.0.83/32

auto lo:6
iface lo:6 inet static
        address fc00::83/128

iface enp86s0 inet manual

auto en05
iface en05 inet manual
        mtu 65520

auto en06
iface en06 inet manual
        mtu 65520

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.83/24
        gateway 192.168.1.1
        bridge-ports enp86s0
        bridge-stp off
        bridge-fd 0

once i have IPv6 working i will write a new gist to supersede this one as if this works well it is way simpler for folks....

@scyto
Author

scyto commented Sep 24, 2023

bah couldn't leave it alone... this looks good (i still have OSPFv3 configured too)

pve3# show openfabric topology 
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve3                                                                  
10.0.0.83/32         IP internal  0                                     pve3(4)
pve2                 TE-IS        10     pve2                 en05      pve3(4)
pve1                 TE-IS        10     pve1                 en06      pve3(4)
10.0.0.82/32         IP TE        20     pve2                 en05      pve2(4)
10.0.0.81/32         IP TE        20     pve1                 en06      pve1(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve3                                                                  
fc00::83/128         IP6 internal 0                                     pve3(4)
pve2                 TE-IS        10     pve2                 en05      pve3(4)
pve1                 TE-IS        10     pve1                 en06      pve3(4)
fc00::82/128         IP6 internal 20     pve2                 en05      pve2(4)
fc00::81/128         IP6 internal 20     pve1                 en06      pve1(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

any reason NOT to use openfabric?

@scyto
Author

scyto commented Sep 24, 2023

ok, does this seem a more elegant approach? numberless interfaces and simple frr config?

pve3# show running-config 
Building configuration...

Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname pve3
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0003.00
exit
!
end

@scyto
Author

scyto commented Sep 24, 2023

ok i have now deprecated this gist
it is now replaced with https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085

@nihr43

nihr43 commented Sep 24, 2023

I have no familiarity with openfabric - but if it's working go for it.
On slow convergence - frr has a 'datacenter' profile that's supposed to use more aggressive timers. You enable it with frr defaults datacenter at the top of frr.conf. I don't know what its effect will be with openfabric but for me with bgp convergence is pretty much instant.
