@spali
Last active November 19, 2024 13:09
Disable WAN Interface on CARP Backup
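#!/usr/local/bin/php
<?php
// CARP syshook script: enables the WAN interface when this node becomes
// MASTER and disables it when the node becomes BACKUP.
require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");

// Arguments passed to the hook: the event source and the new CARP state.
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

// Only MASTER and BACKUP transitions are handled.
if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}

// The event source must contain an '@'.
if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

// Key of the interface to toggle in $config['interfaces'].
$ifkey = 'wan';

if ($type === "MASTER") {
    // Became master: mark the interface enabled, save, and reconfigure it.
    log_error("enable interface '$ifkey' due CARP event '$type'");
    $config['interfaces'][$ifkey]['enable'] = '1';
    write_config("enable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
} else {
    // Became backup: remove the enable flag, save, and reconfigure it.
    log_error("disable interface '$ifkey' due CARP event '$type'");
    unset($config['interfaces'][$ifkey]['enable']);
    write_config("disable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
}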
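
The argument checks in the script above can be looked at on their own, without touching the firewall configuration. The following is only a minimal sketch: `carp_wan_action()` is a hypothetical helper that is not part of the gist, and the example source value `1@igb0` is an assumption about what the hook passes (the script itself only requires that the value contains an `@`).

```php
<?php
// Hypothetical helper, not part of the gist: maps the two hook arguments
// to the action the script would take, without changing any configuration.
function carp_wan_action(string $subsystem, string $type): string
{
    // Same first check as the script: only MASTER and BACKUP are accepted.
    if ($type !== 'MASTER' && $type !== 'BACKUP') {
        return 'error: unknown event type';
    }
    // Same second check as the script: the source must contain an '@'.
    if (strpos($subsystem, '@') === false) {
        return 'error: wrong source';
    }
    // MASTER enables the WAN interface, BACKUP disables it.
    return $type === 'MASTER' ? 'enable' : 'disable';
}

// "1@igb0" is an assumed example value for the event source.
echo carp_wan_action('1@igb0', 'MASTER'), PHP_EOL;
echo carp_wan_action('1@igb0', 'BACKUP'), PHP_EOL;
```

Keeping the decision separate from `write_config()` and `interface_configure()` makes it possible to try out the branching on any machine with PHP installed.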
@willjasen

willjasen commented Sep 15, 2024

I’m throwing this in here without much context beyond my own abandoned script, but one challenge I had to overcome was multiple interfaces being judged as “failed” so that the backup connection would take over. It may not be relevant now with the recent updates, but I’m putting it out there: https://gist.github.com/willjasen/6ae0f47bca36ced2bd52b2fefc2bc21e

@raegedoc

raegedoc commented Sep 15, 2024

Hi @raegedoc are you sure that you use this gist? There is only an else case, line 28 to 33, which should be used if the system is in the Backup case... or are you using this gist? There is an explicit Backup section.

Perhaps you could post or do a fork of this Script?

Hi @skl283, I tried them all two weeks ago and none of them gave my backup node internet access again after it was promoted to primary and then demoted back to backup. Only these small additions fixed it, while keeping the script very light and clean.

I forgot to mention that I incorporated the suggestions @edward-scroop made in the post before mine: https://gist.github.com/spali/2da4f23e488219504b2ada12ac59a7dc?permalink_comment_id=5185710#gistcomment-5185710

Here is a link to my gist: https://gist.github.com/raegedoc/093ba815b6b3f2bc2ff327f48c60f3a9

Open to your ideas :)

@edward-scroop

@raegedoc do you have gateway monitoring set up for the WAN gateway? I have it set up, and when the primary switches back to master, it sets the priority of the backup WAN gateway to defunct, which removes it from the route selection.

@raegedoc

raegedoc commented Sep 15, 2024

@edward-scroop, yes, I have gateway monitoring set for the WAN gateway on both primary and backup. The problem is not with my primary node switching back to master but with my backup node switching back to being a backup. This way, the backup has internet access for receiving its OPNsense updates and news announcements.

For clarity, here is my primary configuration for the WAN link when primary is primary and backup is backup:

image

...and for my backup configuration. The blue arrows point to the fields where MY_CARP_LAN_VIP is specified.

image
image

@edward-scroop

edward-scroop commented Sep 15, 2024

From your screenshots, the monitor IP is empty and the "disable gateway monitoring" box is checked. That would mean gateway monitoring is disabled.

I think what is happening is this: your WAN gateway has a higher priority than the LAN gateway, and with no gateway monitoring the backup has no way to tell that the WAN gateway is down, so it has no reason to swap to the LAN gateway.

To fix it, either set the LAN gateway to a priority higher than the WAN gateway, or set a monitor IP of 1.1.1.1 and uncheck the "disable gateway monitoring" box.

@raegedoc

raegedoc commented Sep 15, 2024

Hi, WAN Gateway has priority 254 and WAN-to-LAN has 255 (so WAN > WAN-to-LAN).

Anyway, I tried your trick and it made things worse: my backup has no internet access while it is backup. The routing table shows the default route still pointing to the WAN IP, which connects to nothing while the node is backup.

image

Interfaces: Diagnostics: Ping to 1.1.1.1 has 100% loss :(

Fixing the default gateway (a route delete followed by a route add for CARP_LAN_IP) while the node is backup to a functional primary may have been the missing trick for my setup, which is pretty standard when the ISP provides only a public DHCP WAN IP.

I'll keep the setup I shared earlier. Thanks for sharing, edward-scroop.

@edward-scroop

edward-scroop commented Sep 16, 2024

The LAN gateway needs a priority higher than 254. The smaller the value, the higher the priority.

@raegedoc

The LAN gateway needs a priority higher than 254. The smaller the value, the higher the priority.

That is the case: LAN has priority 255.

@edward-scroop

I meant, the LAN needs a priority of 1-253.
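
The priority reasoning in this exchange can be sketched as a toy model. This is only an illustration of the argument made in the comments above, not OPNsense's actual route selection code: `pick_default_gateway()`, the array keys and the gateway names are all hypothetical. It assumes that the gateway with the smallest priority number wins, and that a gateway without monitoring can never be detected as down.

```php
<?php
// Toy model only; this is NOT how OPNsense is implemented internally.
// Each gateway is an array with hypothetical keys:
//   name, priority (smaller number wins), monitored, reachable.
function pick_default_gateway(array $gateways): ?string
{
    // Without monitoring, a gateway is always treated as usable,
    // even when nothing is actually reachable through it.
    $candidates = array_filter($gateways, function (array $gw): bool {
        return !$gw['monitored'] || $gw['reachable'];
    });
    if (!$candidates) {
        return null;
    }
    // The smaller the value, the higher the priority.
    usort($candidates, fn(array $a, array $b) => $a['priority'] <=> $b['priority']);
    return $candidates[0]['name'];
}

// Backup node: WAN is dead but unmonitored, LAN has priority 255.
$gateways = [
    ['name' => 'WAN', 'priority' => 254, 'monitored' => false, 'reachable' => false],
    ['name' => 'LAN', 'priority' => 255, 'monitored' => false, 'reachable' => true],
];
echo pick_default_gateway($gateways), PHP_EOL; // WAN is still chosen

// Giving the LAN gateway a priority in the 1-253 range makes it win.
$gateways[1]['priority'] = 253;
echo pick_default_gateway($gateways), PHP_EOL; // LAN is chosen
```

In this model, both suggested fixes (monitoring the WAN gateway, or giving the LAN gateway a smaller priority number) lead to the LAN gateway being selected on the backup node.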

@bitcoredotorg

I upgraded to 24.7.6 today, and our syshook.d scripts that call interface_configure() now appear to crash when an undefined function is eventually called (see my stack trace below). See my post on the OPNsense forums for the customizations I run: https://forum.opnsense.org/index.php?topic=20972.msg216770#msg216770. I'd imagine Spali's version is equally affected. I submitted a crash report, but did not create an issue on the OPNsense GitHub.

I believe we need to use a better-supported method to enable/disable interfaces in these syshook scripts. The 'interface' PHP functions seem to be in heavy development in 24.7, and many functions seem to be considered 'legacy' methods or are becoming deprecated. Or, perhaps this is just a bug.

As a workaround, if you don't want to roll back, you can comment out the $config line and the write_config and interface_configure calls, and use shell_exec("/sbin/ifconfig {$interface['if']} up"); and shell_exec("/sbin/ifconfig {$interface['if']} down"); instead. This is less reliable and has other undesirable effects. For example, when only using interface up/down commands, the backup device needs its WAN interface left enabled in the configuration. Under that condition, after a reboot you'll want to manually trigger a failover cycle to put the backup device's WAN interface in the "down" state, or else you'll have both interfaces up and enabled. Again, we need to find the best-supported way to enable/disable interfaces, and go from there.
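
The workaround described above might look roughly like the following sketch. `carp_ifconfig_command()` is a hypothetical helper, and the device name `igb0` is a made-up example; on a real system the device name would come from the interface's configuration entry (the `$interface['if']` value mentioned above). The sketch only builds the command string, so it can be tried without changing any interface state.

```php
<?php
// Hypothetical helper: builds the ifconfig command for a CARP event.
// MASTER brings the device up, anything else brings it down.
function carp_ifconfig_command(string $device, string $type): string
{
    $state = $type === 'MASTER' ? 'up' : 'down';
    // escapeshellarg() guards against unexpected characters in the name.
    return '/sbin/ifconfig ' . escapeshellarg($device) . ' ' . $state;
}

// 'igb0' is an assumed example device name.
echo carp_ifconfig_command('igb0', 'BACKUP'), PHP_EOL;

// In the hook script the command would then be executed, for example:
// shell_exec(carp_ifconfig_command($interface['if'], $type));
```

As noted in the comment above, this only changes the runtime state of the interface. It does not persist across reboots the way the configuration change in the original script does.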

[22-Oct-2024 13:17:14 America/New_York] PHP Fatal error: Uncaught Error: Call to undefined function system_routing_configure() in /usr/local/etc/inc/interfaces.inc:3777
Stack trace:
#0 /usr/local/etc/inc/interfaces.inc(2498): interfaces_restart_by_device(false, Array, false)
#1 /usr/local/etc/rc.syshook.d/carp/10-wancarp(24): interface_configure(false, 'opt3', false, false)
#2 {main}
thrown in /usr/local/etc/inc/interfaces.inc on line 3777

As a side note, others are having trouble with CARP maintenance mode not working at all (not triggering a failover, as one would expect): opnsense/core#7877

@toddgonzo74

Has anyone found a fix for this issue yet?

@skl283

skl283 commented Nov 7, 2024

I haven't tried it yet, but does this issue also occur on 24.7.8? @bitcoredotorg, have you perhaps tried the update?

@toddgonzo74

I just upgraded to 24.7.8 (I was actually on 24.7.7 and it was working fine, as it was in 24.7.6). I run both my firewalls in Proxmox, so I took a backup snapshot before each upgrade, just in case. When the primary node came back up, the only thing I noticed was that it was stuck in persistent CARP maintenance mode. I enabled and disabled it, and the backup failed right over to the primary.

The only issue I still have is with Spectrum. For some reason, when I use a VLAN on my managed switch (Juniper EX3400 POE), Spectrum routinely fails to hand out a new DHCP address (I have DHCP snooping and damn near everything else that could interfere disabled in that VLAN). For a goof, I grabbed an old gigabit Netgear switch and plugged in the Spectrum primary/backup and circuit. It's been fine for 4 months now, and Spectrum fails over with no issues.

Anyway... not seeing the problem in 24.7.8.
