#!/usr/local/bin/php
<?php
/*
 * Copyright (C) 2004 Scott Ullrich <[email protected]>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
 * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");

$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}

if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

if ($type === "MASTER") {
    log_error("Enabling WireGuard due to CARP event '$type'");
    # Checking `isset` avoids a race condition during startup when the
    # WireGuard config stanza seems like it's not yet loaded. Without it, this
    # can create an extra, empty, invalid stanza that breaks WireGuard.
    if (isset($config['OPNsense']['wireguard']['general']['enabled'])) {
        $config['OPNsense']['wireguard']['general']['enabled'] = '1';
    }
    configd_run('wireguard start');
    write_config("Enable WireGuard due to CARP event '$type'", false);
} else {
    log_error("Disabling WireGuard due to CARP event '$type'");
    configd_run('wireguard stop');
    if (isset($config['OPNsense']['wireguard']['general']['enabled'])) {
        $config['OPNsense']['wireguard']['general']['enabled'] = '0';
    }
    write_config("Disable WireGuard due to CARP event '$type'", false);
}
Any possible solutions there?
The only solution I've come up with is the one I linked a few comments up: https://gist.github.com/jprenken/18ca7bf14ddae547ae0fdf6f56d72573?permalink_comment_id=4309559#gistcomment-4309559
I use a cron job which runs every minute. It seems to solve the problem for me.
Just out of curiosity: if this script enables WireGuard after a MASTER/BACKUP change, but one still needs to click "Apply" in the GUI to get WireGuard running, while the other script running as a cron job every minute works "out of the box"... has anyone tried adding this simple line, `configd_run('wireguard start');`, twice? Perhaps the cron job is "applying" the config on its second run.
I tried your suggestion, and it works when I add a `sleep(1);` between the two calls of `configd_run('wireguard start');`! I guess OPNsense needs some time to update things on the backend before it can actually activate the tunnels.
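A minimal sketch of that workaround as a standalone PHP function, so the start/sleep/start sequence can be run without an OPNsense box. `start_twice()` and the recording stub are hypothetical; in the hook itself the callable would be `configd_run('wireguard start')`, and the delay is the value people are tuning in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Workaround from this thread: start, wait, start again.
 * start_twice() is a hypothetical helper; $start stands in for
 * configd_run('wireguard start') in the CARP hook.
 */
function start_twice(callable $start, int $delaySeconds): void
{
    $start();             // first call: the config change may not be applied yet
    sleep($delaySeconds); // give the backend time to finish applying it
    $start();             // second call: expected to bring the tunnels up
}

// Hypothetical stub that only records when it was called.
$calls = [];
$recordStart = function () use (&$calls): void {
    $calls[] = microtime(true);
};

start_twice($recordStart, 1);
printf("%d start calls, %.1f s apart\n", count($calls), $calls[1] - $calls[0]);
```

In the gist's MASTER branch this amounts to following the existing `configd_run('wireguard start');` with `sleep(1);` and a second identical call.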
@jprenken, do you have any insight into why the sleep is needed here?
Unfortunately, no; I don't understand much about OPNsense internals. I would love for this to get properly upstreamed into the project, but don't have the time to get up to speed and propose it.
I tried both approaches (the CARP hook and the cron job), but neither worked.
The handover of the WG IP and the start/stop of the WG service seemed to work, but the backup firewall did not take over the peers.
When switching back, the master firewall started the WG service and immediately connected with the peers.
After some packet and log analysis I found out that I had a combination of two problems:
- the Fritz!Box's stupid MAC/name/IP assignment, which refused to communicate with the virtual IP of the WAN interface. The solution was to add a new client to the Fritz!Box network configuration with a fresh (not yet used) IP address and then configure the port forwarding.
- the sleep(1) from @Hobby-Student's comment really does seem to be necessary. With the 1 s delay and the second configd run, the peers switch perfectly from master to backup and back.
The cron job from @taxilian may also work, but the CARP hook is my favorite.
Good morning, jprenken. I have forked your code and made changes which, in my testing, have made the HA CARP failover for WireGuard more reliable.
I invite you to inspect my fork and, if you agree, add these changes to your script.
https://gist.github.com/nzkiwi68/5b54aece233ff72ada395b5a1bdad92c
I would have opened a formal pull request, but it seems that although GitHub allows a Gist to be forked, pull requests can only be made against a conventional GitHub repository, not against a Gist.
Good morning! Your variant, nzkiwi68, unfortunately did not work for us. I have not analyzed it further, but I added the "sleep" part to the script from this gist. We have many interfaces, and one second was evidently not enough here either:
/usr/local/etc/rc.bootup: Unable to configure nonexistent interface opt18 (wg0)
With 3 seconds, however, the switchover seems to run reliably so far!
Thanks for that.
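Since the required delay apparently depends on the setup (1 s was too short here, 3 s worked), one alternative to a fixed sleep is to retry the start until a readiness check passes, with an upper time limit. This is only a sketch: `start_until_up()` is hypothetical, and `$isUp` stands for whatever check you trust (for example, whether the wg interface from the error above exists); it is not an OPNsense API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: call $start, then keep retrying every
 * $intervalSeconds until $isUp() reports success or $timeoutSeconds
 * have elapsed. Returns true if $isUp() succeeded in time.
 */
function start_until_up(callable $start, callable $isUp, int $timeoutSeconds, int $intervalSeconds = 1): bool
{
    $deadline = time() + $timeoutSeconds;
    do {
        $start();
        if ($isUp()) {
            return true;
        }
        sleep($intervalSeconds); // wait before trying again
    } while (time() < $deadline);
    return false;
}

// Hypothetical stubs: the "interface" only comes up on the third start.
$attempts = 0;
$start = function () use (&$attempts): void { $attempts++; };
$isUp = function () use (&$attempts): bool { return $attempts >= 3; };

$ok = start_until_up($start, $isUp, 10);
echo $ok ? "up after {$attempts} attempts\n" : "gave up\n";
```

The repeated start is the same idea as calling `configd_run('wireguard start')` twice; the difference is that the number of attempts adapts to how long the system actually needs instead of relying on a guessed delay.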
I have really come to the conclusion that the answer lies not in the CARP syshook and running the start multiple times with a sleep statement, but in debugging and fixing the actual WireGuard start command that gets run.
See my new forum post and franco's reply:
https://forum.opnsense.org/index.php?topic=31962.0
Franco writes:
In a nutshell it just calls
# /usr/local/etc/rc.d/wireguard start
and does whatever the RC system deems appropriate. No clue what's wrong in your case, but I do know WireGuard doesn't make itself any easier to debug, experimental or not.
Glad to see that the sleep and calling configd_run twice seem to help. It looks like there is a race condition between enabling WireGuard, getting things ready (on the OPNsense side) and starting WireGuard. I'm not familiar with the OPNsense internals and what `$config['OPNsense']['wireguard']['general']['enabled'] = '1';` triggers. If it just writes "1" to config.xml, nothing is loaded yet. When `configd_run('wireguard start');` then runs, it picks up the "wireguard enabled" change in config.xml and informs the running system. While the system is applying this change, it can't start WireGuard, because this first attempt still relies on "wireguard disabled". The sleep gives the system (depending on the hardware used) enough time to finish all those calls, and the second `configd_run('wireguard start');` finally starts WireGuard, because it is now enabled in config.xml.
I don't know when I can take a deeper look, but I would try this sequence:
-> $config['OPNsense']['wireguard']['general']['enabled'] = '1';
-> reload the OPNsense settings and wait for completion
-> configd_run('wireguard start');
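The proposed ordering can be sketched as a function that takes each step as a callable, so it runs without OPNsense. The function name and the step labels are hypothetical; in particular, the blocking "reload settings" step is the open question in this thread, and no specific OPNsense call for it is implied here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the proposed order: persist "enabled",
 * apply it and wait, then start once. None of these are OPNsense APIs.
 */
function enable_then_start(callable $setEnabled, callable $reloadAndWait, callable $start): void
{
    $setEnabled();    // e.g. set $config[...]['enabled'] = '1' and write_config()
    $reloadAndWait(); // must block until the running system reflects the change
    $start();         // with the change applied, a single start should suffice
}

// Stubs that only record the order in which the steps run.
$log = [];
enable_then_start(
    function () use (&$log): void { $log[] = 'set enabled=1'; },
    function () use (&$log): void { $log[] = 'reload settings (blocking)'; },
    function () use (&$log): void { $log[] = 'wireguard start'; }
);
echo implode(' -> ', $log), "\n";
```

The point of the design is that the wait is tied to the reload actually finishing rather than to a fixed `sleep()`.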
If anyone is interested, I opened PR opnsense/plugins#3299. I think it checks all the boxes folks are talking about here. Worst case, it gets rejected, but feel free to test it.
Derp, good catch! Fixed.
It seems the script broke with opnsense/plugins@86c9e5c: the configd command now requires the WireGuard instance to be passed as a parameter.
So if you upgrade to 23.7.3, it breaks.
The following works:
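$backend = new \OPNsense\Core\Backend();
$servers = (new \OPNsense\Wireguard\Server())->servers->server->iterateItems();
foreach ($servers as $key => $node) {
    // start each enabled instance by its key
    if (!empty((string)$node->enabled)) {
        $backend->configdRun("wireguard start {$key}");
    }
}
// repeat for stop, using "wireguard stop {$key}" in the BACKUP branch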
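With the per-instance configd commands, the hook has to turn the CARP event into an action and issue one command per enabled instance. A minimal sketch of that mapping as a pure function, assuming the instances are given as key => enabled value (as read in the loop above); `carp_wireguard_commands()` and the sample keys are hypothetical, and limiting stop to enabled instances is an assumption carried over from "repeat for stop".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a CARP event to per-instance configd commands.
 * $instances: instance key (UUID) => stored "enabled" value ('1' or '0').
 *
 * @param array<string, string> $instances
 * @return list<string>
 */
function carp_wireguard_commands(string $type, array $instances): array
{
    $actions = ['MASTER' => 'start', 'BACKUP' => 'stop'];
    if (!isset($actions[$type])) {
        throw new InvalidArgumentException("Unknown CARP event '{$type}'");
    }

    $commands = [];
    foreach ($instances as $key => $enabled) {
        // Same test as !empty((string)$node->enabled); the string '0' counts as empty.
        if (!empty($enabled)) {
            $commands[] = "wireguard {$actions[$type]} {$key}";
        }
    }
    return $commands;
}

// Hypothetical instance keys, for illustration only.
$instances = ['uuid-a' => '1', 'uuid-b' => '0', 'uuid-c' => '1'];
print_r(carp_wireguard_commands('MASTER', $instances));
print_r(carp_wireguard_commands('BACKUP', $instances));
```

In the hook, each returned string would then be passed to `$backend->configdRun()`; keeping the mapping separate makes it easy to check without triggering a failover.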
This is now unnecessary: proper CARP support for WireGuard has been built into OPNsense since 23.7.8 (released 09 Nov 2023) and has been further improved in the latest OPNsense firmware.
The OPNsense dev team's "WireGuard follows CARP" implementation is excellent, and it works really well!
We have the same problem here.
Enabling and disabling works great, but we have to click "Apply" manually in the WireGuard GUI every time a failover happens. Without that, the WireGuard interfaces won't start up.
This is a big problem because it's a must-have feature for HA.
Any possible solutions there?