@PhrozenByte
Last active October 12, 2025 19:50
Acquires and holds a mutex lock until terminated, designed to implement "mutex daemons" for Systemd.
#!/bin/bash
# systemd-mutex.sh
# Acquires and holds a mutex lock until terminated
#
# This script acquires and holds an exclusive (pass one of `--exclusive`, `-e`,
# or `-x`; default) or shared (pass `--shared` or `-s`) `flock` lock on the
# given file (pass required `MUTEX_FILE`) until terminated. If the lock cannot
# be acquired immediately, it either waits indefinitely until it can, or until
# the given timeout is exceeded (pass optional float `TIMEOUT` in seconds).
# Optionally, pass `-v` or `--verbose` to explain what's being done.
#
# This script implements a "mutex daemon": As long as the daemon runs, it holds
# a lock. Any other process trying to acquire the same lock either fails or
# blocks until the daemon is terminated or temporarily releases the lock. The
# script was designed to implement Systemd mutex service units; see below.
#
# The script first opens a read-only file descriptor for `MUTEX_FILE`. Note
# that `MUTEX_FILE` doesn't have to be a regular file - directories work too.
# If `MUTEX_FILE` doesn't exist yet, a regular file is created (consider a path
# on a tmpfs). The script then acquires a lock on that file descriptor using
# `flock` (see `flock(1)`). `flock` blocks until the lock is acquired or the
# given timeout is exceeded. After acquiring the lock, the script sleeps
# indefinitely until terminated by a signal.
#
# If the script receives SIGHUP, it temporarily releases the lock and tries to
# re-acquire it. If a timeout was given initially, it is applied again. If the
# script hasn't acquired the lock yet, the timeout is reset so that it applies
# in full again. If the script receives SIGINT, SIGQUIT, or SIGTERM, it
# releases the lock and exits with code 0. If the lock hasn't been acquired
# yet, `flock` is interrupted instead and the script still exits with code 0.
# If SIGABRT is received, the script likewise releases the lock or interrupts
# `flock`, but then exits with code 134. If acquiring the lock exceeds the
# given timeout, the script exits with code 254. In case of an error, the
# script exits with a non-zero code.
#
# If the `NOTIFY_SOCKET` environment variable is set, the script interacts with
# Systemd PID1 using the `sd_notify(3)` interface. It sends human-readable
# statuses when first acquiring the lock, when reloading (due to SIGHUP; also
# sends the `RELOADING=1` assignment), when stopping (SIGINT/QUIT/TERM/ABRT;
# also sends the `STOPPING=1` assignment), and, most importantly, when the lock
# was acquired (sends the `READY=1` assignment). If acquiring the lock isn't
# possible right away, it repeatedly sends `EXTEND_TIMEOUT_USEC` assignments,
# each asking Systemd PID1 to extend its start timeout by up to about 5
# minutes, until the timeout given to the script is exceeded. This means that
# if one calls this script as part of a `Type=notify-reload` Systemd service
# unit, Systemd won't consider the "mutex daemon" running until the script has
# successfully acquired the lock. Other Systemd units referencing the "mutex
# daemon" with `After=` and `Upholds=` consequently won't start until the lock
# is acquired.
# The "mutex daemon" itself should be a templated Systemd service unit with
# `StopWhenUnneeded=true`: Multiple services referencing the same instance name
# share the same lock, whereas services with different instance names must wait
# for each other.
#
# Take the following example: The Systemd service units `job-a.service` and
# `job-b.service` can be queued independently - even when the other job is
# currently running - yet they strictly run one after another and defer startup
# as needed. When grouping multiple Systemd service units with a Systemd target
# unit using `StopWhenUnneeded=true`, you will likely want to order the target
# unit `Before=` the "mutex daemon" and let the target unit `Upholds=` it. The
# individual service units that `Upholds=` the target unit must still be
# ordered `After=` the "mutex daemon" though.
#
# : [email protected]
# ```
# [Unit]
# Description=Job Mutex
# StopWhenUnneeded=true
# RefuseManualStart=true
# RefuseManualStop=true
#
# [Service]
# Type=notify-reload
# ExecStart=/usr/local/bin/systemd-mutex.sh /run/lock/job-mutex.lock
# ```
#
# : job-a.service
# ```
# [Unit]
# Description=Job A
# [email protected]
# [email protected]
#
# [Service]
# Type=oneshot
# ExecStart=/usr/bin/sleep 60
# ```
#
# : job-b.service
# ```
# [Unit]
# Description=Job B
# [email protected]
# [email protected]
#
# [Service]
# Type=oneshot
# ExecStart=/usr/bin/sleep 60
# ```
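#
# The target-based grouping described above could look like the following
# sketch. It is an untested illustration: the unit names `jobs.target` and
# `job-c.service` as well as the instance name `jobs` are hypothetical, and
# the mutex template is assumed to be installed as `job-mutex@.service`.
#
# : jobs.target
# ```
# [Unit]
# Description=Jobs
# StopWhenUnneeded=true
# Before=job-mutex@jobs.service
# Upholds=job-mutex@jobs.service
# ```
#
# : job-c.service
# ```
# [Unit]
# Description=Job C
# After=job-mutex@jobs.service
# Upholds=jobs.target
#
# [Service]
# Type=oneshot
# ExecStart=/usr/bin/sleep 60
# ```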
#
# Copyright (C) 2025 Daniel Rudolf (<https://www.daniel-rudolf.de>)
# License: The MIT License <http://opensource.org/licenses/MIT>
#
# SPDX-License-Identifier: MIT

set -eu -o pipefail
export LC_ALL=C.UTF-8

# check script dependencies using the `command -v` builtin
command -v awk > /dev/null || { echo "Missing script dependency: awk" >&2; exit 1; }
command -v flock > /dev/null || { echo "Missing script dependency: flock" >&2; exit 1; }
command -v sed > /dev/null || { echo "Missing script dependency: sed" >&2; exit 1; }
command -v timeout > /dev/null || { echo "Missing script dependency: timeout" >&2; exit 1; }
if [ -n "${NOTIFY_SOCKET:-}" ]; then
    command -v systemd-notify > /dev/null || { echo "Missing script dependency: systemd-notify" >&2; exit 1; }
fi

# helper functions
print_usage() {
    echo "Usage:"
    echo " $(basename "$0") [-v|--verbose] [-e|-x|--exclusive] [-s|--shared] \\"
    echo " MUTEX_FILE [TIMEOUT]"
}

# quote arguments for verbose output, but only where quoting is required
quote() {
    local ARG QUOTED=
    for ARG in "$@"; do
        [ "$(printf '%q' "$ARG")" == "$ARG" ] \
            && QUOTED+=" $ARG" \
            || QUOTED+=" ${ARG@Q}"
    done
    echo "${QUOTED:1}"
}

# run a command, printing it to stderr first if verbose
cmd() {
    [ -z "$VERBOSE" ] || echo + "$(quote "$@")" >&2
    "$@"
}

# app functions

# print the system uptime in microseconds, optionally shifted by an offset
# given in seconds
now() {
    awk -v o="${1:-0}" '{ printf "%d\n", ($1 + o) * 1000000 }' /proc/uptime
}

wake_up() {
    # kill `flock` if currently trying to acquire the lock
    if [ -n "$FLOCK_PID" ]; then
        cmd kill "$FLOCK_PID" 2> /dev/null || true
        FLOCK_PID=
    fi

    # kill `sleep` if lock was acquired before
    if [ -n "$SLEEP_PID" ]; then
        cmd kill "$SLEEP_PID" 2> /dev/null || true
        SLEEP_PID=
    fi
}

# print a status message and, if applicable, send it to Systemd
notify() {
    local MESSAGE="$1"
    shift

    [ -z "${NOTIFY_SOCKET:-}" ] || cmd systemd-notify --status="$MESSAGE" "$@" || true
    echo "$MESSAGE"
}

# print the lock's description, capitalized if the first argument isn't
# all lowercase
lock_identity() {
    local TYPE="exclusive"
    [ -z "$LOCK_SHARED" ] || TYPE="shared"
    [ $# -eq 0 ] || [ "$1" == "${1,,}" ] || TYPE="${TYPE^}"
    echo "$TYPE lock ${LOCK_FILE@Q}"
}

# script traps
trap_exit() {
    # map the exit statuses of SIGHUP (129), SIGINT (130), SIGQUIT (131),
    # and SIGTERM (143) to 0
    EXIT=$?
    [[ " 0 129 130 131 143 " != *" $EXIT "* ]] || EXIT=0

    # wake up for good
    wake_up

    # cleanup traps
    trap - HUP INT QUIT ABRT TERM EXIT
}

trap_reload() {
    # update timeout
    [ -z "$TIMEOUT" ] || EXIT_AFTER="$(now "$TIMEOUT")"

    if [ -n "$LOCKED" ]; then
        # notify Systemd that we're cycling the lock
        notify "Releasing and re-acquiring $(lock_identity)..." --reloading MONOTONIC_USEC="$(now)"

        # temporarily release lock
        cmd flock -u 9
    else
        # do nothing if not holding the lock, just notify Systemd
        notify "Reloading without holding $(lock_identity)..." --reloading MONOTONIC_USEC="$(now)"
    fi

    # wake up to reschedule
    wake_up
}

# read parameters and setup script
LOCK_FILE=
TIMEOUT=
LOCK_SHARED=
VERBOSE=

while [ $# -gt 0 ]; do
    # split combined short options (e.g. `-sv`) into separate options
    if [[ "$1" =~ ^-[a-zA-Z0-9]{2,}$ ]]; then
        set -- $(echo "${1:1}" | sed 's/./-& /g') "${@:2}"
        continue
    fi

    if [ "$1" == "--help" ]; then
        print_usage
        exit 0
    elif [ "$1" == "-e" ] || [ "$1" == "-x" ] || [ "$1" == "--exclusive" ]; then
        LOCK_SHARED=
    elif [ "$1" == "-s" ] || [ "$1" == "--shared" ]; then
        LOCK_SHARED=y
    elif [ "$1" == "-v" ] || [ "$1" == "--verbose" ]; then
        VERBOSE=y
    elif [ -z "$LOCK_FILE" ]; then
        LOCK_FILE="$1"
    elif [ -z "$TIMEOUT" ]; then
        TIMEOUT="$1"
    else
        echo "Unknown option: $1" >&2
        exit 1
    fi
    shift
done

if [ -z "$LOCK_FILE" ]; then
    print_usage >&2
    exit 1
fi

EXIT=
EXIT_AFTER=
if [[ "$TIMEOUT" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    EXIT_AFTER="$(now "$TIMEOUT")"
elif [ -n "$TIMEOUT" ]; then
    echo "Invalid value for 'TIMEOUT': $TIMEOUT" >&2
    exit 1
fi

FLOCK_PID=
SLEEP_PID=
LOCKED=

trap trap_exit INT QUIT ABRT TERM EXIT
trap trap_reload HUP

# notify Systemd that we're trying to acquire the lock
notify "Acquiring $(lock_identity)..."

# prepare lock by opening a r/o file descriptor
[ -e "$LOCK_FILE" ] || cmd touch "$LOCK_FILE"
[ -z "$VERBOSE" ] || echo + "exec 9< ${LOCK_FILE@Q}" >&2
exec 9< "$LOCK_FILE"

# keep lock until terminated
while [ -z "$EXIT" ]; do
    NOW="$(now)"
    LOCKED=

    # check whether we can acquire the lock right away
    if ! cmd flock ${LOCK_SHARED:+-s} -n 9; then
        # wait until the timeout is exceeded, or a maximum of 5min
        # if the timeout is exceeded, break the loop and exit, or just reschedule otherwise
        WAIT_FOR=299999999
        if [ -n "$EXIT_AFTER" ]; then
            (( NOW < EXIT_AFTER )) || { EXIT=254; break; }
            (( NOW + WAIT_FOR <= EXIT_AFTER )) || WAIT_FOR="$(( EXIT_AFTER - NOW - 1 ))"
        fi

        # extend Systemd start timeout (+ 10sec for script to exit gracefully)
        [ -z "${NOTIFY_SOCKET:-}" ] || cmd systemd-notify EXTEND_TIMEOUT_USEC="$(( WAIT_FOR + 10000001 ))"

        # acquire lock
        # due to limited accuracy `flock` might wait <1sec longer than expected
        cmd timeout -k1 "$(( WAIT_FOR / 1000000 + 1 ))" flock ${LOCK_SHARED:+-s} 9 &
        FLOCK_PID=$!
        wait "$FLOCK_PID" || true
        FLOCK_PID=

        # `flock` can quit prematurely for many reasons (e.g. when receiving a SIGHUP),
        # so reschedule to check whether the lock was acquired; if it was, it will still be
        continue
    fi

    # notify Systemd that the mutex is ready
    notify "$(lock_identity X) acquired" --ready
    LOCKED=y

    # sleep until terminated by external signal
    cmd sleep infinity &
    SLEEP_PID=$!
    wait "$SLEEP_PID" || true
    SLEEP_PID=
done

# notify Systemd that we're stopping
EXIT_INFO="gracefully"
[ "${EXIT:-0}" -eq 0 ] || EXIT_INFO="non-gracefully"

if [ -n "$LOCKED" ]; then
    notify "Releasing $(lock_identity) and stopping $EXIT_INFO..." --stopping
elif [ "$EXIT" -eq 254 ]; then
    notify "Acquiring $(lock_identity) timed out, stopping..." --stopping
else
    notify "Stopping $EXIT_INFO without holding $(lock_identity)..." --stopping
fi

# close file descriptor (which also releases a possible lock) and exit
[ -z "$VERBOSE" ] || echo + "exec 9<&-" >&2
exec 9<&-
exit "$EXIT"
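
The wait-slice arithmetic in the main loop is easy to misread, so here is a minimal standalone sketch of it. `plan_wait` is a hypothetical helper that is not part of the script; it only mirrors the calculations around `WAIT_FOR`, using made-up timestamps in microseconds. It prints the number of seconds that would be passed to `timeout` and the value that would be sent as `EXTEND_TIMEOUT_USEC`, or `timeout` once the deadline has been reached (the case in which the script exits with code 254).

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's wait-slice arithmetic.
# $1: current time in microseconds (what `now` would print)
# $2: optional deadline in microseconds (the script's EXIT_AFTER)
plan_wait() {
    local now="$1" exit_after="${2:-}" wait_for=299999999

    if [ -n "$exit_after" ]; then
        # deadline reached: the script would set EXIT=254 and leave the loop
        (( now < exit_after )) || { echo "timeout"; return 0; }

        # shorten the slice so that it ends just before the deadline
        (( now + wait_for <= exit_after )) || wait_for="$(( exit_after - now - 1 ))"
    fi

    # seconds given to `timeout`, then microseconds for EXTEND_TIMEOUT_USEC
    echo "$(( wait_for / 1000000 + 1 )) $(( wait_for + 10000001 ))"
}

plan_wait 100000000              # no deadline: prints "300 310000000"
plan_wait 100000000 160000000    # deadline in 60s: prints "60 70000000"
plan_wait 160000000 160000000    # deadline reached: prints "timeout"
```

Without a deadline, each slice lasts the full 5 minutes and the start timeout is extended by 10 seconds more than that. With a deadline closer than 5 minutes, the slice shrinks to end at the deadline, and the next loop iteration then takes the `timeout` branch.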
@Firefishy

Very nice! ❤️
