Last active
October 12, 2025 19:50
Acquires and holds a mutex lock until terminated, designed to implement "mutex daemons" for Systemd.
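To make the "mutex daemon" idea concrete, here is a minimal sketch (not the script itself) of a process that holds a `flock` lock until it is killed. The temporary lock file, the `lock_is_free` helper, and the variable names are assumptions made for illustration; the sketch assumes Linux with util-linux `flock` and GNU `sleep`.

```shell
#!/bin/bash
set -eu

# Hypothetical lock file, used for this demo only
LOCK_FILE="$(mktemp)"

# Hypothetical helper: succeeds if an exclusive lock can be taken right now.
# `flock` opens its own file descriptor here, so it contends with the holder.
lock_is_free() {
    flock -n "$LOCK_FILE" true
}

# Stand-in for the "mutex daemon": lock fd 9, then become `sleep infinity`.
# Because of `exec`, the background PID is the process that holds the lock.
(
    exec 9< "$LOCK_FILE"
    flock -x 9
    exec sleep infinity
) &
HOLDER_PID=$!

# Wait until the holder has actually taken the lock
while lock_is_free; do sleep 0.1; done

HELD_WHILE_RUNNING=yes
! lock_is_free || HELD_WHILE_RUNNING=no

# Terminating the holder closes its descriptor, which releases the lock
kill "$HOLDER_PID"
wait "$HOLDER_PID" 2> /dev/null || true

FREE_AFTER_KILL=no
! lock_is_free || FREE_AFTER_KILL=yes

echo "held while running: $HELD_WHILE_RUNNING"
echo "free after kill: $FREE_AFTER_KILL"
rm -f "$LOCK_FILE"
```

The lock belongs to the open file description, not to a PID file or similar, so it cannot go stale: as soon as the holder dies, for whatever reason, the kernel drops the lock.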
| #!/bin/bash | |
| # systemd-mutex.sh | |
| # Acquires and holds a mutex lock until terminated | |
| # | |
| # This script acquires and holds a `flock` lock on the given file (required | |
| # `MUTEX_FILE` argument) until terminated. The lock is exclusive by default | |
| # (or pass `--exclusive`, `-e`, or `-x` explicitly); pass `--shared` or `-s` | |
| # for a shared lock instead. If the lock cannot be acquired immediately, the | |
| # script waits either indefinitely or until the given timeout is exceeded | |
| # (optional `TIMEOUT` argument, a decimal number of seconds). Pass `-v` or | |
| # `--verbose` to print each command to stderr before it is run. | |
| # | |
| # This script implements a "mutex daemon": As long as the daemon runs, it holds | |
| # a lock. Any other process trying to acquire the same lock either fails or | |
| # blocks until the daemon is terminated or temporarily releases the lock. The | |
| # script was designed to implement Systemd mutex service units; see below. | |
| # | |
| # The script first opens a read-only file descriptor for `MUTEX_FILE`. Note | |
| # that `MUTEX_FILE` doesn't have to be a regular file - directories work too. | |
| # If `MUTEX_FILE` doesn't exist yet, a regular file is created (consider a path | |
| # on a tmpfs). The script then acquires a lock on that file descriptor using | |
| # `flock` (see `flock(1)`). `flock` blocks until the lock is acquired or the | |
| # given timeout is exceeded. After acquiring the lock, the script sleeps | |
| # indefinitely until terminated by a signal. | |
| # | |
| # If the script receives SIGHUP, it temporarily releases the lock and tries to | |
| # re-acquire it; a timeout given initially applies again in full. If the | |
| # script hasn't acquired the lock yet, SIGHUP merely restarts the timeout. If | |
| # the script receives SIGINT, SIGQUIT, or SIGTERM, it releases the lock (or | |
| # interrupts `flock` if the lock hasn't been acquired yet) and exits with code | |
| # 0. SIGABRT is handled the same way, except that the script then exits with | |
| # code 134. If acquiring the lock exceeds the given timeout, the script exits | |
| # with code 254. In case of any other error, the script exits with a non-zero | |
| # code. | |
| # | |
| # If the `NOTIFY_SOCKET` environment variable is set, the script interacts with | |
| # Systemd PID1 using the `sd_notify(3)` interface. It sends human-readable | |
| # statuses when first acquiring the lock, when reloading (due to SIGHUP; also | |
| # sends the `RELOADING=1` assignment), when stopping (SIGINT/QUIT/TERM/ABRT; | |
| # also sends the `STOPPING=1` assignment), and, most importantly, when the lock | |
| # was acquired (sends the `READY=1` assignment). If acquiring the lock isn't | |
| # possible right away, it repeatedly sends `EXTEND_TIMEOUT_USEC` assignments | |
| # to tell Systemd PID1 to extend its start timeout in steps of about 5 | |
| # minutes, until the lock is acquired or the timeout given to the script is | |
| # exceeded. This means that if one calls this | |
| # script as part of a `Type=notify-reload` Systemd service unit, Systemd won't | |
| # consider the "mutex daemon" running until the script has successfully | |
| # acquired the lock. Other Systemd units referencing the "mutex daemon" with | |
| # `After=` and `Upholds=` consequently won't start until the lock is acquired. | |
| # The "mutex daemon" itself should be a templated Systemd service unit with | |
| # `StopWhenUnneeded=true`: Multiple services referencing the same instance name | |
| # share the same lock, whereas services with different instance names must wait | |
| # for each other. | |
| # | |
| # Take the following example: The Systemd service units `job-a.service` and | |
| # `job-b.service` can be queued independently - even when the other job is | |
| # currently running - yet they strictly run one after another and defer startup | |
| # as needed. When grouping multiple Systemd service units with a Systemd target | |
| # unit using `StopWhenUnneeded=true`, you will likely want to order the target | |
| # unit `Before=` the "mutex daemon" and let the target unit `Upholds=` it. The | |
| # individual service units that `Upholds=` the target unit must still be | |
| # ordered `After=` the "mutex daemon" though. | |
| # | |
| # : [email protected] | |
| # ``` | |
| # [Unit] | |
| # Description=Job Mutex | |
| # StopWhenUnneeded=true | |
| # RefuseManualStart=true | |
| # RefuseManualStop=true | |
| # | |
| # [Service] | |
| # Type=notify-reload | |
| # ExecStart=/usr/local/bin/systemd-mutex.sh /run/lock/job-mutex.lock | |
| # ``` | |
| # | |
| # : job-a.service | |
| # ``` | |
| # [Unit] | |
| # Description=Job A | |
| # [email protected] | |
| # [email protected] | |
| # | |
| # [Service] | |
| # Type=oneshot | |
| # ExecStart=/usr/bin/sleep 60 | |
| # ``` | |
| # | |
| # : job-b.service | |
| # ``` | |
| # [Unit] | |
| # Description=Job B | |
| # [email protected] | |
| # [email protected] | |
| # | |
| # [Service] | |
| # Type=oneshot | |
| # ExecStart=/usr/bin/sleep 60 | |
| # ``` | |
| # | |
| # Copyright (C) 2025 Daniel Rudolf (<https://www.daniel-rudolf.de>) | |
| # License: The MIT License <http://opensource.org/licenses/MIT> | |
| # | |
| # SPDX-License-Identifier: MIT | |
| set -eu -o pipefail | |
| export LC_ALL=C.UTF-8 | |
| # check dependencies using the `command -v` builtin (`which` isn't always installed) | |
| [ -x "$(command -v awk 2> /dev/null)" ] || { echo "Missing script dependency: awk" >&2; exit 1; } | |
| [ -x "$(command -v flock 2> /dev/null)" ] || { echo "Missing script dependency: flock" >&2; exit 1; } | |
| [ -x "$(command -v sed 2> /dev/null)" ] || { echo "Missing script dependency: sed" >&2; exit 1; } | |
| [ -x "$(command -v timeout 2> /dev/null)" ] || { echo "Missing script dependency: timeout" >&2; exit 1; } | |
| if [ -n "${NOTIFY_SOCKET:-}" ]; then | |
| [ -x "$(command -v systemd-notify 2> /dev/null)" ] || { echo "Missing script dependency: systemd-notify" >&2; exit 1; } | |
| fi | |
| # helper functions | |
| print_usage() { | |
| echo "Usage:" | |
| echo " $(basename "$0") [-v|--verbose] [-e|-x|--exclusive] [-s|--shared] \\" | |
| echo " MUTEX_FILE [TIMEOUT]" | |
| } | |
| quote() { | |
| local QUOTED= | |
| for ARG in "$@"; do | |
| [ "$(printf '%q' "$ARG")" == "$ARG" ] \ | |
| && QUOTED+=" $ARG" \ | |
| || QUOTED+=" ${ARG@Q}" | |
| done | |
| echo "${QUOTED:1}" | |
| } | |
| cmd() { | |
| [ -z "$VERBOSE" ] || echo + "$(quote "$@")" >&2 | |
| "$@" | |
| } | |
| # app functions | |
| now() { | |
| awk -v o="${1:-0}" '{ printf "%d\n", ($1 + o) * 1000000 }' /proc/uptime | |
| } | |
| wake_up() { | |
| # kill `flock` if currently trying to acquire the lock | |
| if [ -n "$FLOCK_PID" ]; then | |
| cmd kill "$FLOCK_PID" 2> /dev/null || true | |
| FLOCK_PID= | |
| fi | |
| # kill `sleep` if lock was acquired before | |
| if [ -n "$SLEEP_PID" ]; then | |
| cmd kill "$SLEEP_PID" 2> /dev/null || true | |
| SLEEP_PID= | |
| fi | |
| } | |
| notify() { | |
| local MESSAGE="$1" | |
| shift | |
| [ -z "${NOTIFY_SOCKET:-}" ] || cmd systemd-notify --status="$MESSAGE" "$@" || true | |
| echo "$MESSAGE" | |
| } | |
| lock_identity() { | |
| local TYPE="exclusive" | |
| [ -z "$LOCK_SHARED" ] || TYPE="shared" | |
| [ $# -eq 0 ] || [ "$1" == "${1,,}" ] || TYPE="${TYPE^}" | |
| echo "$TYPE lock ${LOCK_FILE@Q}" | |
| } | |
| # script traps | |
| trap_exit() { | |
| EXIT=$? | |
| [[ " 0 129 130 131 143 " != *" $EXIT "* ]] || EXIT=0 | |
| # wake up for good | |
| wake_up | |
| # cleanup traps | |
| trap - HUP INT QUIT ABRT TERM EXIT | |
| } | |
| trap_reload() { | |
| # update timeout | |
| [ -z "$TIMEOUT" ] || EXIT_AFTER="$(now "$TIMEOUT")" | |
| if [ -n "$LOCKED" ]; then | |
| # notify Systemd that we're cycling the lock | |
| notify "Releasing and re-acquiring $(lock_identity)..." --reloading MONOTONIC_USEC="$(now)" | |
| # temporarily release lock | |
| cmd flock -u 9 | |
| else | |
| # do nothing if not holding the lock, just notify Systemd | |
| notify "Reloading without holding $(lock_identity)..." --reloading MONOTONIC_USEC="$(now)" | |
| fi | |
| # wake up to reschedule | |
| wake_up | |
| } | |
| # read parameters and setup script | |
| LOCK_FILE= | |
| TIMEOUT= | |
| LOCK_SHARED= | |
| VERBOSE= | |
| while [ $# -gt 0 ]; do | |
| if [[ "$1" =~ ^-[a-zA-Z0-9]{2,}$ ]]; then | |
| set -- $(echo "${1:1}" | sed 's/./-& /g') "${@:2}" | |
| continue | |
| fi | |
| if [ "$1" == "--help" ]; then | |
| print_usage | |
| exit 0 | |
| elif [ "$1" == "-e" ] || [ "$1" == "-x" ] || [ "$1" == "--exclusive" ]; then | |
| LOCK_SHARED= | |
| elif [ "$1" == "-s" ] || [ "$1" == "--shared" ]; then | |
| LOCK_SHARED=y | |
| elif [ "$1" == "-v" ] || [ "$1" == "--verbose" ]; then | |
| VERBOSE=y | |
| elif [ -z "$LOCK_FILE" ]; then | |
| LOCK_FILE="$1" | |
| elif [ -z "$TIMEOUT" ]; then | |
| TIMEOUT="$1" | |
| else | |
| echo "Unknown option: $1" >&2 | |
| exit 1 | |
| fi | |
| shift | |
| done | |
| if [ -z "$LOCK_FILE" ]; then | |
| print_usage >&2 | |
| exit 1 | |
| fi | |
| EXIT= | |
| EXIT_AFTER= | |
| if [[ "$TIMEOUT" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then | |
| EXIT_AFTER="$(now "$TIMEOUT")" | |
| elif [ -n "$TIMEOUT" ]; then | |
| echo "Invalid value for 'TIMEOUT': $TIMEOUT" >&2 | |
| exit 1 | |
| fi | |
| FLOCK_PID= | |
| SLEEP_PID= | |
| LOCKED= | |
| trap trap_exit INT QUIT ABRT TERM EXIT | |
| trap trap_reload HUP | |
| # notify Systemd that we're trying to acquire the lock | |
| notify "Acquiring $(lock_identity)..." | |
| # prepare lock by opening a r/o file descriptor | |
| [ -e "$LOCK_FILE" ] || cmd touch "$LOCK_FILE" | |
| [ -z "$VERBOSE" ] || echo + "exec 9< ${LOCK_FILE@Q}" >&2 | |
| exec 9< "$LOCK_FILE" | |
| # keep lock until terminated | |
| while [ -z "$EXIT" ]; do | |
| NOW="$(now)" | |
| LOCKED= | |
| # check whether we can acquire the lock right away | |
| if ! cmd flock ${LOCK_SHARED:+-s} -n 9; then | |
| # wait until the timeout is exceeded, or a maximum of 5min | |
| # if the timeout is exceeded, break the loop and exit, or just reschedule otherwise | |
| WAIT_FOR=299999999 | |
| if [ -n "$EXIT_AFTER" ]; then | |
| (( NOW < EXIT_AFTER )) || { EXIT=254; break; } | |
| (( NOW + WAIT_FOR <= EXIT_AFTER )) || WAIT_FOR="$(( EXIT_AFTER - NOW - 1 ))" | |
| fi | |
| # extend Systemd start timeout (+ 10sec for script to exit gracefully) | |
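| # like in `notify()`, don't let a failing `systemd-notify` abort the script (`set -e`) | |
| [ -z "${NOTIFY_SOCKET:-}" ] || cmd systemd-notify EXTEND_TIMEOUT_USEC="$(( WAIT_FOR + 10000001 ))" || true | |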
| # acquire lock | |
| # due to limited accuracy `flock` might wait <1sec longer than expected | |
| cmd timeout -k1 "$(( WAIT_FOR / 1000000 + 1 ))" flock ${LOCK_SHARED:+-s} 9 & | |
| FLOCK_PID=$! | |
| wait $FLOCK_PID || true | |
| FLOCK_PID= | |
| # `flock` can quit prematurely for many reasons (e.g. when receiving a SIGHUP), | |
| # so reschedule to check whether the lock was acquired; if it was, it will still be | |
| continue | |
| fi | |
| # notify Systemd that the mutex is ready | |
| notify "$(lock_identity X) acquired" --ready | |
| LOCKED=y | |
| # sleep until terminated by external signal | |
| cmd sleep infinity & | |
| SLEEP_PID=$! | |
| wait $SLEEP_PID || true | |
| SLEEP_PID= | |
| done | |
| # notify Systemd that we're stopping | |
| EXIT_INFO="gracefully" | |
| [ "${EXIT:-0}" -eq 0 ] || EXIT_INFO="non-gracefully" | |
| if [ -n "$LOCKED" ]; then | |
| notify "Releasing $(lock_identity) and stopping $EXIT_INFO..." --stopping | |
| elif [ "$EXIT" -eq 254 ]; then | |
| notify "Acquiring $(lock_identity) timed out, stopping..." --stopping | |
| else | |
| notify "Stopping $EXIT_INFO without holding $(lock_identity)..." --stopping | |
| fi | |
| # close file descriptor (which also releases a possible lock) and exit | |
| [ -z "$VERBOSE" ] || echo + "exec 9<&-" >&2 | |
| exec 9<&- | |
| exit "$EXIT" |
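The wait-window arithmetic of the script's main loop can be checked in isolation. The following sketch mirrors that calculation in two hypothetical helpers, `wait_window` and `timeout_seconds` (neither name nor their argument order is part of the script); all timestamps are microseconds on the same monotonic clock.

```shell
#!/bin/bash

# Hypothetical helper mirroring the main loop's arithmetic.
# $1: current time (usec), $2: optional deadline (usec), empty for no timeout.
# Prints the time to wait in usec; returns 254 if the deadline has passed.
wait_window() {
    local NOW="$1" EXIT_AFTER="${2:-}" WAIT_FOR=299999999
    if [ -n "$EXIT_AFTER" ]; then
        (( NOW < EXIT_AFTER )) || return 254
        (( NOW + WAIT_FOR <= EXIT_AFTER )) || WAIT_FOR="$(( EXIT_AFTER - NOW - 1 ))"
    fi
    echo "$WAIT_FOR"
}

# Hypothetical helper: whole seconds handed to `timeout`
# (integer division plus one, as in the script)
timeout_seconds() {
    echo "$(( $1 / 1000000 + 1 ))"
}

# No deadline: wait for the full window of just under 5 minutes
NO_DEADLINE="$(wait_window 0)"

# Deadline 2.5 seconds away: the window is clamped to end just before it
NEAR_DEADLINE="$(wait_window 1000000 3500000)"

# Deadline already reached: report the timeout exit code
STATUS=0
wait_window 5000000 5000000 > /dev/null || STATUS=$?

echo "no deadline: $NO_DEADLINE usec, timeout $(timeout_seconds "$NO_DEADLINE")s"
echo "near deadline: $NEAR_DEADLINE usec, timeout $(timeout_seconds "$NEAR_DEADLINE")s"
echo "expired deadline: status $STATUS"
```

Subtracting one microsecond and then adding one second after the integer division keeps the `timeout` argument at least 1 while overshooting the deadline by less than a second, which matches the "<1sec longer than expected" remark in the script.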
Very nice! ❤️