Last active
January 23, 2024 09:48
Simple bash script parallelization using semaphores
#!/bin/bash
SEMPATH="/run/lock"
SEMNAME=""
semtake() {
    local name="$1"
    [ -z "$name" ] && echo "Missing semaphore name!" && return 1
    local j="$2"
    [ -z "$2" ] && j=$(nproc)
    [ -n "$SEMNAME" ] && echo "Already have $SEMNAME" && return 1
    while true; do
        for i in $(seq 1 "$j"); do
            SEMNAME=".semlock-$name-$j-$i"
            mkdir "$SEMPATH/$SEMNAME" 2>/dev/null && break 2
        done
        sleep 1
    done
    trap semgive EXIT
}
semgive() {
    [ -z "$SEMNAME" ] && return
    rmdir "$SEMPATH/$SEMNAME" &>/dev/null || true
    SEMNAME=""
}
#!/bin/bash
[ -z "$(command -v inotifywait)" ] && echo "inotify-tools need to be installed for $0 to work!" >&2 && return 1
SEMPATH="/run/lock"
[ ! -d "$SEMPATH" ] && echo "$SEMPATH is not a valid directory" >&2 && return 1
! (return 0 2>/dev/null) && echo "$0 can only be sourced, not executed" >&2 && exit 1
#SEMNAME=""
#SEMNAMEID=""
semtake_pool() {
    local SEMNAME="$1"
    local j="$2"
    for i in $(seq 1 "$j"); do
        SEMNAMEID="$i"
        mkdir "$SEMPATH/$SEMNAME-$SEMNAMEID" 2>/dev/null || continue
        return 0
    done
    unset SEMNAMEID
    return 1
}
semtake() {
    local name="$1"
    [ -z "$name" ] && echo "Missing semaphore name!" >&2 && return 1
    local j="$2"
    [ -z "$2" ] && j=$(nproc)
    [ -n "$SEMNAMEID" ] && echo "Already have $SEMNAME" >&2 && return 1
    SEMNAME=".semlock-$name"
    until semtake_pool "$SEMNAME" "$j"; do
        local i
        i="$(find "$SEMPATH" -maxdepth 1 -type d -name "$SEMNAME-wait-*" 2>/dev/null | sed 's/^.*-\([[:digit:]]*\)$/\1/' | sort -n | tail -1)"
        [ -z "$i" ] && i=0
        local SEMWAITNAME
        while true; do
            SEMWAITNAME="$SEMNAME"-wait-$i
            i=$((i+1))
            mkdir "$SEMPATH"/"$SEMWAITNAME" &>/dev/null || continue
            break
        done
        inotifywait --quiet --quiet --event delete_self "$SEMPATH"/"$SEMWAITNAME"
        rmdir "$SEMPATH"/"$SEMWAITNAME" &>/dev/null || true
    done
    trap semgive EXIT
}
semgive() {
    [ -z "$SEMNAME" ] && return
    [ -z "$SEMNAMEID" ] && return
    rmdir "$SEMPATH"/"$SEMNAME"-"$SEMNAMEID" &>/dev/null || true
    unset SEMNAMEID
    local i
    i="$(find "$SEMPATH" -maxdepth 1 -type d -name "$SEMNAME-wait-*" 2>/dev/null | sed 's/^.*-\([[:digit:]]*\)$/\1/' | sort -n | head -1)"
    [ -z "$i" ] && return
    local SEMWAITNAME
    local waiter
    for waiter in "$SEMPATH"/"$SEMNAME"-wait-*; do
        SEMWAITNAME="$SEMNAME"-wait-$i
        i=$((i+1))
        rmdir "$SEMPATH"/"$SEMWAITNAME" &>/dev/null || continue
        break
    done
    unset SEMNAME
}
semnotify.sh
Another implementation of semlock.sh.
It features the exact same usage as semlock.sh, so the instructions and documentation from semlock.sh apply.
This variant uses inotify-tools to notify the next waiting process that the semaphore is available.
This way we achieve two additional points:
- No busy waiting required, as the processes are passively waiting on filesystem changes.
- Ordered execution, because the waiting line is now numbered and semaphores will be distributed "first come, first served".
This can be used as a drop-in replacement for semlock.sh.
If you have inotify-tools installed, simply download semnotify.sh, rename it to semlock.sh and replace the other implementation.
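The passive wake-up described above can be tried in isolation. The following is a minimal sketch, not part of semnotify.sh: the function name demo_notify, the temporary directory and the marker name are assumptions made for the demo, and the --timeout option is only a safety net so the demo cannot hang. If inotify-tools is not installed, the sketch prints "skipped" instead.

```shell
#!/bin/bash
# Sketch of the wake-up mechanism: a waiter blocks on its own marker
# directory until another process removes that directory.
demo_notify() {
    if ! command -v inotifywait >/dev/null 2>&1; then
        echo "skipped"    # inotify-tools is not installed
        return 0
    fi
    local dir marker
    dir="$(mktemp -d)"                    # demo-only stand-in for /run/lock
    marker="$dir/.semlock-demo-wait-0"    # hypothetical waiter marker
    mkdir "$marker"
    # Releaser: removes the marker after a short delay, as semgive would.
    ( sleep 1; rmdir "$marker" ) &
    # Waiter: blocks without polling until the marker itself is deleted.
    # --timeout is a demo-only safety net and is not used by semnotify.sh.
    if inotifywait --quiet --quiet --timeout 10 --event delete_self "$marker"; then
        echo "woken"
    else
        echo "timeout or error"
    fi
    wait
    rmdir "$dir" 2>/dev/null
}

result="$(demo_notify)"
echo "$result"
```

The waiter spends the whole delay blocked inside inotifywait rather than in a sleep loop, which is the difference from the polling variant.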
semlock.sh
This bash script contains two functions making parallelization of bash scripts very easy.
Motivation
Many semaphore implementations for bash (such as parallel --sem) force the user to define the executed task as arguments, because the executed task has to be wrapped by taking the semaphore before the task and giving the semaphore back after the task.
This has the major drawback that an existing script has to be rewritten completely to fit the semaphore interface.
I had a lot of scripts, though, with a for loop iterating over multiple files, where each iteration could be done in parallel but multiple commands had to be executed for each file.
So I created my own implementation of semaphores which can be wrapped around an entire code block within a bash script.
Interface Specification
The script defines two methods:
- semtake <name> [count] takes a semaphore with the name name and allows up to count processes with this semaphore at the same time. Setting count to 1 will only allow 1 process with that semaphore at the same time. Default count is the amount of available processor threads.
- semgive returns the previously taken semaphore.
semtake may only be called once per shell (a second call fails while the semaphore is still held), and semgive may only be called after semtake has been called within the same shell earlier. semtake will set up a trap to give back the semaphore when the shell exits, so you don't have to call semgive explicitly.
Migration
Let's assume you have a shell script with a for loop in which each iteration runs several commands on one file. The loop could be parallelized, but your computer does not have the resources to process every file simultaneously. It does, however, have multiple threads which could be used.
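For illustration, a loop of this kind might look like the following sketch. The function process_file, the temporary directory and the sample file names are hypothetical placeholders; they only stand in for the several commands that run per file.

```shell
#!/bin/bash
# Hypothetical stand-in for the multiple commands executed per file.
process_file() {
    local file="$1"
    wc -c < "$file" > "$file.size"                      # first command
    tr '[:lower:]' '[:upper:]' < "$file" > "$file.upper"  # second command
}

# Demo input in a temporary directory (assumption for this sketch).
workdir="$(mktemp -d)"
printf 'alpha\n' > "$workdir/a.txt"
printf 'beta\n' > "$workdir/b.txt"

# Sequential version: each iteration finishes before the next one starts.
for file in "$workdir"/*.txt; do
    process_file "$file"
done
```

Each iteration only touches its own file, so the iterations are independent and are good candidates for running in parallel.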
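With semlock.sh only minimal refactoring is required for parallelization:
- Import the semlock.sh functions with source semlock.sh.
- Wrap the loop body in ( and ) & so that each iteration runs in its own background subshell.
- Call semtake right after the opening (. In a call such as semtake fileprocess 2, fileprocess is the semaphore's name, and the semaphore limits execution to 2 threads. Each subshell waits at semtake for an available semaphore.
- Add wait after the end of the loop to wait for all threads to finish.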
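Put together, the migrated loop might look like the sketch below. So that it runs standalone, it does not source semlock.sh. Instead it inlines a condensed copy of the polling semtake and semgive, with SEMPATH pointed at a temporary directory and a shorter polling interval; both are assumptions for this demo. The sample files, held.log and done.log are hypothetical too. In a real script the inlined functions would be replaced by source semlock.sh.

```shell
#!/bin/bash
# In a real script this whole block would be: source semlock.sh
SEMPATH="$(mktemp -d)"    # assumption: temp dir instead of /run/lock
SEMNAME=""
semtake() {
    local name="$1" j="${2:-$(nproc)}" i
    while true; do
        for i in $(seq 1 "$j"); do
            SEMNAME=".semlock-$name-$j-$i"
            # mkdir is atomic, so only one process can claim each slot.
            mkdir "$SEMPATH/$SEMNAME" 2>/dev/null && break 2
        done
        sleep 0.1    # demo-only: shorter than the 1 second in semlock.sh
    done
    trap semgive EXIT
}
semgive() {
    [ -z "$SEMNAME" ] && return
    rmdir "$SEMPATH/$SEMNAME" &>/dev/null || true
    SEMNAME=""
}

# Hypothetical demo input.
workdir="$(mktemp -d)"
for n in 1 2 3 4; do printf 'file %s\n' "$n" > "$workdir/$n.txt"; done

for file in "$workdir"/*.txt; do
    (
        semtake fileprocess 2
        # Log how many slots are held while this iteration is running.
        ls -A "$SEMPATH" | wc -l >> "$workdir/held.log"
        sleep 0.2
        tr '[:lower:]' '[:upper:]' < "$file" > "$file.upper"
        printf '%s\n' "$file" >> "$workdir/done.log"
    ) &    # the EXIT trap set by semtake releases the slot here
done
wait
```

The body of the loop is unchanged apart from the semtake line; the subshell and the trailing wait are the only structural additions.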