#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
set -eu
pids=()
for x in 1 2 3; do
  ls /not-a-file &
  pids+=($!)
done
for pid in "${pids[@]}"; do
  wait "$pid"
done
#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
# The expected output is something like this:
#
#   $ ./parallel-explained
#   ls: cannot access '/not-a-file': No such file or directory
#   ls: cannot access '/not-a-file'ls: cannot access '/not-a-file': No such file or directory
#   : No such file or directory
#
# Our 'parallel-explained' script exited with code '2', because it's the exit
# code of one of the failed 'ls' jobs:
#
#   $ echo $?
#   2
#
# 'set -e' tells the shell to exit if any foreground command fails,
# i.e. exits with a non-zero status.
set -eu
# Initialize array of PIDs for the background jobs that we're about to launch.
pids=()
for x in 1 2 3; do
  # Run a command in the background. We expect this command to fail.
  ls /not-a-file &
  # Add the PID of this background job to the array.
  pids+=($!)
done
# Wait for each specific process to terminate.
# Instead of this loop, a single call to 'wait' would wait for all the jobs
# to terminate, but it would not give us their exit status.
#
for pid in "${pids[@]}"; do
  #
  # Waiting on a specific PID makes the wait command return with the exit
  # status of that process. Because of the 'set -e' setting, any exit status
  # other than zero causes the current shell to terminate with that exit
  # status as well.
  #
  wait "$pid"
done
This did not work for me with the function I was using: the wait command just hung.
I added the following to fix it, and I suggest adding it to the gist:
exit_code=0
for pid in "${pids[@]}"; do
  wait "$pid" || exit_code=1
done
if [ "$exit_code" == "1" ]; then
  exit 1
fi
@dudicoco I don't understand what the problem was and why it would be fixed by having wait "$pid" || exit_code=1 instead of just wait "$pid". The exit status of wait is the same as that of the process being waited on. If the process being waited on is stuck in an infinite loop or something, then wait gets stuck too. Could you provide a full script that exhibits the problem you were describing?
The version of bash could be useful too, in case there's an oddity.
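For readers following along, here is a minimal sketch of the behavior being discussed. It is not part of the gist, and the exit status 3 is an arbitrary value chosen for illustration. It shows that waiting on a specific PID yields that job's exit status, and that putting the call on the left of '||' keeps 'set -e' from aborting the script, which is the one way the two forms differ.

```shell
#! /usr/bin/env bash
# Hypothetical demo, not part of the gist.
set -eu

# Start a background job that exits with a known nonzero status.
(exit 3) &
pid=$!

# 'wait PID' returns the exit status of that job. Because the call sits on
# the left of '||', 'set -e' does not abort the script here.
status=0
wait "$pid" || status=$?
echo "child exited with status $status"
```

With a bare wait "$pid" instead, the script would stop at that line with status 3 and never reach the echo.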
Everyone, note I just changed the hashbang line in the script to #! /usr/bin/env bash so it works for macOS users who installed a more recent version of bash with Homebrew but still have the old bash 3.x at /bin/bash.
We have two blocking background processes. The first one works well. The second one fails.
But wait will wait forever for the first process (which is still working fine), knowing nothing about the second one, which has already failed.
set -eu
pids=()
tail -f /var/log/syslog &>/dev/null &
pids+=($!)
tail -f /nonexistent.log &>/dev/null &
pids+=($!)
for pid in "${pids[@]}"; do
  wait "$pid"
done
[I deleted an earlier reply which was wrong]
But wait will wait forever for the first process (which is still working fine), knowing nothing about the second one, which has already failed.
Yes, indeed the solution here has this problem. I tried a few alternatives and they're not obvious. Here's one solution that exits as soon as a child finishes with an error status:
#! /usr/bin/env bash
set -eu
# Declare a numeric variable for counting the children
declare -i n=0
(sleep 3; echo ok3) &
n+=1
(sleep 2; echo fail2; exit 1) &
n+=1
(sleep 1; echo ok1) &
n+=1
while [[ "$n" -gt 0 ]]; do
  echo waiting
  # Wait for any child to finish, returning its exit status,
  # and exiting the script if the status is nonzero (due to 'set -e'),
  # leaving some child processes running.
  wait -n
  n=n-1
done
If we run it, we see that the first job that sleeps 3 seconds keeps running after the parent script terminates. I get this output:
$ ./parallel3
waiting
ok1
waiting
fail2
$ ok3
To fix this, we'd have to kill the remaining children before exiting.
Here's an improved version, which tries to terminate the remaining children before exiting:
#! /usr/bin/env bash
#
# Run parallel commands and fail if any of them fails.
#
set -eu
pids=()
(sleep 3; echo ok3) &
pids+=($!)
(sleep 2; echo fail2; exit 1) &
pids+=($!)
(sleep 1; echo ok1) &
pids+=($!)
for pid in "${pids[@]}"; do
  if wait -n; then
    :
  else
    status=$?
    echo "One of the subprocesses exited with nonzero status $status. Aborting."
    for pid in "${pids[@]}"; do
      # Send a termination signal to all the children, and ignore errors
      # due to children that no longer exist.
      kill "$pid" 2> /dev/null || :
    done
    exit "$status"
  fi
done
It's a little complicated and maybe incorrect in some respects.
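One possible variation, offered only as a sketch and not as the gist author's method, is to move the cleanup into an EXIT trap so that it also runs when the script exits for some other reason. The function names kill_remaining and run_all are hypothetical, and the sketch assumes bash 4.3 or newer, since that is where 'wait -n' was introduced.

```shell
#! /usr/bin/env bash
# Sketch only. 'kill_remaining' and 'run_all' are hypothetical names that do
# not appear in the gist. 'wait -n' requires bash 4.3 or newer.

pids=()

# Send SIGTERM to every child we started, ignoring errors for children that
# have already exited.
kill_remaining() {
  local pid
  for pid in "${pids[@]}"; do
    kill "$pid" 2> /dev/null || :
  done
}

# Run the cleanup whenever the script exits, whatever the reason.
trap kill_remaining EXIT

# Start three jobs and return the status of the first one that fails,
# or 0 if all of them succeed.
run_all() {
  local n

  (sleep 3; echo ok3) &
  pids+=($!)
  (sleep 2; echo fail2; exit 1) &
  pids+=($!)
  (sleep 1; echo ok1) &
  pids+=($!)

  n=${#pids[@]}
  while (( n > 0 )); do
    # 'wait -n' returns as soon as any one child finishes.
    wait -n || return $?
    n=$((n - 1))
  done
  return 0
}

status=0
run_all || status=$?
echo "exit status: $status"
# A real script would finish with: exit "$status"
```

Note that kill only signals the subshell whose PID was recorded, so a command still running inside it (here, sleep) may linger briefly after the script ends.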
Here's my solution:
#!/usr/bin/env bash
set -eu
ARG1=${1:-$(nproc --ignore=1)}
pids=()
for x in $(seq 1 "${ARG1}"); do
  python3 unit_tests.py &
  pids+=($!)
done
for pid in "${pids[@]}"; do
  if wait -n; then
    :
  else
    exit_code=$?
    echo "Process exited with $exit_code, killing other tests now."
    for pid in "${pids[@]}"; do
      kill -9 "$pid" 2> /dev/null || :
    done
    exit "$exit_code"
  fi
done
Pairing the tl;dr version with a longer line-by-line clear explanation is clutch, thank you!