Last active: September 19, 2021 05:42
Example of submitting multiple steps within a SLURM batch file; the steps should execute in parallel when resources allow.
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --exclusive
#SBATCH --job-name subjob-sched-test
#SBATCH -p secondary
module load openmpi/4.1.0-gcc-7.2.0-pmi2 gcc cmake
## If not started with SLURM, figure out where we are relative to the build directory
#####Snippet from: http://stackoverflow.com/questions/59895/
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
#####end snippet
#If SLURM_SUBMIT_DIR is not set, we are not running under SLURM; choose a directory relative to the script.
SLURM_SUBMIT_DIR=${SLURM_SUBMIT_DIR:-${SCRIPT_DIR}}
#Move to the directory the user was in when they ran `sbatch`
cd ${SLURM_SUBMIT_DIR} #assumed to be the source tree
#Set up the test script. This lets the whole thing be one script.
cat << EOF > ./test.bash
#!/bin/bash
echo -n \${SLURM_STEP_ID} \${SLURM_CPUS_PER_TASK} \${SLURM_STEP_NODELIST} \$(date "+%s") " "
sleep 60
echo \$(date "+%s")
EOF
chmod 755 ./test.bash
#Submit 12 single-core, 6 dual-core, 4 4-core, 2 6-core, and 1 12-core subjobs, all submissions at once, then let SLURM schedule them within this allocation.
for SIZE in 1 2 4 6 12
do
	for REP in $(seq $((12/${SIZE})) )
	do
		srun --mpi=pmi2 --exclusive --ntasks-per-core 1 --mem=10M --ntasks 1 --cpus-per-task ${SIZE} ./test.bash > ./bar.s${SIZE}.r${REP} &
	done
	#Uncomment this wait and all the replicates of a particular size will run together. This might improve scheduling.
	#For other problems you might not want that, but here we know what the execution times will be, and the number of jobs packs the nodes.
	#wait
done
wait
echo "Doneish?"
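Two shell idioms in the script above are worth calling out: the `${VAR:-default}` fallback used for SLURM_SUBMIT_DIR, and the backslash-escaped `\$` inside the unquoted heredoc, which keeps `${SLURM_STEP_ID}` and friends from expanding until test.bash actually runs as a job step. A minimal standalone sketch (no SLURM needed; the directory names are made up):

```shell
#!/bin/bash
# 1) ${VAR:-default}: use VAR if set and non-empty, otherwise the fallback.
unset SLURM_SUBMIT_DIR                        # simulate running outside SLURM
echo "${SLURM_SUBMIT_DIR:-/fallback/dir}"     # prints /fallback/dir
SLURM_SUBMIT_DIR=/tmp/jobdir                  # simulate sbatch having set it
echo "${SLURM_SUBMIT_DIR:-/fallback/dir}"     # prints /tmp/jobdir

# 2) In an unquoted heredoc, \$ defers expansion to when the generated
#    script runs, so \${SLURM_STEP_ID} lands in the file literally.
cat << EOF
expanded now:   ${SLURM_SUBMIT_DIR}
expanded later: \${SLURM_STEP_ID}
EOF
```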
Sadly, it seems that setting core affinity with --cpu-bind=cores breaks it again. I guess that is only useful for MPI under srun, then.
Obviously, for a real workload you will need to request more memory than the 10 MB per step used here.
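Raising the per-step memory just means changing the --mem flag on the srun line inside the loop; the 2 GB figure below is a made-up example, not a recommendation:

```shell
# Hypothetical: give a 4-core step 2 GB instead of the demo's 10 MB.
srun --mpi=pmi2 --exclusive --ntasks 1 --cpus-per-task 4 --mem=2G ./test.bash &
```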
View results with
cat bar.s* | sort -g -k 4 -k 1
The columns are stepID, CPUs, nodeName, startSec, endSec.
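As a sanity check of that sort invocation, here are some hypothetical lines in the same five-column format (step IDs, node name, and timestamps are all made up). sort -g -k 4 orders numerically by start second, and the trailing -k 1 breaks ties:

```shell
printf '%s\n' \
  "3 1 node01 1000 1060" \
  "0 12 node01 990 1050" \
  "1 2 node01 990 1050" | sort -g -k 4 -k 1
# prints:
# 0 12 node01 990 1050
# 1 2 node01 990 1050
# 3 1 node01 1000 1060
```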
My results look like:

You can see from this that, though I asked for 12 cores, we were given 16 cores, and the steps use all of them.