@bryan-lunt
Last active September 19, 2021 05:42
Example of submitting multiple job steps from within a single SLURM batch script; the steps should execute in parallel when the allocation has room for them.
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --exclusive
#SBATCH --job-name subjob-sched-test
#SBATCH -p secondary
module load openmpi/4.1.0-gcc-7.2.0-pmi2 gcc cmake
## If not started with SLURM, figure out where we are relative to the build directory
#####Snippet from: http://stackoverflow.com/questions/59895/
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
#####end snippet
#If SLURM_SUBMIT_DIR is not set, we are not running under SLURM; choose a directory relative to the script.
SLURM_SUBMIT_DIR=${SLURM_SUBMIT_DIR:-${SCRIPT_DIR}}
#move to the directory the user was in when they ran `sbatch`
cd "${SLURM_SUBMIT_DIR}" #assumed to be the source tree
#set up the test script. This lets the whole thing be one file.
cat << EOF > ./test.bash
#!/bin/bash
echo -n \${SLURM_STEP_ID} \${SLURM_CPUS_PER_TASK} \${SLURM_STEP_NODELIST} \$(date "+%s") " "
sleep 60
echo \$(date "+%s")
EOF
chmod 755 ./test.bash
#submit 12 single-core, 6 dual-core, 3 4-core, 2 6-core and 1 12-core subjobs, all at once, then let SLURM schedule them within this allocation
for SIZE in 1 2 4 6 12
do
    for REP in $(seq $((12/SIZE)))
    do
        srun --mpi=pmi2 --exclusive --ntasks-per-core 1 --mem=10M --ntasks 1 --cpus-per-task ${SIZE} ./test.bash > ./bar.s${SIZE}.r${REP} &
    done
    #Uncomment this wait and all the replicates of a particular size will run together. This might improve scheduling.
    #In other problems you might not want that, but here we know the execution times, and the number of jobs packs the node exactly.
    #wait
done
wait
echo "Doneish?"
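The script above backgrounds every `srun` with `&` and ends with a single bare `wait`, which in bash always returns 0, so a failed step goes unnoticed. A minimal sketch, assuming you want to count step failures: record each step's PID and `wait` on it individually, since `wait PID` returns that job's exit status. The `run_step` helper, the `mktemp` output directory and the `sleep 1` stand-in for `./test.bash` are hypothetical additions for illustration, not part of the original gist; the helper falls back to running the command directly when it is not inside a SLURM allocation, so the loop can be tried anywhere.

```shell
#!/bin/bash
# Sketch: launch job steps in the background, then wait on each PID so a
# failing step is counted. A bare `wait` always returns 0 in bash.

# Hypothetical helper: use srun inside a SLURM allocation, otherwise run the
# command directly so the loop can be exercised outside SLURM.
run_step() {
    local cpus=$1
    shift
    if [[ -n "${SLURM_JOB_ID:-}" ]] && command -v srun >/dev/null 2>&1; then
        srun --mpi=pmi2 --exclusive --ntasks-per-core 1 --mem=10M \
            --ntasks 1 --cpus-per-task "$cpus" "$@"
    else
        "$@"
    fi
}

outdir=$(mktemp -d)   # hypothetical output location for this demo
pids=()
for SIZE in 1 2 4 6 12; do
    for REP in $(seq $((12 / SIZE))); do
        # `sleep 1` stands in for ./test.bash
        run_step "$SIZE" sleep 1 > "${outdir}/bar.s${SIZE}.r${REP}" &
        pids+=("$!")
    done
done

failed=0
for pid in "${pids[@]}"; do
    # `wait PID` returns the exit status of that particular background job
    wait "$pid" || failed=$((failed + 1))
done
rm -rf "$outdir"

echo "steps launched: ${#pids[@]}, failed: ${failed}"
```

In the batch script itself you would swap `sleep 1` back to `./test.bash`, write the outputs next to the script as before, and could end with `exit $(( failed > 0 ))` so the job's exit code reflects step failures.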
bryan-lunt commented Sep 17, 2021

View the results with cat bar.s* | sort -g -k 4 -k 1 (sorted by start time, then by step ID).

The columns are: step ID, CPUs per task, node list, start time and end time (both in seconds since the epoch).

My results look like this:

0 2 node035 1631907908  1631907968
1 6 node035 1631907908  1631907968
2 1 node035 1631907908  1631907968
3 6 node035 1631907908  1631907968
4 1 node035 1631907908  1631907968
5 4 node035 1631907968  1631908028
6 1 node035 1631907968  1631908028
7 1 node035 1631907968  1631908028
8 1 node035 1631907968  1631908028
9 4 node035 1631907968  1631908028
10 1 node035 1631907968  1631908028
11 1 node035 1631907969  1631908029
12 1 node035 1631907969  1631908029
13 1 node035 1631907969  1631908029
14 1 node035 1631907969  1631908029
15 1 node035 1631908029  1631908089
16 1 node035 1631908029  1631908089
17 2 node035 1631908029  1631908089

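You can see from this that, although I asked for 12 cores, we were given 16 (presumably because --exclusive gives us the whole node), and SLURM uses all of them: the first wave of steps (0 to 4) uses 2 + 6 + 1 + 6 + 1 = 16 CPUs at once.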
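The 16-core observation can be checked mechanically from the result files rather than by eye. A minimal sketch, assuming each line has the five whitespace-separated columns listed above: turn every step into a +CPUs event at its start and a -CPUs event at its end, sort the events by time, and track the running total. `peak_cpus` is a hypothetical helper name, and the sample rows are copied from the results above.

```shell
#!/bin/bash
# Sketch: peak number of CPUs in use at once, from lines of
# "stepID CPUs nodelist startSec endSec" (the format printed by test.bash).
peak_cpus() {
    # Emit a +CPUs event at each start (tag 1) and a -CPUs event at each end (tag 0).
    awk '{ print $4, 1, $2; print $5, 0, -$2 }' |
    # Sort by time; at equal times ends (0) come before starts (1), so a step
    # finishing at second T is not counted as overlapping one starting at T.
    sort -n -k1,1 -k2,2 |
    # Running sum of CPUs in use, remembering the maximum.
    awk '{ cur += $3; if (cur > peak) peak = cur } END { print peak + 0 }'
}

# Sample rows from the results above: the first wave plus the next step.
sample='0 2 node035 1631907908  1631907968
1 6 node035 1631907908  1631907968
2 1 node035 1631907908  1631907968
3 6 node035 1631907908  1631907968
4 1 node035 1631907908  1631907968
5 4 node035 1631907968  1631908028'

printf '%s\n' "$sample" | peak_cpus   # prints 16
```

On the real output, run cat bar.s* | peak_cpus. Because the timestamps only have one-second resolution, this is an approximation: steps that start a second late, like 11 to 14 above, are still counted correctly, but sub-second overlaps are not visible.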

@bryan-lunt

Sadly, it seems that setting core affinity with --cpu-bind=cores breaks this parallel packing again. I guess that option is only useful for MPI under srun, then.

@bryan-lunt

Obviously, a real workload will need to request more memory than the --mem=10M used here.
