Created
March 11, 2025 21:27
model launcher script to run PEcAn jobs as Slurm arrays
#!/bin/bash

# $1 is a path inside the launch directory; the log and joblist.txt live alongside it
launchdir=$(dirname "$1")
logfile="$launchdir"/slurm_submit_log.txt

if [[ -z ${SLURM_ARRAY_TASK_ID} ]]; then
  echo "SLURM_ARRAY_TASK_ID not set. Exiting." >> "$logfile"
  exit 1
fi

# joblist.txt has job script name on line 1, invocation dirs on lines 2-n
# => add 1 to each task ID to get its line number
jobscript=$(head -n1 "$launchdir"/joblist.txt)
task_line=$((SLURM_ARRAY_TASK_ID + 1))
taskdir=$(tail -n+"$task_line" "$launchdir"/joblist.txt | head -n1)

"$taskdir"/"$jobscript" >> "$logfile" 2>&1
if [[ "$?" != "0" ]]; then
  echo "ERROR IN MODEL RUN" >> "$logfile"
  exit 1
fi
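The line lookup above can be tried on its own, outside Slurm. This is only a sketch: `task_dir_for_id` is a hypothetical helper name (not part of PEcAn or of the script), it uses `sed -n` in place of the `tail | head` pipeline, and it assumes array task IDs start at 1 so that task 1 maps to line 2 of joblist.txt.

```shell
#!/bin/bash
# Hypothetical helper: print the invocation dir for one array task ID.
# Assumes the joblist.txt layout described above: line 1 is the job script
# name, lines 2..n are invocation dirs, and task IDs start at 1.
task_dir_for_id() {
  local joblist=$1 task_id=$2
  sed -n "$((task_id + 1))p" "$joblist"
}

# Throwaway joblist for demonstration; these paths are made up.
demo_joblist=$(mktemp)
printf '%s\n' job.sh /tmp/run/001 /tmp/run/002 /tmp/run/003 > "$demo_joblist"

task_dir_for_id "$demo_joblist" 2   # prints /tmp/run/002
```

If the task ID points past the end of the file, `sed` prints nothing, so a real launcher would probably want to check for an empty result before trying to run anything.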
Next steps
- When submitting multiple batches (i.e. when the number of jobs in the run > settings$host$modellauncher$Njobmax), every batch currently gets Njobmax array slots even if the last batch isn't full. I don't know how much wasted overhead this causes; maybe the extras just exit immediately? If needed, we could precalculate how many jobs each batch needs and adjust array sizes accordingly.
- Feels a little silly (though maybe nice for debugging?) to do all the work of writing out separate joblists rather than just reading lines from rundir/runs.txt, which PEcAn always generates upstream. If making that switch, we should consider whether to make the existing modellauncher work the same way.
- Calling it this way makes slurm_array_submit.sh a wrapper around the launcher.sh wrapper that PEcAn already writes. Can we build array support into, say, setup_modellauncher() instead?
- Seems wise to have some guardrails to avoid collision between standard qsub and array mode, lest we launch N arrays of N jobs each.
Also: Before getting too far into the weeds on editing this, evaluate whether we can adopt an existing framework - see especially future.batchtools.
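For the first item above (sizing the last batch), the per-batch arithmetic might look something like the sketch below. `batch_sizes` is a hypothetical helper, not something PEcAn provides, and its two arguments stand in for the total job count and settings$host$modellauncher$Njobmax.

```shell
#!/bin/bash
# Hypothetical helper: print one array size per batch, so that only the
# last batch is smaller than the maximum.
#   $1 = total number of jobs in the run
#   $2 = maximum jobs per batch (stand-in for Njobmax)
batch_sizes() {
  local remaining=$1 njobmax=$2 size
  if (( njobmax < 1 )); then
    echo "njobmax must be at least 1" >&2
    return 1
  fi
  while (( remaining > 0 )); do
    size=$(( remaining < njobmax ? remaining : njobmax ))
    echo "$size"
    remaining=$(( remaining - size ))
  done
}

batch_sizes 25 10   # prints 10, 10 and 5 on separate lines
```

Each size could then be passed to the submission as, for example, `sbatch --array=1-"$size"`, assuming task IDs start at 1 to match the line offset used in the launcher script.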
I'm using this with a <host> section that looks like this: