Last active
July 29, 2020 02:00
Small utility to run top or nvidia-smi on a compute node from the login node
#!/usr/bin/env bash
# Parse short and long options with getopt; -n sets the name used in error messages.
TEMP=$(getopt -o hsg --long help,snapshot,gpu -n 'susage' -- "$@")
if [ $? != 0 ] ; then echo "Terminating..." >&2 ; exit 1 ; fi
# Note the quotes around "$TEMP": they are essential!
eval set -- "$TEMP"
SNAPSHOT=false
GPU=false
while true; do
    case "$1" in
        -h | --help ) echo "susage [-h/--help] [-s/--snapshot] [-g/--gpu] JOBID"; exit 0 ;;
        -s | --snapshot ) SNAPSHOT=true; shift ;;
        -g | --gpu ) GPU=true; shift ;;
        -- ) shift; break ;;
        * ) break ;;
    esac
done
JOBID=$1
if [[ -z $JOBID ]]; then
    echo "Please supply SLURM jobid." >&2
    exit 2
fi
# Check that this job is RUNNING (i.e. not PENDING/HELD)
[[ $(scontrol show job "${JOBID}" | grep -o "JobState=RUNNING") =~ RUNNING$ ]] || { echo "Job ${JOBID} is not currently running. Exiting." >&2; exit 1; }
# Unfortunately nvidia-smi and top have different defaults:
# top polls automatically, while nvidia-smi needs an option (-l) to poll.
top_options="-u ${USER}"
smi_options="-l 3"
if [[ "$SNAPSHOT" == true ]]; then
    top_options="-n 1 -u ${USER}"
    smi_options=""
fi
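cmd="top ${top_options}"
srun_options="-p compute -q batch"
if [[ "$GPU" == true ]]; then
    # Known issue: srun receives this string as plain words and needs an executable,
    # so the brace group is never interpreted by a shell.
    cmd="{ module load cuda10.0/toolkit; nvidia-smi ${smi_options}; }"
    srun_options="-q dev"
fi
# Turn the job's NodeList=<node> field into a --nodelist=<node> option
nodelist="--$(scontrol show job "${JOBID}" | grep -Eo 'NodeList=([a-z0-9]+)' | tr '[:upper:]' '[:lower:]')"
# ${srun_options} and ${cmd} are intentionally unquoted so they split into separate words
srun "${nodelist}" ${srun_options} --pty ${cmd}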
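The node lookup on the last lines can be tried without a cluster. The following is only a sketch: `nodelist_option` is a hypothetical helper, not part of the gist, and the sample text is made up to resemble `scontrol show job` output. It reads the job description on stdin and prints the option for `srun`. Unlike the one-liner above, it keeps the field value unchanged, on the assumption that `srun --nodelist` accepts whatever Slurm reports there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): read `scontrol show job`
# style text on stdin and print a --nodelist=<nodes> option for srun.
nodelist_option() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # Anchoring at the start of the field skips ReqNodeList= and ExcNodeList=.
            if ($i ~ /^NodeList=/) {
                sub(/^NodeList=/, "", $i)
                # Print nothing when no node is assigned.
                if ($i != "(null)") print "--nodelist=" $i
                exit
            }
        }
    }'
}

# Made-up sample input, only for illustration.
sample_job='JobId=123 JobName=demo
   JobState=RUNNING Reason=None
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=gpu001
   BatchHost=gpu001'

printf '%s\n' "$sample_job" | nodelist_option
```

In the script itself this would be called as `nodelist=$(scontrol show job "${JOBID}" | nodelist_option)`.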
Note that the `nvidia-smi` functionality doesn't currently work. `srun` requires an executable, so `cmd` should probably be a function, but I can't get it to work 100% correctly.
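One possible way around this, sketched here and untested on a real cluster, is to hand `srun` an actual executable (`bash`) and pass the compound command as a single `-c` argument. `build_remote_cmd` and `remote_cmd` are hypothetical names introduced for this illustration. The use of `bash -lc` assumes that a login shell on the compute node defines the `module` command, which depends on the site setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): fill the global array
# remote_cmd with the command that srun should execute.
#   $1 = GPU flag (true/false), $2 = SNAPSHOT flag (true/false)
build_remote_cmd() {
    local gpu=$1 snapshot=$2
    local user=${USER:-$(id -un)}
    remote_cmd=()
    if [[ "$gpu" == true ]]; then
        local smi="nvidia-smi -l 3"
        if [[ "$snapshot" == true ]]; then
            smi="nvidia-smi"
        fi
        # bash is the executable; the whole compound command is ONE argument to -c.
        # Assumption: a login shell (-l) makes `module` available on the node.
        remote_cmd=(bash -lc "module load cuda10.0/toolkit && ${smi}")
    else
        remote_cmd=(top -u "$user")
        if [[ "$snapshot" == true ]]; then
            remote_cmd=(top -n 1 -u "$user")
        fi
    fi
}

# Intended use inside the script (commented out because it needs Slurm):
#   build_remote_cmd "$GPU" "$SNAPSHOT"
#   srun "${nodelist}" ${srun_options} --pty "${remote_cmd[@]}"
```

Expanding the array as `"${remote_cmd[@]}"` keeps the `-c` string as one word, which an unquoted `${cmd}` string cannot do.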
Changed the offending portion to look like `{ cmd1; cmd2; }` so that the `exit` happens in the context of the current shell (rather than in a subshell, as with `(...)`).
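The difference can be seen in a small standalone sketch. The function names are made up for this illustration, and `false` stands in for the failing job-state check.

```shell
#!/usr/bin/env bash
# With ( ... ), exit only leaves the subshell, so execution continues.
check_with_parens() {
    false || ( echo "check failed"; exit 1 )
    echo "still running"
}

# With { ...; }, exit leaves the current shell, so nothing after it runs.
check_with_braces() {
    false || { echo "check failed"; exit 1; }
    echo "still running"
}

# Each call runs inside a pipeline so that the exit in check_with_braces
# does not end this demo script; tail shows the last line each one printed.
printf 'parens -> %s\n' "$(check_with_parens | tail -n 1)"   # parens -> still running
printf 'braces -> %s\n' "$(check_with_braces | tail -n 1)"   # braces -> check failed
```

In the script, the brace form is what makes the "not currently running" check actually stop before `srun` is reached.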