Slurm is a job scheduler for compute clusters. This document is based on this tutorial. A useful guide showing the correspondence between SGE and Slurm commands is available here.
Two commands may be useful:
- sinfo provides information about the resources of the cluster.
- squeue shows which jobs the cluster's resources are allocated to.
sinfo will show "partitions", which are sets of compute nodes grouped logically. To show all nodes separately:
$ sinfo -N
To further show the available CPUs and memory, the -l flag can be added:
$ sinfo -N -l
squeue shows the currently running jobs. The state of a job can be running ("R") or pending ("PD"). The squeue command can also be used to see the job ID.
User specific jobs can be displayed with the --user flag.
$ squeue --user=myusername
Jobs can be canceled by specifying the job ID with scancel <Job ID>. scancel also has a --user flag, so all jobs belonging to a user can be canceled at once.
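As a sketch of the two cancellation forms described above (the job ID and username here are placeholders):

```shell
# Cancel a single job by its numeric job ID:
scancel 12345

# Cancel all jobs belonging to a given user:
scancel --user=myusername
```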
The scontrol command is also more informative for understanding the current load on the various nodes:
$ scontrol show nodes
Job submissions distinguish between resource requests and job steps. Resource requests are the meta-information about the job, such as memory and run time. Job steps are what is actually executed.
Job resources can be specified in the header of a submission script. An example script (submit.sh) is below:
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
srun hostname
srun sleep 60
The job can then be submitted through sbatch:
$ sbatch submit.sh
When the job runs, its commands start in the directory from which the job was submitted, and the job inherits the submitting shell's environment, including PATH.
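This environment inheritance can be illustrated without a cluster: by default sbatch exports the submitting shell's environment to the job, just as a bash script started from a shell sees that shell's exported variables and working directory. A minimal local sketch (the variable name and script are made up for illustration):

```shell
# Set a variable in the "submitting" shell:
export MYVAR=hello

# Write a small script that reads it, as a job script would:
cat > check_env.sh <<'EOF'
#!/bin/bash
echo "$MYVAR"   # visible because the environment is inherited
pwd             # runs in the directory it was started from
EOF

bash check_env.sh
```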
The sstat command can be used to track the resource usage of your jobs.
$ sstat -j <Job ID>
For Slurm, a "task" is understood to be a process: a multi-process program is composed of multiple tasks, while a multithreaded program is composed of only a single task, which uses several CPUs.
Tasks are requested with the --ntasks option, while CPUs for multi-threaded programs are requested with the --cpus-per-task flag.
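For instance, a header for a multithreaded (e.g. OpenMP) job might request one task with several CPUs. This is a sketch only; the program name and thread count are illustrative, not from the original:

```shell
#!/bin/bash
#
#SBATCH --job-name=threads_test
#SBATCH --ntasks=1          # one process (task)...
#SBATCH --cpus-per-task=4   # ...using 4 CPUs for its threads
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

# Match the thread count to the CPUs Slurm granted:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_threaded_program.exe
```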
An example array job is below:
#!/bin/bash
#
#SBATCH --job-name=test_emb_arr
#SBATCH --output=res_emb_arr.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
#
#SBATCH --array=0-6
ARGS=(0.05 0.25 0.5 1 2 5 100)
srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
The srun command can be used to obtain an interactive shell on the compute nodes.
$ srun --pty bash
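The indexing in the array example above can be checked locally by setting SLURM_ARRAY_TASK_ID by hand (on the cluster, Slurm sets it automatically for each array element):

```shell
# Simulate one array element outside Slurm:
ARGS=(0.05 0.25 0.5 1 2 5 100)
SLURM_ARRAY_TASK_ID=3                  # Slurm would set this per element
echo "${ARGS[$SLURM_ARRAY_TASK_ID]}"   # prints 1
```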