This page details how to install and run Slurm on the greyworm cluster at LLNL. The instructions can be adapted to other clusters with different nodes.
In the current configuration, aims4.llnl.gov is the head node, while greyworm[1-8].llnl.gov are the compute nodes.
Note: The steps here are for the current configuration of aims4 and greyworm[1-8]. The configuration might change in the future.
Slurm is a job scheduler used to run jobs on supercomputers and clusters.
Clusters using Slurm have a head node, which end users interact with to submit jobs. The user specifies the resources a job needs, and Slurm then runs the job on the compute nodes.
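For example, a typical way to submit work is with a small batch script. Here's a minimal sketch (the partition name batch matches the greyworm configuration shown later; the job itself just prints the node it ran on):

#!/bin/bash
#SBATCH --job-name=hello       # name shown in the queue
#SBATCH --partition=batch      # partition defined in slurm.conf
#SBATCH --nodes=1              # number of compute nodes to allocate
#SBATCH --ntasks=1             # number of tasks (processes) to run
#SBATCH --time=00:05:00        # wall-clock time limit

# Report which compute node the job landed on.
srun hostname

Save it as hello.sh, submit it with sbatch hello.sh, and watch it with squeue.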
The slurmctld daemon runs on the head node and the slurmd daemon runs on the compute nodes. These daemons read their configuration parameters from slurm.conf, which allows the head node and compute nodes to communicate.
Munge is used for authentication across the cluster. It creates a munge.key file, which each node needs to have (the same key on every machine) to allow for communication.
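A quick way to confirm munge authentication works between two machines (assuming munge is already running on both) is to encode a credential on one node and decode it on the other:

# From aims4: create a credential and have greyworm1 decode it.
# If the keys match, unmunge reports STATUS: Success (0).
munge -n | ssh greyworm1 unmunge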
On aims4, we already have Slurm installed in /usr/local/. If you need to install Slurm, go to the Installing Slurm section below.
Slurm gets its configuration from slurm.conf, an ASCII file with key-value pairs. On aims4, it's located in /usr/local/etc/slurm.conf; otherwise, by default, it's in /etc/slurm/slurm.conf. Look in the Configuration Files section for the full slurm.conf file used on aims4. If you don't have one, generate one from here (note that this link is for the latest Slurm version).
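Once the controller daemon is running (see the startup steps later on), you can also ask Slurm which configuration it actually loaded, which is handy when more than one slurm.conf is lying around:

# Dump the parsed configuration; the SLURM_CONF line shows the file in use.
scontrol show config | grep -E 'SLURM_CONF|SLURM_VERSION'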
Below are some parameters to check before Slurm is run.
- ControlMachine: name of the head node; it's aims4 in our case.
- ControlAddr: address of the head node; it's 10.10.10.1 in our case.
- SlurmUser: the user on the head node that starts the Slurm controller daemon, slurmctld. It's root in our case and by default.
- SlurmdUser: the user on all of the compute nodes that starts the Slurm daemon, slurmd. It's root in our case and by default.
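To quickly check these values on aims4 (the path below is where our slurm.conf lives; adjust it if yours is elsewhere):

# Print the controller and user settings from the config file.
grep -E '^(ControlMachine|ControlAddr|SlurmUser|SlurmdUser)' /usr/local/etc/slurm.conf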
In addition, check that all of the information under the Compute Node section accurately describes the compute nodes. This includes NodeName, CPUs, RealMemory, Sockets, Nodes, and more. For greyworm[1-8], the configuration of the compute nodes is:
NodeName=greyworm[1-8] CPUs=8 RealMemory=128 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=batch Nodes=greyworm[1-8] Default=YES MaxTime=INFINITE State=UP
For the acme1 machine, which isn't a cluster, the configuration is:
NodeName=acme1 CPUs=192 Sockets=4 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=1000000 State=UNKNOWN
PartitionName=debug OverSubscribe=YES Nodes=acme1 Default=YES MaxTime=INFINITE State=UP
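If you're not sure what numbers to put in, you can read them off the node itself. A rough check (RealMemory is in megabytes):

# Socket/core/thread layout of the node.
lscpu | grep -E '^CPU\(s\)|Socket|Core|Thread'
# Total memory in MB (use the 'total' column for RealMemory).
free -m | grep Mem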
First, go to your head node (aims4) and see what version of Slurm you have. The head and compute nodes need to have the same version of Slurm. You can check this with the following command:
# slurmd -V
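To compare versions across the whole cluster in one go, a quick loop like this works (a sketch, assuming you can ssh to each greyworm node; give the full path to slurmd if more than one copy is installed):

# Version on the head node, then on each compute node.
slurmd -V
for i in $(seq 1 8); do ssh greyworm${i} slurmd -V; done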
If you don't have the correct version of Slurm, proceed to the Installing Slurm section.
Note: Installing via a package doesn't let you choose the installation location (i.e., --prefix won't work) and installs Slurm in /usr/. Build and install from source if you want to choose where it goes.
- First, make sure you have munge installed (try running munged -V to see if it's installed). If not, Google how to install it and generate the munge key. Make sure you have the same munge.key across all machines!
- Download the correct version of Slurm (the same one that's on your head node) from here.
- Create the rpms from the tarball:
rpmbuild -ta slurm*.tar.bz2
- First install Slurm with the plugins:
rpm --install slurm-*.el6.x86_64.rpm slurm-plugins-*.el6.x86_64.rpm
- Then install all of the other packages created by the rpmbuild step:
rpm --install <the other rpm files>
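After that, it doesn't hurt to confirm what actually got installed and that the binary reports the version you expect:

# List the installed Slurm packages and check the binary's version.
rpm -qa | grep -i slurm
/usr/sbin/slurmd -V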
Using scp, copy over the slurm.conf from the head node to all of the compute nodes. If you followed the installation instructions from the previous section, it should be placed in /etc/slurm/slurm.conf.
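For greyworm[1-8], that boils down to a small loop like this (a sketch, assuming you can scp as root to each compute node and that the head node's copy lives in /usr/local/etc/slurm.conf):

# Push the head node's slurm.conf to every compute node.
for i in $(seq 1 8); do
    scp /usr/local/etc/slurm.conf root@greyworm${i}:/etc/slurm/slurm.conf
done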
Now we'll actually start the cluster. First we'll start the head node, then each of the compute nodes.
Note: Currently, Slurm is also installed from source on the compute nodes, greyworm[1-8]. We're a little too lazy to uninstall it, but we might do that soon. Since we have two versions of Slurm on the same machine, we need to use the new installation (located in /usr/) and not the old one (located in /usr/local, which is on $PATH). So we give the full path to the binary in the steps below when working on the compute nodes.
On the head node (aims4):
- Log in as root:
$ sudo su -
- Start the munge daemon:
# service munge start
- Start the Slurm controller daemon:
# slurmctld
- If this is your first time doing this, run it in the foreground (-D) with a bunch of debug info (-vvvvv):
# slurmctld -D -vvvvv
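If slurmctld doesn't seem to come up, check that the process is running and look at the controller log (SlurmctldLogFile in our slurm.conf):

# Confirm the controller daemon is running and peek at its log.
ps -ef | grep '[s]lurmctld'
tail -n 20 /var/log/slurmctl.log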
Then, on each of the compute nodes (remember, we give the full path for the reasons in the note above):
- Log in as root:
$ sudo su -
- Start the munge daemon:
# service munge start
- Start the Slurm daemon:
# /usr/sbin/slurmd
- Similarly, if this is your first time doing this, run it in the foreground (-D) with a bunch of debug info (-vvvvv):
# /usr/sbin/slurmd -D -vvvvv
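Likewise, on a compute node you can confirm slurmd is running and check its log (SlurmdLogFile in our slurm.conf):

# Confirm the compute daemon is running and peek at its log.
ps -ef | grep '[s]lurmd'
tail -n 20 /var/log/slurm.log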
You can use either of the following two commands to make sure that your cluster is up and running.
- From one of the compute nodes, ping the Slurm controller:
# /usr/bin/scontrol ping
It should look like:
[root@greyworm1 ~]# /usr/bin/scontrol ping
Slurmctld(primary/backup) at aims4/(NULL) are UP/DOWN
Note: Slurm has the ability to have a backup head node, in case the primary one fails. We don't have this setup, which is why you see NULL and DOWN in the output above.
- From one of the compute nodes, view information about the cluster:
# /usr/bin/sinfo
It should look like:
[root@greyworm1 ~]# /usr/bin/sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*       up   infinite      1   idle greyworm1
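Once sinfo shows idle nodes, a simple end-to-end test is to run a trivial job through the scheduler (a sketch; run it from a node that has the Slurm client commands on its PATH, or give full paths as above):

# Run hostname as a one-node job; the output should be the name of a greyworm node.
srun -N1 hostname
# Or submit it as a batch job and check the queue.
sbatch --wrap="hostname"
squeue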
Things to check if something isn't working.
- DO YOU HAVE THE EXACT SAME slurm.conf ON THE HEAD AND COMPUTE NODES? DON'T COPY AND PASTE IT, ACTUALLY USE scp TO MOVE IT.
- Remember, you need to run everything as root. This is because we were too lazy to create a Slurm user on our machines. We should actually do this.
- From another node in the cluster, try to ping the head node, e.g., ping aims4.llnl.gov. If this doesn't work, contact your sysadmin, i.e., Tony. Also check that the SlurmctldPort and SlurmdPort ports from slurm.conf are not blocked (see the port check sketch after this list).
- Still stuck? Configure your cluster to have just one head node and one compute node. Under the # COMPUTE NODES section in slurm.conf, change NodeName=greyworm[1-8] and Nodes=greyworm[1-8] to NodeName=greyworm[1] and Nodes=greyworm[1], respectively. Remember to do this for both slurm.conf files, on aims4.llnl.gov and greyworm1.llnl.gov.
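For the port check mentioned above, something like this works if nc (netcat) is installed; the ports are the SlurmctldPort and SlurmdPort values from our greyworm slurm.conf:

# From a compute node: is the controller port reachable on the head node?
nc -zv 10.10.10.1 6717
# From the head node: is the slurmd port reachable on a compute node?
nc -zv greyworm1 6718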
Below are the configuration files for setting up the greyworm cluster and the acme1 machine. The greyworm slurm.conf comes first, followed by the acme1 slurm.conf.
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=aims4
ControlAddr=10.10.10.1
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6717
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6718
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
JobCredentialPrivateKey=/usr/local/etc/slurm.key
JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/builtin
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/filetxt
ClusterName=greyworm
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctl.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm.log
#
#
# COMPUTE NODES
NodeName=greyworm[1-8] CPUs=8 RealMemory=128 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=batch Nodes=greyworm[1-8] Default=YES MaxTime=INFINITE State=UP
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=acme1
ControlAddr=acme1
AuthType=auth/munge
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
SlurmctldPort=6817
SlurmdPort=6818
SlurmctldLogFile=/var/log/slurmctl.log
SlurmdLogFile=/var/log/slurm.log
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
SlurmdTimeout=1000
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=ACME
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
#SlurmctldLogFile=
#SlurmdDebug=3
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=acme1 CPUs=192 Sockets=4 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=1000000 State=UNKNOWN
PartitionName=debug OverSubscribe=YES Nodes=acme1 Default=YES MaxTime=INFINITE State=UP