Use apt to install the necessary packages:
sudo apt install -y slurm-wlm slurm-wlm-doc
Load file:///usr/share/doc/slurm-wlm/html/configurator.html in a browser (or file://wsl%24/Ubuntu/usr/share/doc/slurm-wlm/html/configurator.html on WSL2), and:
- Set your machine's hostname in SlurmctldHost and NodeName.
- Set CPUs as appropriate, and optionally Sockets, CoresPerSocket, and ThreadsPerCore. Use the lscpu command to find what you have.
- Set RealMemory to the number of megabytes you want to allocate to Slurm jobs.
- Set StateSaveLocation to /var/spool/slurm-llnl.
- Set ProctrackType to linuxproc because processes are less likely to escape Slurm control on a single machine config.
- Make sure SelectType is set to Cons_res, and set SelectTypeParameters to CR_Core_Memory.
- Set JobAcctGatherType to Linux to gather resource use per job, and set AccountingStorageType to FileTxt.
Hit Submit, and save the resulting text into /etc/slurm-llnl/slurm.conf, i.e. the configuration file referred to in /lib/systemd/system/slurmctld.service and /lib/systemd/system/slurmd.service.
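As a rough sketch of the lscpu step above, the shell function below turns lscpu-style output into a NodeName line that can be compared with what the configurator generated. The helper name node_line, the hostname argument, and the memory figure are illustrative assumptions; the field labels are the ones lscpu prints in an English locale.

```shell
# Sketch: build a slurm.conf NodeName line from lscpu-style "Key: value" text on stdin.
# node_line is a hypothetical helper; $1 is the hostname, $2 is RealMemory in MB.
node_line() {
  awk -F': *' -v host="$1" -v mem="$2" '
    /^CPU\(s\):/             { cpus = $2 }
    /^Socket\(s\):/          { sockets = $2 }
    /^Core\(s\) per socket:/ { cores = $2 }
    /^Thread\(s\) per core:/ { threads = $2 }
    END {
      printf "NodeName=%s CPUs=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s RealMemory=%s\n",
             host, cpus, sockets, cores, threads, mem
    }'
}

# Example (not run here): LC_ALL=C lscpu | node_line "$(hostname -s)" 64000
```

Running slurmd -C also prints the hardware configuration as slurmd detects it, which is a useful cross-check against the values entered in the configurator.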
Load /etc/slurm-llnl/slurm.conf in a text editor, uncomment DefMemPerCPU, and set it to 8192 or whatever number of megabytes you want each job to request if not explicitly requested using --mem during job submission. Read the docs and edit other defaults as you see fit.
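If you would rather script that edit than use a text editor, a minimal sketch follows. The helper name set_def_mem is hypothetical, and it assumes the file already contains a DefMemPerCPU line (commented or not); if there is no such line, the file is left unchanged.

```shell
# Sketch: set DefMemPerCPU in a slurm.conf-style file, uncommenting it if needed.
# set_def_mem is a hypothetical helper; $1 is the file, $2 is the value in MB.
set_def_mem() {
  sed -i -E "s/^#?[[:space:]]*DefMemPerCPU=.*/DefMemPerCPU=$2/" "$1"
}

# Example (not run here), from a root shell, after backing up the file:
#   cp /etc/slurm-llnl/slurm.conf /etc/slurm-llnl/slurm.conf.bak
#   set_def_mem /etc/slurm-llnl/slurm.conf 8192
```

A job can still override the default at submission time, for example sbatch --mem=4096 job.sh, where job.sh stands in for your own batch script.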
Create /var/spool/slurm-llnl and /var/log/slurm_jobacct.log, then set ownership appropriately:
sudo mkdir -p /var/spool/slurm-llnl
sudo touch /var/log/slurm_jobacct.log
sudo chown slurm:slurm /var/spool/slurm-llnl /var/log/slurm_jobacct.log
Install mailutils so that Slurm won't complain about /bin/mail missing:
sudo apt install -y mailutils
Make sure munge is installed and running, and that a munge.key was created with user-only read-only permissions, owned by munge:munge:
sudo service munge start
sudo ls -l /etc/munge/munge.key
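The permission check can also be scripted rather than read off the ls output. The sketch below uses a hypothetical helper, key_mode_ok, that succeeds only when the key grants nothing to group or others. Accepting mode 600 alongside the read-only 400 described above is an assumption, since some munge versions create the key that way.

```shell
# Sketch: succeed only if a key file grants no access to group or others.
# key_mode_ok is a hypothetical helper; accepting 600 as well as 400 is an assumption.
key_mode_ok() {
  mode=$(stat -c '%a' "$1") || return 1
  case "$mode" in
    400|600) return 0 ;;
    *) echo "unexpected mode $mode on $1" >&2; return 1 ;;
  esac
}

# Example (not run here), from a root shell:
#   key_mode_ok /etc/munge/munge.key && echo "key permissions look fine"
#   munge -n | unmunge    # round-trip a credential through the running munged
```

The munge -n | unmunge round trip is a quick way to confirm that the daemon is running and can encode and decode a credential with the installed key.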
Start the slurmctld and slurmd services:
sudo service slurmd start
sudo service slurmctld start
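Once both services are up, a few standard Slurm client commands give a quick sanity check. The wrapper name check_slurm below is hypothetical, and this is only a sketch of one reasonable order of checks for a single-node setup.

```shell
# Sketch: basic post-start checks for a single-node Slurm setup.
# check_slurm is a hypothetical helper; it only wraps standard Slurm client commands.
check_slurm() {
  command -v sinfo >/dev/null 2>&1 || {
    echo "sinfo not found; is slurm-wlm installed?" >&2
    return 1
  }
  sinfo || return 1                # partition and node states
  scontrol show node || return 1   # what slurmctld believes about CPUs and memory
  srun -n1 hostname                # run a trivial job end to end
}

# Example (not run here): check_slurm
```

If a daemon refuses to start, running it in the foreground with extra verbosity (for example sudo slurmd -D -vvv) usually shows the underlying error more directly than the systemd status output does.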
Hi, maybe you can help me! I am trying to install Slurm on our server (running Ubuntu 20.04). All the steps in your instructions worked fine, except that when I run 'sudo service slurmd start', I get this error message:
"Job for slurmd.service failed because the control process exited with error code.
See "systemctl status slurmd.service" and "journalctl -xe" for details."
The output of "systemctl status slurmd.service" is:
"slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2021-06-18 17:27:38 CEST; 4min 6s ago
Docs: man:slurmd(8)
Process: 3645924 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)"
The output of "journalctl -xe" is:
"pam_unix(sudo:auth): Couldn't open /etc/securetty: No such file or directory"
After running "sudo cp /usr/share/doc/util-linux/examples/securetty /etc/securetty" to copy a securetty file into place, the slurmd service still won't start.
Please help, and thank you so much!
Best,
Lihua