At the start, I have:
- CentOS (centos-release-7-7.1908.0.el7.centos.x86_64).
- Slurm:
  - installed to /opt/slurm;
  - only slurmctld and slurmd are running.
- Munge.
Now I am going to set up job accounting with slurmdbd, using MariaDB as the storage backend.
Install MariaDB via yum.
yum install mariadb-server mariadb-devel
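The default InnoDB log file size is small, so override innodb_log_file_size in a dedicated config file, as the Slurm accounting documentation recommends for the accounting database.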
echo -e "[mysqld]\ninnodb_log_file_size=48M" | tee /etc/my.cnf.d/slurm.cnf
Start the MariaDB server.
systemctl start mariadb
systemctl enable mariadb
If the server started correctly, you can open the MariaDB command-line client. Check that innodb_log_file_size matches the value set in slurm.cnf.
$ sudo mysql -e "SHOW VARIABLES LIKE 'innodb_log_file_size';"
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| innodb_log_file_size | 50331648 |
+----------------------+----------+
Looks good.
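MariaDB reports the size in bytes, while the config file uses a unit suffix: 48M is 48 * 1024 * 1024 = 50331648 bytes. The helper below is a hypothetical function (not part of MariaDB) that does this conversion so the check can be scripted; it assumes only the K, M and G suffixes are used.

```shell
#!/usr/bin/env bash
# to_bytes: convert a MariaDB-style size (e.g. 48M) to bytes.
# Hypothetical helper; only the K, M and G suffixes are handled.
to_bytes() {
  local value=$1
  local number=${value%[KkMmGg]}   # strip a trailing unit letter, if any
  case "${value: -1}" in
    K|k) echo $(( number * 1024 )) ;;
    M|m) echo $(( number * 1024 * 1024 )) ;;
    G|g) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)   echo "$number" ;;            # no suffix: already in bytes
  esac
}

to_bytes 48M   # prints 50331648

# Compare with the running server (needs MariaDB and sudo):
# [ "$(to_bytes 48M)" = "$(sudo mysql -Nse 'SELECT @@innodb_log_file_size')" ] && echo "match"
```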
Next, create the database for the accounting data and grant the slurm user full access to it. The password set here must match StoragePass in slurmdbd.conf.
mysql -e "create database slurm_acct_db;"
mysql -e "grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'jfh983hjf38hf48f829jhJHG##' with grant option;"
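If you prefer to keep the names and the password in one place, the two statements can be generated from shell variables and reviewed before they reach MariaDB. This is a sketch: DB_NAME, DB_USER and DB_PASS are variable names chosen for illustration, and change-me is a placeholder for your own password. CREATE DATABASE IF NOT EXISTS makes the script safe to re-run. The password is pasted into the SQL verbatim, so it must not contain a single quote.

```shell
#!/usr/bin/env bash
# Assumed values: database and user names from this guide, placeholder password.
DB_NAME=slurm_acct_db
DB_USER=slurm
DB_PASS='change-me'

# Print the SQL so it can be reviewed before piping it into mysql.
slurm_db_sql() {
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${DB_NAME};
GRANT ALL ON ${DB_NAME}.* TO '${DB_USER}'@'localhost' IDENTIFIED BY '${DB_PASS}' WITH GRANT OPTION;
EOF
}

slurm_db_sql
# After reviewing the output, apply it with:
#   slurm_db_sql | sudo mysql
# and check the result with:
#   sudo mysql -e "SHOW GRANTS FOR 'slurm'@'localhost';"
```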
Next, create slurmdbd.conf and fill it with your slurmdbd settings. The file contains the database password, so make it readable by its owner only.
cd /opt/slurm/etc/
touch slurmdbd.conf
chmod 600 slurmdbd.conf
vi slurmdbd.conf # Fill it with the content from attached file `slurmdbd.conf`
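The attached slurmdbd.conf is not reproduced here. As a rough starting point, a minimal file for this setup might look like the following; every value is an assumption to adapt. In particular, SlurmUser=slurm presumes a slurm system account exists, and the log path matches the log file created in the next step. StoragePass must equal the password used in the GRANT statement. Slurm's configuration parser treats text after # as a comment, so a password containing # may be cut short; choosing one without # avoids the problem.

```shell
#!/usr/bin/env bash
# Run from /opt/slurm/etc. umask 077 makes a newly created file
# owner-only from the start, since it will hold the database password.
umask 077
cat > slurmdbd.conf <<'EOF'
# Assumed sample values; adjust to your site
AuthType=auth/munge
DbdHost=localhost
DbdPort=6819
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=change-me
StorageLoc=slurm_acct_db
EOF
chmod 600 slurmdbd.conf   # enforce 600 even if the file already existed

# slurmdbd expects this file to be owned by SlurmUser (assumed: slurm):
#   chown slurm: slurmdbd.conf
```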
After that, create an empty log file.
touch /var/log/slurmdbd.log
chmod 600 /var/log/slurmdbd.log
Next, we need to create a systemd service file for slurmdbd.
vi /etc/systemd/system/slurmdbd.service # Fill it with the content from attached file `slurmdbd.service`
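The attached unit file is not shown either. A minimal unit along these lines should work, assuming slurmdbd was installed to /opt/slurm/sbin and reads /opt/slurm/etc/slurmdbd.conf. The sketch writes the unit to the current directory first so it can be checked before being copied into place. The -D flag runs slurmdbd in the foreground, which fits Type=simple.

```shell
#!/usr/bin/env bash
# Assumed paths: binaries in /opt/slurm/sbin, config in /opt/slurm/etc.
cat > slurmdbd.service <<'EOF'
[Unit]
Description=Slurm DBD accounting daemon
After=network-online.target munge.service mariadb.service
Wants=network-online.target
ConditionPathExists=/opt/slurm/etc/slurmdbd.conf

[Service]
Type=simple
# -D keeps slurmdbd in the foreground so systemd supervises it directly
ExecStart=/opt/slurm/sbin/slurmdbd -D
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

# Then, as root:
#   install -m 644 slurmdbd.service /etc/systemd/system/slurmdbd.service
#   systemd-analyze verify /etc/systemd/system/slurmdbd.service
```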
Reload the systemd manager configuration.
systemctl daemon-reload
Now, systemd can see our slurmdbd daemon. Let's start it.
systemctl start slurmdbd.service
systemctl enable slurmdbd.service
To enable accounting, edit /opt/slurm/etc/slurm.conf: uncomment the following keys and set them as shown.
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
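Editing by hand is fine; if you would rather script it, a small helper like the hypothetical set_conf_key below replaces a commented or active KEY=... line, or appends the key if it is missing. It relies on GNU sed -i and assumes the values contain no |, & or backslash, which holds for the three keys above. A backup copy is kept as slurm.conf.bak.

```shell
#!/usr/bin/env bash
# set_conf_key FILE KEY VALUE
# Replace every "KEY=..." or "#KEY=..." line with "KEY=VALUE",
# or append "KEY=VALUE" if the key does not appear at all.
set_conf_key() {
  local file=$1 key=$2 value=$3
  if grep -qE "^#?[[:space:]]*${key}=" "$file"; then
    sed -i -E "s|^#?[[:space:]]*${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

SLURM_CONF=${SLURM_CONF:-/opt/slurm/etc/slurm.conf}
if [ -f "$SLURM_CONF" ]; then
  cp -p "$SLURM_CONF" "$SLURM_CONF.bak"   # keep a backup before editing
  set_conf_key "$SLURM_CONF" AccountingStorageType accounting_storage/slurmdbd
  set_conf_key "$SLURM_CONF" AccountingStorageHost localhost
  set_conf_key "$SLURM_CONF" AccountingStoragePass /var/run/munge/munge.socket.2
else
  echo "slurm.conf not found at $SLURM_CONF" >&2
fi
```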
Restart slurmctld, then make the slurmd daemons on the compute nodes re-read the configuration.
systemctl restart slurmctld
scontrol reconfigure
Finally, register the cluster in the accounting database; this makes slurmdbd create the cluster's accounting tables. Run the following command.
sacctmgr -i add cluster <cluster_name>
Here, <cluster_name> is the ClusterName defined in slurm.conf. A quick way to look it up:
$ scontrol show config | grep ClusterName
ClusterName = mycluster
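The lookup and the registration can also be chained so the name is never retyped. The config_value function below is a hypothetical helper that pulls one value out of the "Key = Value" lines printed by scontrol show config; the sample line in the demo stands in for real scontrol output.

```shell
#!/usr/bin/env bash
# config_value KEY: read "Key = Value" lines on stdin and print the value
# of the first line whose key matches exactly. Hypothetical helper.
config_value() {
  awk -v key="$1" '$1 == key { sub(/^[^=]*=[ \t]*/, ""); print; exit }'
}

# Demo with a line in the same format as scontrol output:
printf 'ClusterName             = mycluster\n' | config_value ClusterName   # prints mycluster

# On the controller, the two steps can then be chained:
#   sacctmgr -i add cluster "$(scontrol show config | config_value ClusterName)"
```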
Done! To verify, submit a test job and check that it shows up in sacct:
$ sbatch --wrap='sleep 10'
Submitted batch job 111
$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
111                wrap    compute                     1    RUNNING      0:0
See the official documentation for details.