Add Slurm Accounting (slurmdbd) to Existing Slurm Configuration

At the start, I have:

  • CentOS 7 (centos-release-7-7.1908.0.el7.centos.x86_64).
  • Slurm
    • Installed to /opt/slurm.
    • Only slurmctld and slurmd are running.
  • Munge.

Now I am going to add the slurmdbd accounting daemon to this setup, with MariaDB as its storage backend.

Configure MariaDB

Install MariaDB via yum.

yum install mariadb-server mariadb-devel

To keep it stable, override the innodb_log_file_size setting.

echo -e "[mysqld]\ninnodb_log_file_size=48M" | tee /etc/my.cnf.d/slurm.cnf

Start the MariaDB server.

systemctl start mariadb
systemctl enable mariadb

If it works fine, you can access the MariaDB command line. Let's check that the innodb_log_file_size value matches what we set in /etc/my.cnf.d/slurm.cnf.

$ sudo mysql -e "SHOW VARIABLES LIKE 'innodb_log_file_size';"
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| innodb_log_file_size | 50331648 |
+----------------------+----------+

Looks good: 50331648 bytes is 48 × 1024 × 1024, which is the 48M we configured.
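
The expected byte count can be computed instead of memorized. This is a minimal sketch, assuming a hypothetical helper named `to_bytes` (it is not part of MariaDB or Slurm) that converts a size with a K, M, or G suffix into bytes using powers of 1024:

```shell
# to_bytes is a hypothetical helper for illustration only.
# It converts sizes such as 48M into bytes, using powers of 1024.
to_bytes() {
    num=${1%[KkMmGg]}    # numeric part with the suffix stripped
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

to_bytes 48M   # prints 50331648
```

The printed number can then be compared with the Value column of the SHOW VARIABLES output.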

Next, we need to create the database for accounting data and grant the slurm user full access to it.

mysql -e "create database slurm_acct_db;"
mysql -e "grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'jfh983hjf38hf48f829jhJHG##' with grant option;"
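
The database name, user, and password appear again later in slurmdbd.conf, so it can help to keep them in one place. A rough sketch, assuming a hypothetical helper named `acct_sql` that only prints the SQL and does not run it; it also assumes the password contains no single quotes:

```shell
# acct_sql is a hypothetical helper: it prints SQL statements, nothing more.
# Arguments: database name, user name, password (no single quotes allowed).
acct_sql() {
    db=$1
    user=$2
    pass=$3
    printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
    printf "GRANT ALL ON %s.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" \
        "$db" "$user" "$pass"
}

# Usage (assumes root can reach MariaDB over the local socket):
#   acct_sql slurm_acct_db slurm 'your-password' | mysql
```

Printing the statements first makes it easy to review them before piping them into mysql.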

Configure slurmdbd daemon

First, create a slurmdbd.conf file and fill it with the corresponding content.

cd /opt/slurm/etc/
touch slurmdbd.conf
chmod 600 slurmdbd.conf
vi slurmdbd.conf  # Fill it with the content from attached file `slurmdbd.conf`

After that, create an empty log file.

touch /var/log/slurmdbd.log
chmod 600 /var/log/slurmdbd.log
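
Since slurmdbd.conf holds the database password, it is worth confirming that the permissions really are 600. A small sketch, assuming a hypothetical helper named `has_mode` and GNU stat (the `-c` option), which CentOS provides:

```shell
# has_mode is a hypothetical helper; it relies on GNU stat's -c option.
# It succeeds when FILE has exactly the given octal mode.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Usage:
#   has_mode /opt/slurm/etc/slurmdbd.conf 600 || echo "slurmdbd.conf is not mode 600"
#   has_mode /var/log/slurmdbd.log 600 || echo "slurmdbd.log is not mode 600"
```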

Next, we need to create a systemd service file for slurmdbd.

vi /etc/systemd/system/slurmdbd.service  # Fill it with the content from attached file `slurmdbd.service`

Reload the systemd configuration.

systemctl daemon-reload

Now, systemd can see our slurmdbd daemon. Let's start it.

systemctl start slurmdbd.service
systemctl enable slurmdbd.service
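
To confirm that the daemon actually came up, one option is to check that something is listening on the slurmdbd port. This is a sketch, assuming DbdPort is left at its default of 6819 (it is commented out in the attached slurmdbd.conf) and using a hypothetical helper named `port_listening` that parses `ss -ltn` output:

```shell
# port_listening is a hypothetical helper. It reads `ss -ltn` output on stdin
# and succeeds if some socket is in LISTEN state on the given TCP port.
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   ss -ltn | port_listening 6819 && echo "slurmdbd is listening"
```

If the check fails, `systemctl status slurmdbd.service` and /var/log/slurmdbd.log are the first places to look.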

Enable accounting

To enable the accounting feature, update the /opt/slurm/etc/slurm.conf file. Uncomment the following keys and set them as shown.

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
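
It is easy to leave one of these keys commented out by accident. A minimal sketch, assuming a hypothetical helper named `conf_get` that prints the value of a key from a KEY=VALUE file and skips commented lines; unlike Slurm itself, the sketch matches key names case-sensitively:

```shell
# conf_get is a hypothetical helper. It prints the value of KEY from a
# Slurm-style KEY=VALUE file, ignoring lines that start with '#'.
conf_get() {
    awk -F= -v key="$1" '
        /^[[:space:]]*#/ { next }
        $1 == key { sub(/^[^=]*=/, ""); print; exit }
    ' "$2"
}

# Usage:
#   conf_get AccountingStorageType /opt/slurm/etc/slurm.conf
```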

Restart slurmctld, then tell the Slurm daemons on the compute nodes to reload their configuration.

systemctl restart slurmctld
scontrol reconfigure

Finally, register the cluster with slurmdbd, which creates the cluster's accounting tables in the database. To do that, execute the following command.

sacctmgr -i add cluster <cluster_name>

Here <cluster_name> is the ClusterName defined in slurm.conf. A quick way to look it up:

$ scontrol show config | grep ClusterName
ClusterName             = mycluster
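
The lookup and the registration can be combined so the name is never typed by hand. A sketch, assuming a hypothetical helper named `cluster_name` that reads `scontrol show config` output on stdin:

```shell
# cluster_name is a hypothetical helper. It reads `scontrol show config`
# output on stdin and prints the value of the ClusterName line.
cluster_name() {
    awk '$1 == "ClusterName" { print $3; exit }'
}

# Usage:
#   sacctmgr -i add cluster "$(scontrol show config | cluster_name)"
```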

Done!

Test

$ sbatch --wrap='sleep 10'
Submitted batch job 111
$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
111                wrap    compute                     1    RUNNING      0:0
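
For scripted checks, sacct's machine-readable output is easier to parse than the table above. A sketch, assuming a hypothetical helper named `job_state` that reads `sacct -n -P --format=JobID,State` output (pipe-separated, no header) on stdin:

```shell
# job_state is a hypothetical helper. It reads lines of the form JobID|State
# on stdin and prints the state of the job whose ID matches exactly.
job_state() {
    awk -F'|' -v id="$1" '$1 == id { print $2; exit }'
}

# Usage:
#   sacct -n -P --format=JobID,State | job_state 111
```

Once the sleep finishes, the same call should report the job's final state rather than RUNNING.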

See the official documentation for details.

slurmdbd.conf

#
# Example slurmdbd.conf file.
#
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
#
# Authentication info
AuthType=auth/munge
#AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info
DbdAddr=localhost
DbdHost=localhost
#DbdPort=7031
SlurmUser=slurm
#MessageTimeout=300
DebugLevel=4
#DefaultQOS=normal,standby
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
#PluginDir=/usr/lib/slurm
#PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
#StorageHost=localhost
#StoragePort=1234
StoragePass=jfh983hjf38hf48f829jhJHG##
StorageUser=slurm
StorageLoc=slurm_acct_db

slurmdbd.service

[Unit]
Description=Slurm DBD accounting daemon
After=network.target munge.service
ConditionPathExists=/opt/slurm/etc/slurmdbd.conf
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/slurmdbd
ExecStart=/opt/slurm/sbin/slurmdbd $SLURMDBD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurmdbd.pid
[Install]
WantedBy=multi-user.target