Here I will describe a simple configuration of the Slurm management tool for launching jobs on a very basic cluster. I will assume the following setup: a main node (in my case an Arch Linux machine) and three compute nodes (in my case Debian VMs). I also assume that the nodes can ping each other and that you have some way of knowing the IP of each node at all times (the most basic option is a local NAT with static IPs).
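As a minimal sketch, assuming hypothetical hostnames and a 192.168.1.0/24 local network, the static IPs can be listed in `/etc/hosts` on every node so that the machines can resolve each other by name:

```
# /etc/hosts -- identical on every node; IPs and hostnames are examples
192.168.1.10    master
192.168.1.11    node1
192.168.1.12    node2
192.168.1.13    node3
```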
Slurm works on a set of nodes, one of which is considered the master node and runs the `slurmctld` daemon; all other compute nodes run the `slurmd` daemon. All communication is authenticated via the `munge` service, and all nodes need to share the same authentication key. By default, Slurm keeps a journal of activities in a directory configured in the `slurm.conf` file, although a database management system can be set up instead. All in all, what we will try to do is:
- Install `munge` on all nodes and share the same authentication key (see the sketch after this list)
- Install Slurm, running `slurmctld` on the master node and `slurmd` on each compute node
- Write a common `slurm.conf` describing the cluster and copy it to every node (a minimal example follows)
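As a hedged sketch of the key-sharing step, assuming the hostnames from the `/etc/hosts` example above and the default Debian/Arch paths (verify the package name and key location on your distribution), the key is generated once on the master and copied to every compute node:

```sh
# On the master node: generate the shared key
# (Debian ships /usr/sbin/create-munge-key; newer munge releases also provide `mungekey`)
sudo create-munge-key

# Copy the same key to each compute node (node1 is an example hostname)
sudo scp /etc/munge/munge.key root@node1:/etc/munge/munge.key

# On every node: tighten ownership and permissions, then start the service
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl enable --now munge
```

A quick way to check that the key is shared correctly is `munge -n | ssh node1 unmunge`, which should decode the credential on the remote node.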
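And a minimal `slurm.conf` sketch, again assuming the hypothetical hostnames above; the CPU and memory values are placeholders (on each compute node, `slurmd -C` prints the correct ones), and the file must be identical on all nodes:

```
# /etc/slurm/slurm.conf -- minimal example, identical on every node
ClusterName=mycluster
SlurmctldHost=master

# Authentication through the shared munge key
AuthType=auth/munge

# Directories where the daemons keep their state (the journal mentioned above)
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd

# Compute nodes and a default partition; CPUs/RealMemory are placeholders
NodeName=node[1-3] CPUs=1 RealMemory=1000 State=UNKNOWN
PartitionName=debug Nodes=node[1-3] Default=YES MaxTime=INFINITE State=UP
```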