This guide will help you set up a Spot-based cluster that uses multiple instance types. To accomplish this, we make a few assumptions:
a. All instance types share the same number of vCPUs; in this case they all have 96 vCPUs:
c5.24xlarge
r5.24xlarge
m5.24xlarge
c5.metal
r5.metal
m5.metal
b. All instances are launched in a single AZ
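If you'd rather check assumption (a) from a script than by eye, something like the following might work. It's a minimal sketch, not part of the original setup: it assumes you pass in a boto3 EC2 client, and the helper names `vcpus_by_type` and `all_same_vcpus` are made up for illustration.

```python
# Instance types listed in the guide.
CANDIDATES = [
    "c5.24xlarge", "r5.24xlarge", "m5.24xlarge",
    "c5.metal", "r5.metal", "m5.metal",
]


def vcpus_by_type(ec2_client, instance_types):
    """Return {instance_type: default vCPU count} via DescribeInstanceTypes."""
    response = ec2_client.describe_instance_types(
        InstanceTypes=list(instance_types)
    )
    return {
        item["InstanceType"]: item["VCpuInfo"]["DefaultVCpus"]
        for item in response["InstanceTypes"]
    }


def all_same_vcpus(vcpu_map):
    """True when every instance type in the map reports the same vCPU count."""
    return len(set(vcpu_map.values())) == 1


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   counts = vcpus_by_type(ec2, CANDIDATES)
#   print(counts, all_same_vcpus(counts))
```

The client is passed in rather than created inside the helper, so the check can be tried against a stub before pointing it at a real account.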
- Create a cluster. I used the following config:
[global]
cluster_template = multi-instance
update_check = true
sanity_check = true
[aws]
aws_region_name = us-east-1
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
[cluster multi-instance]
base_os = alinux2
key_name = amzn2
vpc_settings = us-east-1
scheduler = slurm
master_instance_type = c5.2xlarge
compute_instance_type = c5.24xlarge
initial_queue_size = 0
max_queue_size = 200
maintain_initial_size = true
disable_hyperthreading = true
fsx_settings = fsx
cluster_type = spot
[fsx fsx]
shared_dir = /fsx
storage_capacity = 1200
[vpc us-east-1]
vpc_id = vpc-b5d7e3cc
master_subnet_id = subnet-5eda8e04
- Go to the ASG Console > parallelcluster-[cluster_name] > Details and note the name of the Launch Template, e.g. ComputeServerLaunchTemplate_ncE8QrsA7HW9
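The same lookup can probably be done without the console. The sketch below assumes a boto3 Auto Scaling client is passed in and that you already know the exact Auto Scaling group name; `launch_template_name` is a hypothetical helper, not something ParallelCluster provides.

```python
def launch_template_name(autoscaling_client, asg_name):
    """Return the name of the launch template attached to an Auto Scaling group."""
    response = autoscaling_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"no Auto Scaling group named {asg_name!r}")
    group = groups[0]
    # A plain group exposes "LaunchTemplate" directly; a group that already
    # uses a mixed instances policy nests the template one level deeper.
    if "LaunchTemplate" in group:
        return group["LaunchTemplate"]["LaunchTemplateName"]
    policy = group["MixedInstancesPolicy"]
    return policy["LaunchTemplate"]["LaunchTemplateSpecification"]["LaunchTemplateName"]


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   asg = boto3.client("autoscaling", region_name="us-east-1")
#   print(launch_template_name(asg, "<your ASG name>"))
```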
- Go to the Launch Templates Console, find the template whose name you noted above, and click Modify template (Create new version)
- On the Edit screen, click the Instance type drop-down and select Don't include in launch template.
Under Advanced details, uncheck Request Spot instances.
Save the new version.
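For anyone who prefers scripting this step, here is a rough sketch of the same edit. It assumes a boto3 EC2 client and that the original template is version "1"; `strip_for_mixed_fleet` and `create_stripped_version` are hypothetical helper names. I have not run it against a live account, and the data returned by the describe call may contain fields that the create call does not accept, so treat it as a starting point only.

```python
def strip_for_mixed_fleet(template_data):
    """Copy launch template data without the instance type or the Spot request."""
    removed = {"InstanceType", "InstanceMarketOptions"}
    return {key: value for key, value in template_data.items() if key not in removed}


def create_stripped_version(ec2_client, template_name, source_version="1"):
    """Create a new template version from source_version minus the two settings."""
    described = ec2_client.describe_launch_template_versions(
        LaunchTemplateName=template_name, Versions=[source_version]
    )
    data = described["LaunchTemplateVersions"][0]["LaunchTemplateData"]
    created = ec2_client.create_launch_template_version(
        LaunchTemplateName=template_name,
        VersionDescription="no instance type, no spot request",
        LaunchTemplateData=strip_for_mixed_fleet(data),
    )
    # The new version number is what the Auto Scaling group should point at.
    return created["LaunchTemplateVersion"]["VersionNumber"]


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   print(create_stripped_version(ec2, "<your launch template name>"))
```

Dropping `InstanceType` mirrors "Don't include in launch template", and dropping `InstanceMarketOptions` mirrors unchecking Request Spot instances.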
- Go back to the ASG Console > parallelcluster-[cluster_name] > Edit. Under Launch Template Version, select 2.
For Fleet Composition, select Combine purchase options and instances, then select the instance types (all with the same number of vCPUs) that you want to use.
Click Save.
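The console choices in this step correspond roughly to a mixed instances policy on the Auto Scaling group. Below is a minimal sketch of what that request might look like with a boto3 Auto Scaling client. The 0% On-Demand split is my assumption, based on `cluster_type = spot` in the config, and the helper names are hypothetical.

```python
def mixed_instances_policy(template_name, version, instance_types):
    """Build a MixedInstancesPolicy that launches only Spot capacity."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": str(version),  # the API expects the version as a string
            },
            # One override per instance type the group is allowed to launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Assumption: no On-Demand capacity at all, everything is Spot.
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,
        },
    }


def apply_policy(autoscaling_client, asg_name, policy):
    """Attach the policy to an existing Auto Scaling group."""
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name, MixedInstancesPolicy=policy
    )


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   asg = boto3.client("autoscaling", region_name="us-east-1")
#   policy = mixed_instances_policy("<your launch template name>", 2,
#                                   ["c5.24xlarge", "r5.24xlarge", "m5.24xlarge"])
#   apply_policy(asg, "<your ASG name>", policy)
```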
- Go back to the ASG Console > parallelcluster-[cluster_name] > Edit and set Desired to 200 instances.
Grab a cup of ☕️ as it scales
🚀
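Setting the desired capacity can also be scripted. This is a small sketch assuming a boto3 Auto Scaling client; `scale_to` is a hypothetical helper that refuses to go past the group's maximum size (200 here, matching `max_queue_size` in the config).

```python
def scale_to(autoscaling_client, asg_name, desired):
    """Set the group's desired capacity after checking it against MaxSize."""
    groups = autoscaling_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"no Auto Scaling group named {asg_name!r}")
    max_size = groups[0]["MaxSize"]
    if desired > max_size:
        raise ValueError(f"desired={desired} exceeds MaxSize={max_size}")
    autoscaling_client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=desired,
        HonorCooldown=False,  # apply immediately rather than waiting out a cooldown
    )
    return desired


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   asg = boto3.client("autoscaling", region_name="us-east-1")
#   scale_to(asg, "<your ASG name>", 200)
```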
Thanks. I don't currently see a way to have the cluster limit the total number of nodes balanced into the system, only the number for an individual server type.