@sean-smith
Last active August 24, 2021 23:10

AWS ParallelCluster Multi-Instance Type

[Figure: AWS ParallelCluster architecture diagram]

This guide will help you set up a Spot-based, multi-instance-type cluster. To accomplish this, we make a few assumptions:

a. All instances have the same number of vCPUs; in this case, 96 vCPUs each:

c5.24xlarge
r5.24xlarge
m5.24xlarge
c5.metal
r5.metal
m5.metal

b. All instances are launched in a single Availability Zone (AZ).
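Assumption a. can be checked before building anything. This is a minimal sketch, assuming Python 3, plus boto3 and configured AWS credentials for the live lookup; `GUIDE_VCPUS`, `group_by_vcpus`, and `fetch_vcpus` are hypothetical names, and the table just restates the 96-vCPU list above. `fetch_vcpus` reads each type's default vCPU count from the EC2 DescribeInstanceTypes API; more than one group in the result means the types don't match.

```python
"""Check that candidate instance types share one vCPU count (hypothetical helpers)."""
from typing import Dict, List

# vCPU counts for the instance types listed in this guide (all 96).
GUIDE_VCPUS: Dict[str, int] = {
    "c5.24xlarge": 96,
    "r5.24xlarge": 96,
    "m5.24xlarge": 96,
    "c5.metal": 96,
    "r5.metal": 96,
    "m5.metal": 96,
}


def group_by_vcpus(vcpus: Dict[str, int]) -> Dict[int, List[str]]:
    """Group instance types by vCPU count; one group means the fleet is uniform."""
    groups: Dict[int, List[str]] = {}
    for itype, count in sorted(vcpus.items()):
        groups.setdefault(count, []).append(itype)
    return groups


def fetch_vcpus(instance_types: List[str], region: str = "us-east-1") -> Dict[str, int]:
    """Look up DefaultVCpus via EC2 DescribeInstanceTypes (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instance_types(InstanceTypes=instance_types)
    return {
        it["InstanceType"]: it["VCpuInfo"]["DefaultVCpus"]
        for it in resp["InstanceTypes"]
    }


if __name__ == "__main__":
    groups = group_by_vcpus(GUIDE_VCPUS)
    if len(groups) != 1:
        raise SystemExit(f"mixed vCPU counts: {groups}")
    print(f"all types have {next(iter(groups))} vCPUs")
```

Against the live API, `group_by_vcpus(fetch_vcpus(list(GUIDE_VCPUS)))` should return a single group if the assumption holds.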

Steps

  1. Create a cluster. I used the following config:
[global]
cluster_template = multi-instance
update_check = true
sanity_check = true

[aws]
aws_region_name = us-east-1

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[cluster multi-instance]
base_os = alinux2
key_name = amzn2
vpc_settings = us-east-1
scheduler = slurm
master_instance_type = c5.2xlarge
compute_instance_type = c5.24xlarge
initial_queue_size = 0
max_queue_size = 200
maintain_initial_size = true
disable_hyperthreading = true
fsx_settings = fsx
cluster_type = spot

[fsx fsx]
shared_dir = /fsx
storage_capacity = 1200

[vpc us-east-1]
vpc_id = vpc-b5d7e3cc
master_subnet_id = subnet-5eda8e04
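In a ParallelCluster 2.x config, keys such as `vpc_settings = us-east-1` and `fsx_settings = fsx` point at sections named `[vpc us-east-1]` and `[fsx fsx]`, and `cluster_template` points at `[cluster multi-instance]`. A mistyped label is easy to miss, so here is a small pre-flight check using Python's configparser. `missing_sections` is a hypothetical helper (not part of the pcluster CLI), and it only covers the two reference keys used in the config above.

```python
"""List sections referenced by a ParallelCluster 2.x cluster template that don't exist."""
import configparser
from typing import List

# *_settings keys in [cluster ...] and the section type each one references.
REFERENCES = {"vpc_settings": "vpc", "fsx_settings": "fsx"}


def missing_sections(text: str) -> List[str]:
    """Return referenced section names that are not defined in the config text."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    cluster = f"cluster {cfg.get('global', 'cluster_template')}"
    if not cfg.has_section(cluster):
        # Without the cluster section there is nothing else to check.
        return [cluster]
    missing = []
    for key, kind in REFERENCES.items():
        label = cfg.get(cluster, key, fallback=None)
        if label is not None and not cfg.has_section(f"{kind} {label}"):
            missing.append(f"{kind} {label}")
    return missing
```

For example, assuming the config lives at the 2.x default path, `missing_sections(open(os.path.expanduser("~/.parallelcluster/config")).read())` should return an empty list for the config above.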

  2. Go to the ASG console > parallelcluster-[cluster_name] > Details and note the name of the launch template, e.g. ComputeServerLaunchTemplate_ncE8QrsA7HW9.

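The same lookup can be done with boto3. This sketch assumes the compute ASG's name is `parallelcluster-[cluster_name]` or starts with that plus a hyphen, and that boto3 and credentials are available; `launch_template_of` and `describe_all_groups` are hypothetical helpers. It reads the `LaunchTemplate` field that DescribeAutoScalingGroups returns, falling back to the mixed instances policy location in case the group has already been converted.

```python
"""Find the launch template behind a ParallelCluster 2.x compute ASG (hypothetical helpers)."""
from typing import Any, Dict, List, Optional


def launch_template_of(groups: List[Dict[str, Any]], cluster_name: str) -> Optional[Dict[str, Any]]:
    """Return the first matching ASG's name plus its launch template spec, or None.

    Name matching is loose: a cluster whose name is a hyphenated prefix of
    another cluster's name may match that cluster's group too.
    """
    prefix = f"parallelcluster-{cluster_name}"
    for group in groups:
        name = group["AutoScalingGroupName"]
        if name != prefix and not name.startswith(prefix + "-"):
            continue
        # A plain ASG stores the template at the top level; once the group
        # uses a mixed instances policy, it lives under MixedInstancesPolicy.
        spec = group.get("LaunchTemplate") or (
            group.get("MixedInstancesPolicy", {})
            .get("LaunchTemplate", {})
            .get("LaunchTemplateSpecification")
        )
        if spec:
            return {"asg": name, **spec}
    return None


def describe_all_groups(region: str = "us-east-1") -> List[Dict[str, Any]]:
    """Fetch every Auto Scaling group in the region, following pagination."""
    import boto3  # lazy import: only needed for the live lookup

    client = boto3.client("autoscaling", region_name=region)
    pages = client.get_paginator("describe_auto_scaling_groups").paginate()
    return [g for page in pages for g in page["AutoScalingGroups"]]
```

For example, `launch_template_of(describe_all_groups(), "multi-instance")` should return a dict with `asg`, `LaunchTemplateName`, and `Version` keys.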

  3. Go to the Launch Templates console, find the template you noted above, and choose Modify template (Create new version).


  4. On the edit screen, open the Instance type drop-down and select Don't include in launch template.


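Under Advanced details, uncheck Request Spot Instances. An Auto Scaling group that combines purchase options and instance types can't use a launch template that requests Spot Instances; Spot purchasing is configured on the group instead.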


Save the new version.
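The launch template edits above can also be scripted. One subtlety: CreateLaunchTemplateVersion with `SourceVersion` inherits every parameter that isn't specified, so it can't drop `InstanceType`. This sketch instead copies the existing version's data, removes `InstanceType` and `InstanceMarketOptions` (the Spot request), and creates a new version from the result. The helper names are hypothetical, and it assumes the data returned by DescribeLaunchTemplateVersions can be passed back unchanged, which may need adjusting for some templates.

```python
"""Create a launch template version without an instance type or Spot request (sketch)."""
import copy
from typing import Any, Dict

# Keys the mixed-instances Auto Scaling group supplies instead of the template.
DROP_KEYS = ("InstanceType", "InstanceMarketOptions")


def strip_for_mixed_fleet(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of launch template data without DROP_KEYS."""
    cleaned = copy.deepcopy(data)
    for key in DROP_KEYS:
        cleaned.pop(key, None)
    return cleaned


def create_stripped_version(template_name: str, source_version: str = "1",
                            region: str = "us-east-1") -> int:
    """Copy `source_version`, strip DROP_KEYS, and return the new version number."""
    import boto3  # lazy import: only needed for the live call

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_launch_template_versions(
        LaunchTemplateName=template_name, Versions=[source_version]
    )
    data = resp["LaunchTemplateVersions"][0]["LaunchTemplateData"]
    # No SourceVersion: the full (stripped) data is sent so nothing is inherited.
    new = ec2.create_launch_template_version(
        LaunchTemplateName=template_name,
        LaunchTemplateData=strip_for_mixed_fleet(data),
        VersionDescription="multi-instance: no instance type, no Spot request",
    )
    return new["LaunchTemplateVersion"]["VersionNumber"]
```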

  5. Go back to the ASG console > parallelcluster-[cluster_name] > Edit, and under Launch template set Version to 2 (the version you just created).


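Under fleet composition, select Combine purchase options and instance types, then select the instance types you want to use (all with the same number of vCPUs). To keep the fleet entirely Spot, set the On-Demand share to 0% (On-Demand base capacity 0, 0% On-Demand above base).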


Click Save.
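The fleet composition chosen in the console corresponds to an Auto Scaling `MixedInstancesPolicy`. Here is a hedged sketch of the equivalent UpdateAutoScalingGroup call, with hypothetical helper names: every instance type becomes an override, and an On-Demand base of 0 with 0% On-Demand above base makes the whole fleet Spot. The `capacity-optimized` Spot allocation strategy is an assumption; the guide doesn't specify one.

```python
"""Build and apply an all-Spot MixedInstancesPolicy (hypothetical helpers)."""
from typing import Any, Dict, List


def mixed_instances_policy(template_name: str, version: str,
                           instance_types: List[str]) -> Dict[str, Any]:
    """Return a MixedInstancesPolicy dict for UpdateAutoScalingGroup."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": version,
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # 0 On-Demand base and 0% On-Demand above base: every instance is Spot.
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,
            # Assumption: the guide does not name an allocation strategy.
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


def apply_policy(asg_name: str, policy: Dict[str, Any], region: str = "us-east-1") -> None:
    """Attach the policy to the existing group (needs boto3 and credentials)."""
    import boto3  # lazy import: only needed for the live call

    boto3.client("autoscaling", region_name=region).update_auto_scaling_group(
        AutoScalingGroupName=asg_name, MixedInstancesPolicy=policy
    )
```

For example, `apply_policy(asg_name, mixed_instances_policy(template_name, "2", ["c5.24xlarge", "r5.24xlarge", "m5.24xlarge", "c5.metal", "r5.metal", "m5.metal"]))`, with the ASG and launch template names noted earlier.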

Scale Up

Go back to the ASG console > parallelcluster-[cluster_name] > Edit and set Desired capacity to 200 instances (the max_queue_size from the config).
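For a sense of scale: assuming 96 vCPUs per instance and two hardware threads per core (true for these c5, m5, and r5 types), 200 instances give 19,200 vCPUs, or 9,600 physical cores with `disable_hyperthreading = true`. This sketch computes that and shows the boto3 SetDesiredCapacity call equivalent to the console edit. Helper names are hypothetical, and the desired value can't exceed the group's maximum size, which should match `max_queue_size = 200` here.

```python
"""Estimate fleet size and raise the compute ASG's desired capacity (hypothetical helpers)."""
from typing import Tuple


def fleet_size(instances: int, vcpus_per_instance: int = 96,
               threads_per_core: int = 2) -> Tuple[int, int]:
    """Return (total vCPUs, total physical cores) for `instances` nodes."""
    vcpus = instances * vcpus_per_instance
    return vcpus, vcpus // threads_per_core


def scale_to(asg_name: str, desired: int, region: str = "us-east-1") -> None:
    """Set the group's desired capacity right away (needs boto3 and credentials)."""
    import boto3  # lazy import: only needed for the live call

    boto3.client("autoscaling", region_name=region).set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=desired, HonorCooldown=False
    )


if __name__ == "__main__":
    vcpus, cores = fleet_size(200)
    # With hyperthreading disabled, jobs see one thread per physical core.
    print(f"200 instances -> {vcpus} vCPUs, {cores} physical cores")
```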

Grab a cup of ☕️ as it scales


🚀

@sean-smith (author)

@zcobell, I should probably update this. Yes, later versions of ParallelCluster have multi-instance type support built in.

See https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorial-mqm.html for instructions

@zcobell commented Aug 9, 2021

Thanks. I don't currently see a way to limit the total number of nodes the cluster balances across instance types, only the number for each individual instance type.

@sean-smith (author)

@zcobell You're right: you can't limit the total number of instances across all instance types in the cluster, but you can limit each individual instance type like so:

[compute_resource cr1]
instance_type = c5.xlarge
min_count = 0
initial_count = 2
max_count = 10
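Because the limit is per instance type, the most nodes a queue can run is the sum of the `max_count` values of its compute resources (and the cluster-wide ceiling is the sum over all queues). This sketch computes that ceiling from a ParallelCluster 2.9+ style config, where a `[queue ...]` section lists its resources in `compute_resource_settings`. `queue_node_caps` is a hypothetical helper, and it assumes every referenced compute resource sets `max_count` explicitly.

```python
"""Per-queue node ceiling = sum of max_count over the queue's compute resources."""
import configparser
from typing import Dict


def queue_node_caps(text: str) -> Dict[str, int]:
    """Map each queue label to the total max_count of its compute resources."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    caps: Dict[str, int] = {}
    for section in cfg.sections():
        if not section.startswith("queue "):
            continue
        names = [n.strip() for n in cfg.get(section, "compute_resource_settings").split(",")]
        # Raises if a referenced [compute_resource <name>] lacks max_count.
        caps[section.split(" ", 1)[1]] = sum(
            cfg.getint(f"compute_resource {n}", "max_count") for n in names
        )
    return caps
```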
