Action | SGE | Slurm | Torque |
---|---|---|---|
Submit Interactive Job | qlogin | srun | qsub -I |
Submit Batch Job | qsub | sbatch | qsub |
Number of Slots | -pe mpi [n] | -n [n] | -l ppn=[n] |
Number of Nodes | -pe mpi [slots * n] | -N [n] | -l nodes=[n] |
Cancel Job | qdel | scancel | qdel |
See Queue | qstat | squeue | qstat |
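For example, submitting the same 8-slot MPI batch job looks like this under each scheduler (flags taken from the table above; `job.sh` is just a placeholder script name):

```bash
# SGE: request 8 slots in the "mpi" parallel environment
qsub -pe mpi 8 job.sh

# Slurm: request 8 tasks (add -N to also fix the node count)
sbatch -n 8 job.sh

# Torque: request 1 node with 8 processors per node
qsub -l nodes=1:ppn=8 job.sh
```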
- Set up a cluster with DCV enabled
- Install the native client from the NICE DCV download page
- Create a script `pcluster-dcv-connect.py` with the contents as shown below
- Execute that script
# make sure you have pcluster installed
$ pcluster list --color
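With the CLI working, the last step is to run the connect script against your cluster. The exact arguments depend on how you write `pcluster-dcv-connect.py`, so the `--cluster-name` flag below is only an assumed example:

```bash
# hypothetical invocation; adjust the argument to whatever CLI the script exposes
$ python3 pcluster-dcv-connect.py --cluster-name mycluster
```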
This binary cache is a subset of the Exascale Computing Project's Extreme-Scale Scientific Software Stack (E4S) (https://oaciss.uoregon.edu/ecp/).
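Before the install commands in the table below can pull from the cache, spack needs the cache registered as a mirror. A minimal sketch, with a placeholder URL (substitute the one published for this cache):

```bash
# register the binary cache as a mirror (placeholder URL)
spack mirror add binary-cache https://example-bucket.s3.us-east-2.amazonaws.com/spack-mirror

# optionally trust the cache's signing keys instead of passing --no-check-signature
spack buildcache keys --install --trust
```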
Package | Install command | Working? |
---|---|---|
openfoam | spack install --no-check-signature --cache-only openfoam | ✅ |
gromacs | spack install --no-check-signature --cache-only gromacs | ✅ |
gromacs without SLURM/PMI support | spack install --no-check-signature --cache-only gromacs ^openmpi~pmi schedulers=none | ✅ |
ior | spack install --no-check-signature --cache-only ior | ✅ |
osu-micro-benchmarks | spack install --no-check-signature --cache-only osu-micro-benchmarks | ✅ |
#!/usr/bin/env python3
import json
from base64 import b64decode, b64encode
from pprint import pprint

import boto3
import botocore
import yaml
import requests


def sigv4_request(method, host, path, params, headers, body):
When DCV is enabled, the default behaviour of AWS ParallelCluster is to run a single DCV session on the head node. This is a quick and easy way to visualize the results of your simulations or to run a desktop application such as StarCCM+.
A common ask is to run DCV sessions on a compute queue instead of the head node. This has several advantages, namely:
- Run multiple sessions on the same instance (possibly with a different user per session)
- Run a smaller head node and only spin up the more expensive DCV instances when needed. We set a 12-hour timer below that automatically kills sessions after we leave.
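As a minimal sketch of the idea (assuming the compute nodes boot from a DCV-enabled image and sit in a queue named `dcv`; both are assumptions, not part of the setup above), a session on a compute node can be requested with a batch job whose time limit enforces the 12-hour cutoff:

```bash
#!/bin/bash
#SBATCH --job-name=dcv-session
#SBATCH --partition=dcv        # assumed queue backed by a DCV-enabled AMI
#SBATCH --exclusive
#SBATCH --time=12:00:00        # Slurm reclaims the node after 12 hours

# start a virtual DCV session owned by the submitting user
dcv create-session --owner "$USER" "${USER}-session"

# print where to point the native client
hostname
dcv list-sessions

# hold the allocation open until the time limit ends the job
sleep infinity
```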
In this example we're going to set up an HPC environment with AWS ParallelCluster and connect it to Microsoft AD, an AWS service that lets you create managed Active Directory user pools. You can read more about it in the AD Tutorial.
You have three options for the AD provider; we're going with Microsoft AD because of its regional availability, which lets us run it in the same region (Ohio) as our hpc6a.48xlarge instances.
Type | Description |
---|---|
Simple AD | Open AD protocol, supported in only a [few](https://docs.aws.amazon.com/directoryservice/ |
FSx for NetApp ONTAP is a multi-protocol filesystem: it mounts as SMB on Windows, NFS on Linux, and either protocol on macOS. This lets cluster users bridge their Windows and Linux machines with the same filesystem, potentially running both Windows and Linux machines in a post-processing workflow.
Pros
- Multi-Protocol
- Hybrid support
- Multi-AZ (for High Availability)
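To make the multi-protocol point concrete, the same volume can be attached from both sides. The SVM endpoint and volume names below are placeholders; use the endpoints shown in the FSx console:

```bash
# Linux: mount the volume over NFS (placeholder SVM DNS name and volume path)
sudo mkdir -p /mnt/fsx
sudo mount -t nfs svm-example.fs-0123456789abcdef0.fsx.us-east-2.amazonaws.com:/vol1 /mnt/fsx
```

On the Windows side the same volume is exposed as an SMB share on the SVM, which can be mapped as a network drive (for example with `net use`).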
This guide describes how to mount an FSx for Lustre filesystem. I give an example CloudFormation stack to create the AWS Batch resources.
I loosely follow this guide.
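For reference, the Batch compute environment typically mounts the filesystem on the container host (for example through launch template user data) with the standard FSx for Lustre mount command; a sketch with placeholder DNS and mount names:

```bash
# install the Lustre client first, then mount (DNS name and mount name are placeholders)
sudo mkdir -p /fsx
sudo mount -t lustre -o relatime,flock \
    fs-0123456789abcdef0.fsx.us-east-2.amazonaws.com@tcp:/abcdefgh /fsx
```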
For the parameters, it's important that the Subnet, Security Group, FSx ID, and FSx Mount Name follow the guidelines below:
Parameter | Description |
---|---|
Spot interruption gives a 2-minute warning before the instance is terminated. This window allows you to gracefully save data in order to resume later.
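As a rough sketch of how that warning can be caught (assuming IMDSv1 is reachable; with IMDSv2 you would first request a session token), a node can poll the instance metadata service for the interruption notice:

```bash
# the spot/instance-action endpoint returns 404 until an interruption is scheduled
while true; do
  if curl -fs http://169.254.169.254/latest/meta-data/spot/instance-action > /dev/null; then
    echo "Spot interruption notice received, checkpointing..."
    # save simulation state here, e.g. trigger the solver's checkpoint/abort file
    break
  fi
  sleep 5
done
```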
In the following I describe how this can be done with StarCCM+ in AWS ParallelCluster 3.X:
- Create a post-install script `spot.sh` like so: