sean-smith / schedulers.md
Last active July 30, 2020 00:19
Scheduler Cheatsheet for AWS ParallelCluster

General

|                        | SGE                   | Slurm     | Torque         |
| ---------------------- | --------------------- | --------- | -------------- |
| Submit Interactive Job | `qlogin`              | `srun`    | `qsub -I`      |
| Submit Batch Job       | `qsub`                | `sbatch`  | `qsub`         |
| Number of Slots        | `-pe mpi [n]`         | `-n [n]`  | `-l ppn=[n]`   |
| Number of Nodes        | `-pe mpi [slots * n]` | `-N [n]`  | `-l nodes=[n]` |
| Cancel Job             | `qdel`                | `scancel` | `qdel`         |
| See Queue              | `qstat`               | `squeue`  | `qstat`        |
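For example, a 2-node, 72-slot MPI batch job maps to each scheduler roughly as follows (a sketch; the parallel environment name, node/task counts, and script path are placeholders):

```bash
# Slurm: 2 nodes, 36 tasks per node
sbatch -N 2 --ntasks-per-node=36 run_job.sh

# SGE: 72 slots in an "mpi" parallel environment
qsub -pe mpi 72 run_job.sh

# Torque: 2 nodes with 36 processes per node
qsub -l nodes=2:ppn=36 run_job.sh
```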
sean-smith / multi-instance-type-pcluster.md
Last active August 24, 2021 23:10
AWS ParallelCluster Multi Instance Type

AWS ParallelCluster Multi-Instance Type

AWS ParallelCluster Architecture Diagram

This guide will help you set up a Spot-based, multi-instance-type cluster. In order to accomplish this, we make a few assumptions:

a. All instance types have the same number of vCPUs; in this case, 96 vCPUs each:
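For example, you can confirm the vCPU counts with the AWS CLI (a sketch; the instance types listed here are illustrative, not a required set):

```bash
# Each of these instance types reports 96 default vCPUs
aws ec2 describe-instance-types \
  --instance-types c5.24xlarge m5.24xlarge r5.24xlarge \
  --query "InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus]" \
  --output table
```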

sean-smith / pcluster_dcv_native_client.md
Last active August 15, 2022 06:32
Connect to DCV setup with AWS ParallelCluster using the Native Client

DCV Native Client w/ AWS ParallelCluster

  1. Set up a cluster with DCV
  2. Install the Native client: NICE DCV | Download
  3. Create a script pcluster-dcv-connect.py with the contents as shown below:
  4. Execute that script
```bash
# make sure you have pcluster installed
$ pcluster list --color
```
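Step 4 then looks roughly like the following (a sketch; how pcluster-dcv-connect.py is invoked depends on how you wrote it, so the cluster-name argument is an assumption):

```bash
# Hypothetical invocation: pass the name of a DCV-enabled cluster
# reported by `pcluster list`
$ python3 pcluster-dcv-connect.py mycluster
# Whatever connection details the script prints are what you enter in the
# NICE DCV native client instead of the browser-based client URL.
```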
bollig / 1_Spack_Binary_cache.md
Last active February 9, 2021 16:09 — forked from sean-smith/Spack_Binary_cache.md
Create a Cluster with Spack Binary Cache

Create a Cluster with Spack Binary Cache

This binary cache is a subset of the Exascale Computing Project's Extreme-Scale Scientific Software Stack (E4S) (https://oaciss.uoregon.edu/ecp/).

| Package | Install command | Working? |
| --- | --- | --- |
| openfoam | `spack install --no-check-signature --cache-only openfoam` | |
| gromacs | `spack install --no-check-signature --cache-only gromacs` | |
| gromacs (without SLURM/PMI support) | `spack install --no-check-signature --cache-only gromacs ^openmpi~pmi schedulers=none` | |
| ior | `spack install --no-check-signature --cache-only ior` | |
| osu-micro-benchmarks | `spack install --no-check-signature --cache-only osu-micro-benchmarks` | |
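Before these installs will resolve from the cache, the binary cache needs to be registered as a Spack mirror and its signing keys trusted. A minimal sketch, assuming the public E4S cache URL (substitute the cache linked above if it differs):

```bash
# Register the E4S binary cache as a Spack mirror (URL is an assumption)
spack mirror add e4s https://cache.e4s.io

# Install and trust the GPG keys that signed the cached binaries
spack buildcache keys --install --trust

# Cached installs then pull pre-built binaries, e.g.:
spack install --no-check-signature --cache-only ior
```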
sean-smith / example.py
Created September 29, 2021 15:13
Call AWS ParallelCluster API with Python
```python
#!/usr/bin/env python3
import json
from base64 import b64decode, b64encode
from pprint import pprint

import boto3
import botocore
import yaml
import requests


def sigv4_request(method, host, path, params, headers, body):
```
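A script like this signs requests against the API Gateway endpoint created by the ParallelCluster API stack. One way to look that endpoint up (a sketch; the stack name and output key are assumptions based on a default API deployment):

```bash
# Fetch the invoke URL from the ParallelCluster API CloudFormation stack
# (stack name and output key are assumptions; check your own deployment)
aws cloudformation describe-stacks \
  --stack-name ParallelClusterApi \
  --query "Stacks[0].Outputs[?OutputKey=='ParallelClusterApiInvokeUrl'].OutputValue" \
  --output text
```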
sean-smith / dcv.md
Last active April 5, 2022 19:33
Create a desktop visualization queue with AWS ParallelCluster and NICE DCV

DCV Visualization Queue

When DCV is enabled, the default behaviour of AWS ParallelCluster is to run a single DCV session on the head node. This is a quick and easy way to visualize the results of your simulations or to run a desktop application such as StarCCM+.

A common ask is to run DCV sessions on a compute queue instead of the head node. This has several advantages, namely:

  1. Run multiple sessions on the same instance (possibly with different users per-session)
  2. Run a smaller head node and only spin up more-expensive DCV instances when needed. We set a 12-hour timer below that automatically kills sessions after we leave.
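A minimal sketch of what such a session job could look like, assuming a Slurm compute queue with DCV installed on the nodes (the partition name, session name, and 12-hour limit are placeholders for whatever you configure):

```bash
#!/bin/bash
#SBATCH --job-name=dcv-session
#SBATCH --partition=dcv        # placeholder: the DCV compute queue
#SBATCH --time=12:00:00        # the 12-hour limit that reclaims the instance

# Start a DCV virtual session owned by the submitting user
dcv create-session --owner "$USER" "${USER}-session"

# Keep the job (and therefore the instance) alive while the session exists
while dcv list-sessions | grep -q "${USER}-session"; do
    sleep 60
done
```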

Setup

sean-smith / 01-pcluster-multiuser.md
Last active May 9, 2022 21:25
How to setup a multi-user AWS ParallelCluster Environment

Multi-User AWS ParallelCluster

In this example we're going to set up an HPC environment with AWS ParallelCluster and connect it to AWS Managed Microsoft AD, an AWS service that lets you create managed Active Directory user pools. You can read more about it in the AD Tutorial.

You have three different options for an AD provider; we're going with Microsoft AD due to its regional availability. This allows us to use it in the same region (Ohio, us-east-2) as our hpc6a.48xlarge instances.
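Creating the managed directory itself can also be done from the CLI, for example (a sketch; the domain name, password, and VPC/subnet IDs are placeholders):

```bash
# Create an AWS Managed Microsoft AD in Ohio (us-east-2)
aws ds create-microsoft-ad \
  --name corp.example.com \
  --short-name CORP \
  --password 'Sup3rSecretPassw0rd!' \
  --edition Standard \
  --vpc-settings "VpcId=vpc-0123456789abcdef0,SubnetIds=subnet-aaaa1111,subnet-bbbb2222" \
  --region us-east-2
```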

| Type | Description |
| --- | --- |
| Simple AD | Open AD protocol, supported in only a [few](https://docs.aws.amazon.com/directoryservice/ |

Mount FSx Netapp ONTAP with AWS ParallelCluster

FSx for NetApp ONTAP is a multi-protocol filesystem: it mounts on Windows as SMB, and on Linux and macOS as NFS. This allows cluster users to bridge their Windows and Linux machines with the same filesystem, potentially running both Windows and Linux machines in a post-processing workflow.


Pros

  • Multi-Protocol
  • Hybrid support
  • Multi-AZ (for High Availability)
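On the Linux side of the cluster, the ONTAP volume is mounted over NFS, for example (a sketch; the SVM DNS name, volume junction path, and mount point are placeholders):

```bash
# Mount an FSx for NetApp ONTAP volume over NFS on a Linux node
sudo mkdir -p /shared
sudo mount -t nfs \
  svm-0123456789abcdef0.fs-0123456789abcdef0.fsx.us-east-2.amazonaws.com:/vol1 /shared
```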

Mount FSx Lustre on AWS Batch

This guide describes how to mount an FSx for Lustre filesystem on AWS Batch. I give an example CloudFormation stack that creates the AWS Batch resources.

I loosely follow this guide.
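The mount itself, once the Batch container or host has the Lustre client installed, looks something like this (a sketch; the filesystem DNS name, mount name, and mount point are placeholders):

```bash
# Mount FSx for Lustre; the mount name comes from the filesystem's MountName attribute
sudo mkdir -p /fsx
sudo mount -t lustre -o relatime,flock \
  fs-0123456789abcdef0.fsx.us-east-2.amazonaws.com@tcp:/abcdefgh /fsx
```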

For the parameters, it's important that the Subnet, Security Group, FSx ID, and FSx Mount Name follow the guidelines below:

| Parameter | Description |
| --- | --- |
sean-smith / spot-starccm+-termination.md
Last active June 17, 2022 18:37
StarCCM+ Spot Instance Termination

Save StarCCM+ State in AWS ParallelCluster

Spot gives a 2-minute warning before terminating the instance. This window allows you to gracefully save data so that you can resume later.

In the following I describe how this can be done with StarCCM+ in AWS ParallelCluster 3.X:

Setup

  1. Create a post-install script spot.sh like so:
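A minimal sketch of the idea behind such a script: poll the instance metadata for the Spot interruption notice and, when it appears, tell StarCCM+ to stop and save (here via an ABORT file in the simulation's working directory; the path, polling interval, and the IMDSv1-style metadata call are all assumptions to adapt to your setup):

```bash
#!/bin/bash
# Hypothetical spot.sh: watch for the 2-minute Spot interruption notice and
# drop an ABORT file so StarCCM+ stops gracefully and saves the .sim file.
# Assumes IMDSv1 is reachable; with IMDSv2 you would fetch a token first.
WORKDIR=/shared/starccm-run   # placeholder: the simulation working directory

while true; do
    # Returns 404 until a termination notice is issued, then 200 with JSON
    CODE=$(curl -s -o /dev/null -w '%{http_code}' \
        http://169.254.169.254/latest/meta-data/spot/instance-action)
    if [ "$CODE" = "200" ]; then
        touch "${WORKDIR}/ABORT"
        break
    fi
    sleep 5
done
```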