This binary cache is a subset of the Exascale Computing Project's Extreme-Scale Scientific Software Stack (E4S) (https://oaciss.uoregon.edu/ecp/).
package | install command | working? |
---|---|---|
openfoam | spack install --no-check-signature --cache-only openfoam | ✅ |
gromacs | spack install --no-check-signature --cache-only gromacs | ✅ |
gromacs without SLURM/PMI support | spack install --no-check-signature --cache-only gromacs ^openmpi~pmi schedulers=none | ✅ |
ior | spack install --no-check-signature --cache-only ior | ✅ |
osu-micro-benchmarks | spack install --no-check-signature --cache-only osu-micro-benchmarks | ✅ |
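If you want to see what the cache offers before installing anything, you can point Spack at the mirror directly and list its contents. A minimal sketch, assuming the s3://spack-mirrors/amzn2-e4s bucket referenced in the troubleshooting notes below and an arbitrary mirror name (the aws environment's spack.yaml used later likely registers this mirror already, and S3 access needs boto3, installed in a later step):
# Register the S3 binary mirror under an arbitrary local name.
spack mirror add aws-binaries s3://spack-mirrors/amzn2-e4s
# List the specs available in the build cache.
spack buildcache list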
- Create a cluster. I used the following ParallelCluster config (a pcluster create example follows the config):
Important: you must set s3_read_resource = arn:aws:s3:::* so the cluster nodes can read the binary cache from S3.
[aws]
aws_region_name = ${AWS_DEFAULT_REGION}
[global]
cluster_template = default
update_check = false
sanity_check = true
[cluster default]
key_name = ${AWS_DEFAULT_REGION}
vpc_settings = public
base_os = alinux2
ebs_settings = myebs
compute_instance_type = c5.18xlarge
master_instance_type = c5n.2xlarge
cluster_type = ondemand
placement_group = DYNAMIC
placement = compute
max_queue_size = 8
initial_queue_size = 0
disable_hyperthreading = true
scheduler = slurm
s3_read_resource = arn:aws:s3:::*
[vpc public]
vpc_id = ${vpc_id}
master_subnet_id = ${master_subnet_id}
compute_subnet_id = ${compute_subnet_id}
[ebs myebs]
shared_dir = /shared
volume_type = gp2
volume_size = 20
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
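With the config saved (e.g., as config.ini), creating the cluster is a single ParallelCluster 2.x CLI call. A minimal sketch; the cluster name and key path are placeholders:
# Create the cluster from the config above.
pcluster create -c config.ini spack-demo
# Once creation finishes, SSH to the master node (the key name matches key_name in the config).
pcluster ssh spack-demo -i ~/.ssh/${AWS_DEFAULT_REGION}.pem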
- Install Spack
sudo su
export SPACK_ROOT=/shared/spack
mkdir -p $SPACK_ROOT
git clone https://github.com/spack/spack $SPACK_ROOT
cd $SPACK_ROOT
git checkout releases/v0.15
echo "export SPACK_ROOT=$SPACK_ROOT" > /etc/profile.d/spack.sh
echo "source $SPACK_ROOT/share/spack/setup-env.sh" >> /etc/profile.d/spack.sh
exit
source /etc/profile.d/spack.sh
sudo chown -R $USER:$USER $SPACK_ROOT
Verify the install:
spack -V
0.15.4
- Add the environment
mv $SPACK_ROOT/etc/spack/packages.yaml $HOME/bak_packages.yaml
mkdir -p $SPACK_ROOT/var/spack/environments/aws
wget https://gist.githubusercontent.com/bollig/71383f92143ed6b006e5c3892343fef8/raw/2_spack.yaml -O $SPACK_ROOT/var/spack/environments/aws/spack.yaml
- Activate the environment
$ spack env list
aws
$ spack env activate aws
$ spack concretize
- Install Python 3 and Boto3
sudo yum install -y python3
sudo pip3 install boto3
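Spack uses boto3/botocore from the system Python to talk to the S3 mirror, so a quick sanity check is worthwhile (a sketch, not part of the original steps):
# Confirm the yum-provided python3 is first on PATH and boto3 imports cleanly.
which python3
python3 -c "import boto3, botocore; print(boto3.__version__)"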
- Install packages!
NOTE: when you install packages within the aws spack environment, they are installed globally. You can load them as modules later without activating the spack environment (see the verification sketch after the commands below). The environment ensures that your spack configuration exactly matches the CI/CD pipeline when installing packages.
spack env activate aws
spack install --no-check-signature ior
spack env deactivate
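To confirm the package landed in the shared Spack tree and is visible to the module system, something like the following works (a sketch; generated module names may carry a compiler/hash suffix):
# Show the installed spec and its variants.
spack find -v ior
# Check that a corresponding modulefile was generated.
module avail ior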
- Test your packages:
module load ior
srun -N 2 --ntasks-per-node=1 ior -w -r -o=/scratch/test_dir -b=256m -a=POSIX -i=5 -F -z -t=64m -C
- Confirm EFA support (optional):
a. Update your ParallelCluster config to use an EFA-enabled compute_instance_type (e.g., c5n.18xlarge) and add enable_efa = compute.
b. Update your cluster with:
pcluster stop -c config.ini cluster_name
pcluster update -c config.ini cluster_name
c. SSH back to the cluster, and run:
salloc -N 2 --tasks-per-node=1 srun -N 2 --ntasks-per-node=1 --pty bash
d. Inside the interactive prompt:
module load osu-micro-benchmarks
fi_info -l
# EFA enabled:
srun -N 2 --ntasks-per-node=1 osu_bw
# EFA disabled:
FI_PROVIDER=^efa srun -N 2 --ntasks-per-node=1 osu_bw
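To spot-check that the EFA device is actually visible to libfabric before comparing bandwidth numbers, a couple of standard checks can help (a sketch; these commands are not part of the original steps):
# Query libfabric for the EFA provider specifically; no output means EFA is not available.
fi_info -p efa
# Confirm the EFA kernel module is loaded on the compute node.
lsmod | grep efa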
- My packages are installing from source - help!
Note: patchelf always installs from source - this is because it's a spack dependency and not a package dependency.
There are two reasons why this may happen:
a. No access to the S3 mirror. Run:
$ aws s3 ls s3://spack-mirrors/amzn2-e4s
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
If you see "Access Denied" then add the AmazonS3ReadOnlyAccess
to your instance's iam profile, or add s3_read_resource = arn:aws:s3:::*
to your cluster's config and run update.
b. The environment isn't activated.
Check spack env list and make sure aws is shown in green; if not, run spack env activate aws.
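To see which identity the node actually uses when it hits S3, a quick check from the master node (a sketch; the aws CLI is the same one used for the listing above):
# Show which IAM identity the node's credentials resolve to.
aws sts get-caller-identity
# Retry the mirror listing once the policy change has taken effect.
aws s3 ls s3://spack-mirrors/amzn2-e4s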
- OpenMPI, Libfabric, SLURM, or other packages always use the local modules or paths:
This is likely because you have a packages.yaml configured globally. Use spack config blame packages to identify which config files are affecting the configuration. For example, if you see
[...]
/shared/spack-0.15/var/spack/environments/aws/spack.yaml:139 slurm:
/shared/spack-0.15/var/spack/environments/aws/spack.yaml:154 paths:
/shared/spack-0.15/etc/spack/packages.yaml:12 [email protected] +pmix: /opt/slurm/
/shared/spack-0.15/var/spack/environments/aws/spack.yaml:152 buildable: True
[...]
you can either extend your $SPACK_ROOT/var/spack/environments/aws/spack.yaml to disable the paths/modules (e.g., slurm: { paths: {[email protected] +pmix: null} }), or simply remove the offending packages.yaml (preferred), then re-concretize as sketched below.
NoTears HPC Users: rm $SPACK_ROOT/etc/spack/packages.yaml
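After removing (or editing) the offending packages.yaml, re-concretize the environment so the change actually takes effect. A minimal sketch (-f forces re-concretization of already-concretized specs):
spack env activate aws
spack concretize -f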
- If you see this error after installing a package:
==> Error: Failed to install XXXXXXXXXX due to ModuleNotFoundError: No module named 'botocore'
==> Error: No module named 'botocore'
This is caused by the Spack-installed python package overriding the system-wide python installed via yum. Do the following:
spack install --no-cache py-pip
pip3 install boto3
Then rerun your desired package install commands.
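A quick way to confirm that the right python is resolving and botocore imports again (a sketch, not part of the original fix):
# Check which python3 and pip3 are first on PATH after the fix.
which python3 pip3
# Verify botocore imports without error.
python3 -c "import botocore; print(botocore.__version__)"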
- If you see this error:
srun: error: _parse_next_key: Parsing error at unrecognized key: NodeSet
srun: error: Parse error in file /opt/slurm/etc/pcluster/slurm_parallelcluster_compute_partition.conf line 5: "NodeSet=compute_nodes Nodes=compute-dy-c5n18xlarge-[1-10]"
srun: error: "Include" failed in file /opt/slurm/etc/slurm_parallelcluster.conf line 8
srun: error: "Include" failed in file /opt/slurm/etc/slurm.conf line 70
srun: fatal: Unable to process configuration file
then run module unload slurm before running any SLURM commands (srun, squeue, etc.). The module version of SLURM does not match the ParallelCluster-provided version, and some of the config file syntax is unsupported by it (see the check below).
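To confirm you are back on the ParallelCluster-provided SLURM after unloading the module (a sketch; the expected /opt/slurm prefix comes from the error messages above):
module unload slurm
# Should now resolve to the ParallelCluster install under /opt/slurm.
which srun
srun --version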