MACE installation on ANL's Swing as of 1/30/24

Resources available on Swing: NVIDIA A100 GPUs, 8 per node. Requesting 1 GPU allocates 1/8 of a node.

MACE requires PyTorch 2, whose builds target specific CUDA versions (here 11.8 or 11.7). Check the required CUDA versions here.
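To compare an installed build against those CUDA versions, the sketch below checks a version string of the form returned by `torch.version.cuda` (for example `"11.8"`, or `None` for CPU-only builds). The function name `cuda_version_supported` is hypothetical, and the allowed set `("11.7", "11.8")` only mirrors the sentence above; adjust it for other PyTorch releases.

```python
from typing import Iterable, Optional


def cuda_version_supported(
    version: Optional[str], allowed: Iterable[str] = ("11.7", "11.8")
) -> bool:
    """Return True if a CUDA version string matches one of the allowed major.minor versions.

    `version` is expected to look like torch.version.cuda, e.g. "11.8".
    CPU-only PyTorch builds report None, which is treated as unsupported.
    """
    if not version:
        return False
    # Compare only major.minor so that "11.8.0" and "11.8" are treated the same.
    major_minor = ".".join(version.strip().split(".")[:2])
    return major_minor in set(allowed)


if __name__ == "__main__":
    try:
        import torch  # only available once PyTorch is installed in the environment

        detected = torch.version.cuda
    except ImportError:
        detected = None
    print(f"CUDA reported by torch: {detected}")
    print("supported:", cuda_version_supported(detected))
```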

Install Torch and CUDA

Create a Conda environment with Python 3.10

conda create -n "CUDA-torch-base" python=3.10.0

Once the environment is created, activate it:

conda activate CUDA-torch-base
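Before installing anything into the new environment, it can help to confirm that it is really the active one. This sketch reads the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the active environment's name, and compares the running Python version with 3.10. The function name `check_env` is hypothetical, and the expected values just mirror the commands above.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_env(
    expected_env: str,
    expected_python: Tuple[int, int] = (3, 10),
    environ: Optional[Mapping[str, str]] = None,
    version_info: Optional[Tuple[int, ...]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the environment looks right."""
    environ = os.environ if environ is None else environ
    if version_info is None:
        version_info = tuple(sys.version_info[:2])
    major_minor = tuple(version_info[:2])
    problems = []
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    if major_minor != tuple(expected_python):
        problems.append(f"Python {major_minor} found, expected {tuple(expected_python)}")
    return problems


if __name__ == "__main__":
    issues = check_env("CUDA-torch-base")
    print("OK" if not issues else "\n".join(issues))
```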

Follow the PyTorch installation instructions on the official website. In my case the command was:

conda install pytorch=2.2.0 torchvision=0.17.0 torchaudio=2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Before confirming the installation of torch, torchvision and torchaudio, check in conda's package list that it is installing the CUDA builds and not the CPU builds. For example, a CPU build of PyTorch is listed as pytorch/linux-64::pytorch-2.0.1-py3.10_cpu_0, whereas a CUDA build is listed as pytorch/linux-64::pytorch-2.2.0-py3.10_cuda11.8_cudnn8.7.0_0. For PyTorch 1.11.0 (for Allegro), I had success with CUDA 11.3.
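The CPU/CUDA distinction above can also be checked mechanically. This is a minimal sketch, assuming package specs of the `channel/subdir::name-version-build` form shown in the examples; the names `parse_conda_spec` and `BuildInfo` are hypothetical, and the CUDA version is read only from a `cudaX.Y` tag in the build string.

```python
import re
from typing import NamedTuple, Optional


class BuildInfo(NamedTuple):
    name: str
    version: str
    build: str
    cuda: Optional[str]  # e.g. "11.8"; None when the build string has no cudaX.Y tag


def parse_conda_spec(spec: str) -> BuildInfo:
    """Parse a spec such as 'pytorch/linux-64::pytorch-2.2.0-py3.10_cuda11.8_cudnn8.7.0_0'."""
    # Drop the "channel/subdir::" prefix if present.
    package = spec.split("::", 1)[-1]
    # Package names may contain '-', so split version and build off the right.
    name, version, build = package.rsplit("-", 2)
    match = re.search(r"cuda(\d+\.\d+)", build)
    return BuildInfo(name, version, build, match.group(1) if match else None)


if __name__ == "__main__":
    examples = [
        "pytorch/linux-64::pytorch-2.0.1-py3.10_cpu_0",
        "pytorch/linux-64::pytorch-2.2.0-py3.10_cuda11.8_cudnn8.7.0_0",
    ]
    for spec in examples:
        info = parse_conda_spec(spec)
        kind = f"CUDA {info.cuda}" if info.cuda else "CPU-only"
        print(f"{info.name} {info.version}: {kind}")
```

Splitting from the right keeps hyphenated package names intact, which matters for packages such as `pytorch-cuda`.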

Check that CUDA-enabled Torch is installed correctly. Request a GPU (don't forget to ssh into the compute node once the resource is granted), then run in Python:

import torch
# Raises AssertionError if PyTorch cannot see a CUDA device
assert torch.cuda.is_available(), "CUDA is not available to PyTorch"
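A slightly fuller check than a single assertion is sketched below: it reports the Torch version, the CUDA version the build was compiled for, which GPUs are visible, and runs a tiny matrix multiply on the GPU. The function name `gpu_report` is hypothetical. With 1 GPU requested, `device_count` would be expected to be 1 if the scheduler exposes only the granted GPU, but that depends on the site configuration.

```python
import os


def gpu_report() -> dict:
    """Collect basic facts about the PyTorch/CUDA setup as a dict."""
    report = {"torch_installed": False, "cuda_available": False}
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    # Schedulers may restrict which GPUs a job sees through this variable.
    report["CUDA_VISIBLE_DEVICES"] = os.environ.get("CUDA_VISIBLE_DEVICES")
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
        # A tiny matrix multiply confirms that kernels actually run on the GPU.
        x = torch.ones(2, 2, device="cuda")
        report["matmul_ok"] = bool(torch.equal((x @ x).cpu(), torch.full((2, 2), 2.0)))
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```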

Install MACE

If all is well, install MACE following the MACE installation guide:

conda create --name NNFF-MACE --clone CUDA-torch-base
conda activate NNFF-MACE

pip install mace-torch
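After `pip install mace-torch`, one way to confirm the package landed in the active environment is to query its installed metadata with the standard library. The helper names are hypothetical, and `mace` is assumed to be the import name provided by the `mace-torch` distribution.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def importable(module: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    print("mace-torch:", installed_version("mace-torch") or "not installed")
    print("import mace:", "ok" if importable("mace") else "not found")
```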

To learn about MACE, follow the tutorial at https://github.com/ilyes319/mace-tutorials/blob/main/mace-users/MACE_users.ipynb. As a test run, download solvent_train.xyz and solvent_test.xyz from that repository, set $DATA to the directory containing them, and run this command on a compute node:

mace_run_train \
    --name="model" \
    --train_file="$DATA/solvent_train.xyz" \
    --valid_fraction=0.05 \
    --test_file="$DATA/solvent_test.xyz" \
    --E0s="isolated" \
    --energy_key="energy" \
    --forces_key="forces" \
    --model="MACE" \
    --num_interactions=2 \
    --max_ell=2 \
    --hidden_irreps="16x0e" \
    --num_cutoff_basis=5 \
    --correlation=2 \
    --r_max=3.0 \
    --batch_size=5 \
    --valid_batch_size=5 \
    --eval_interval=1 \
    --max_num_epochs=50 \
    --start_swa=15 \
    --swa_energy_weight=1000 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --error_table="PerAtomRMSE" \
    --default_dtype="float32" \
    --swa \
    --device=cuda \
    --seed=1234
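Because the command reads energies and forces through `--energy_key="energy"` and `--forces_key="forces"`, a quick pre-flight check of the data files can save a failed job. This is a minimal sketch, assuming extended XYZ frames whose comment line holds `key=value` pairs (with `Properties=name:type:ncols` triples); it is not MACE's own parser, the function names are hypothetical, and unusual quoting may need a real reader such as ASE's.

```python
import shlex
import sys
from typing import Iterator, List, Tuple


def iter_xyz_headers(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (atom_count, comment_line) for each frame of an (extended) XYZ string."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            i += 1
            continue
        natoms = int(lines[i].split()[0])
        yield natoms, lines[i + 1]
        i += natoms + 2  # count line + comment line + one line per atom


def missing_keys(text: str, energy_key: str = "energy", forces_key: str = "forces") -> List[str]:
    """List frames lacking the per-frame energy entry or the per-atom forces column."""
    problems = []
    for frame, (_, comment) in enumerate(iter_xyz_headers(text)):
        fields = {}
        for token in shlex.split(comment):  # shlex keeps quoted values together
            key, _, value = token.partition("=")
            fields[key] = value
        if energy_key not in fields:
            problems.append(f"frame {frame}: no {energy_key}= in comment line")
        # Properties is name:type:ncols triples, e.g. species:S:1:pos:R:3:forces:R:3
        columns = fields.get("Properties", "").split(":")[0::3]
        if forces_key not in columns:
            problems.append(f"frame {frame}: no {forces_key} column in Properties")
    return problems


if __name__ == "__main__":
    # Usage: python check_xyz.py "$DATA/solvent_train.xyz" "$DATA/solvent_test.xyz"
    for path in sys.argv[1:]:
        with open(path) as fh:
            issues = missing_keys(fh.read())
        print(path, "OK" if not issues else f"{len(issues)} problem(s), first: {issues[0]}")
```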

MACE+LAMMPS

Instructions for LAMMPS with MACE here

KOKKOS in ALCF

Instructions for compiling with Kokkos on Polaris here
