CUDA 12.3 w/ EFA
# https://nvidia.github.io/container-wiki/toolkit/container-images.html
# https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags
# https://gitlab.com/nvidia/container-images/cuda
FROM nvcr.io/nvidia/cuda:12.3.2-cudnn9-runtime-centos7 AS base
ARG EFA_INSTALLER_VERSION='1.30.0'
ARG AWS_OFI_NCCL_VERSION='1.8.1'
ENV PATH="$PATH:/opt/amazon/efa/bin:/opt/amazon/openmpi/bin"
ENV LD_LIBRARY_PATH="/usr/lib64/:/usr/local/cuda/lib64/:/opt/amazon/efa/lib64/:/opt/amazon/openmpi/lib64/"
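# AWS EFA installer
#   --skip-kmod        skip the EFA kernel module (the container relies on the host's driver)
#   --skip-limit-conf  skip the EFA limit configuration
#   --no-verify        skip the post-install EFA device verification and test
#   --yes              answer yes to all prompts
RUN curl -sL https://efa-installer.amazonaws.com/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz | tar xvz \
  && pushd aws-efa-installer \
  && ./efa_installer.sh --skip-kmod --skip-limit-conf --no-verify --yes \
  && popd \
  && rm -rf aws-efa-installer* \
  && yum clean all \
  && rm -rf /var/cache/dnf/* /var/cache/yum/*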
# AWS OFI NCCL
RUN yum install -y \
    automake \
    autoconf \
    libtool \
  && curl -sL https://github.com/aws/aws-ofi-nccl/releases/download/v${AWS_OFI_NCCL_VERSION}-aws/aws-ofi-nccl-${AWS_OFI_NCCL_VERSION}-aws.tar.gz | tar xvz \
  && pushd aws-ofi-nccl-${AWS_OFI_NCCL_VERSION}-aws \
  && ./autogen.sh \
  && ./configure --prefix=/opt/aws-ofi-nccl/install \
    --with-libfabric=/opt/amazon/efa/ \
    --with-mpi=/opt/amazon/openmpi/ \
    --with-cuda=/usr/local/cuda \
    --with-nccl=/usr/lib64/ \
    --enable-platform-aws \
  && make \
  && make install \
  && popd \
  && rm -rf aws-ofi-nccl-* \
  && yum clean all \
  && rm -rf /var/cache/dnf/* /var/cache/yum/*
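
# Optional, and not part of the original file: a minimal sketch, assuming the
# plugin library (libnccl-net.so) ends up in /opt/aws-ofi-nccl/install/lib, as
# implied by the --prefix passed to ./configure above. NCCL loads the plugin
# through the dynamic linker, so this adds that directory to the search path.
# Check the directory in the built image first; it may be lib64 on some builds.
ENV LD_LIBRARY_PATH="/opt/aws-ofi-nccl/install/lib:${LD_LIBRARY_PATH}"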