@fvisconti
Created April 26, 2024 09:01
Dockerfile for TensorRT-LLM
# The documented procedures for installing TensorRT-LLM, such as
# https://nvidia.github.io/TensorRT-LLM/installation/linux.html
# or https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/
# suffer from more than one issue, so I am sharing the Dockerfile
# I use to build TensorRT-LLM as a Docker image locally.
# First stage for building dependencies with a slim Python base
FROM python:3.10-slim-bullseye AS build-deps
RUN apt-get update && apt-get install -y --no-install-recommends \
python3-dev openmpi-bin libopenmpi-dev git gcc \
&& rm -rf /var/lib/apt/lists/*
# Install TensorRT-LLM
RUN pip install --no-cache-dir tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git && \
cd TensorRT-LLM && \
pip install --no-cache-dir -r requirements.txt
# Second stage for installing git-lfs
FROM debian:bullseye-slim AS git-lfs-deps
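# ca-certificates is listed explicitly: with --no-install-recommends it is not
# pulled in, and curl needs it to verify the HTTPS download.
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates curl \
&& curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash \
&& apt-get install -y git-lfs \
&& git-lfs install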
# Final stage to build the CUDA-based runtime image
FROM nvidia/cuda:12.4.0-base-ubuntu22.04
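# Copy the Python runtime and binaries from the build-deps stage.
# The whole /usr/local/lib tree is copied, not only site-packages, because the
# python base image keeps libpython and the standard library there as well.
COPY --from=build-deps /usr/local/lib /usr/local/lib
COPY --from=build-deps /usr/bin /usr/bin
COPY --from=build-deps /usr/local/bin /usr/local/bin
# Refresh the linker cache so the copied shared libraries can be found
RUN ldconfig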
# Copy git-lfs from the git-lfs-deps stage
COPY --from=git-lfs-deps /usr/bin/git-lfs /usr/bin/git-lfs
# Clean up unnecessary files
RUN rm -rf /var/lib/apt/lists/*
# The resulting image is about 10 GB on my machine, roughly half the size of
# an image built naively by following the official instructions linked above.
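#
# Usage sketch, written as comments so this file stays a valid Dockerfile.
# Assumptions (not from the original gist): the tag "trtllm-local" is a
# hypothetical name, and the host has the NVIDIA Container Toolkit installed
# so that --gpus works.
#
#   docker build -t trtllm-local .
#   docker images trtllm-local    # compare the reported size with the figure above
#   docker run --rm --gpus all trtllm-local nvidia-smi
#
# A possible smoke test of the Python package; whether the import succeeds
# depends on the runtime libraries that ended up in the final stage:
#
#   docker run --rm --gpus all trtllm-local python3 -c "import tensorrt_llm"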