Stack to build an image with CUDA and llama-cpp-python and bring it up with docker-compose
docker-compose.yml:

services:
  cuda-llama-cpp:
    image: imagem-buildada
    ports:
      - 5001:5001
    command: python3 app.py
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
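A minimal sketch of how this would typically be used, assuming the Dockerfile below sits in the same build context and you tag the resulting image with the name the compose file expects: build it with docker build -t imagem-buildada . and start the service with docker compose up -d. The devices/capabilities: [gpu] reservation only takes effect if the host has the NVIDIA driver and the NVIDIA Container Toolkit installed.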
Dockerfile:

# Base image with the CUDA toolkit (the devel variant is needed to compile llama-cpp-python)
ARG CUDA_IMAGE="12.5.0-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

WORKDIR /app

ENV ACCEPT_EULA=Y
ENV DEBIAN_FRONTEND=noninteractive
ENV HOST=0.0.0.0
# Build CUDA kernels for all supported GPU architectures and enable the CUDA backend
ENV CUDA_DOCKER_ARCH=all
ENV GGML_CUDA=1

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y git build-essential \
    python3 python3-pip gcc wget \
    libopenblas-dev

COPY ./requirements.txt .
RUN pip install -r requirements.txt

# Compile llama-cpp-python against CUDA
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

# app.py is expected to be provided at /app (e.g. copied in or bind-mounted via docker-compose)
EXPOSE 5001
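The gist does not include app.py or requirements.txt. Below is a minimal sketch of what app.py might look like, assuming Flask is listed in requirements.txt and a GGUF model is available inside the container at /app/models/model.gguf (both the framework choice and the model path are assumptions, not part of the gist):

# app.py - minimal sketch, assuming Flask + a GGUF model at /app/models/model.gguf
from flask import Flask, jsonify, request
from llama_cpp import Llama

app = Flask(__name__)

# n_gpu_layers=-1 offloads all layers to the GPU; this requires the CUDA build
# of llama-cpp-python, which the Dockerfile compiles with GGML_CUDA=on.
llm = Llama(model_path="/app/models/model.gguf", n_gpu_layers=-1)

@app.route("/completion", methods=["POST"])
def completion():
    prompt = request.json.get("prompt", "")
    result = llm(prompt, max_tokens=256)
    return jsonify(result)

if __name__ == "__main__":
    # Port 5001 matches EXPOSE 5001 and the docker-compose port mapping;
    # binding to 0.0.0.0 makes the server reachable from outside the container.
    app.run(host="0.0.0.0", port=5001)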