Quick instructions on how to run miqu with aphrodite-engine.
STEP 1: Convert the GGUF model to PyTorch format per the documentation. I used the aphrodite-engine container to avoid setting up dependencies locally.
docker run --gpus=all -it --rm -v /models:/models -v `pwd`:/workspace alignmentlabai/aphrodite-engine:latest bash
STEP 2: Run the following commands in the container to convert the GGUF model to PyTorch:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
wget https://github.com/PygmalionAI/aphrodite-engine/raw/main/examples/gguf_to_torch.py
python gguf_to_torch.py --input /models/miqu-1-70b/miqu-1-70b.q5_K_M.gguf --output /models/miqu-1-70b-pt
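Optional sanity check after conversion. The exact output layout is an assumption (a HuggingFace-style directory with a config.json, tokenizer files, and weight shards is typical for these converters); adjust the path if yours differs:

```shell
# Assumed output layout: HuggingFace-style directory. Path matches the
# --output argument used above; change MODEL_DIR if you converted elsewhere.
MODEL_DIR=/models/miqu-1-70b-pt
ls -lh "$MODEL_DIR" 2>/dev/null || echo "not found: $MODEL_DIR"
# config.json is what the serving step reads to pick the architecture.
[ -f "$MODEL_DIR/config.json" ] && echo "config.json present" || echo "config.json missing"
```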
STEP 3: Start the inference container (tested on 2x A6000s):
docker run -it -d \
--name=aphrodite-miqu \
--restart=always \
--shm-size=15g \
--ulimit memlock=-1 \
--gpus='"device=0,1"' \
--publish=8000:8000 \
--volume=/models:/models:ro \
alignmentlabai/aphrodite-engine:latest \
python -m aphrodite.endpoints.openai.api_server \
--served-model-name miqu \
--model /models/miqu-1-70b-pt \
--quantization gguf \
--load-format auto \
--tokenizer-mode auto \
--dtype auto \
--tensor-parallel-size 2 \
--worker-use-ray \
--gpu-memory-utilization 0.8 \
--response-role gpt \
--port 8000 \
--host 0.0.0.0
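Once the server is up, it speaks the OpenAI-compatible chat API. A minimal sketch of a request, assuming the container from step 3 is reachable on localhost:8000 (the model name "miqu" matches --served-model-name above; the prompt and sampling parameters are just examples):

```shell
# Build the request body first so it can be inspected or reused.
PAYLOAD='{"model": "miqu", "messages": [{"role": "user", "content": "Hello, who are you?"}], "max_tokens": 128, "temperature": 0.7}'
# POST it to the OpenAI-compatible chat completions endpoint.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```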
Thanks to AlpinDale for the suggestion to simplify step 2!