@theobjectivedad
Last active February 1, 2024 23:25

Serving miqu in aphrodite-engine

Quick instructions on how to run miqu with aphrodite-engine.

STEP 1: Convert the GGUF model to PyTorch format per the aphrodite-engine documentation. I used the aphrodite-engine container to avoid setting up the dependencies locally.

docker run --gpus=all -it --rm -v /models:/models -v `pwd`:/workspace alignmentlabai/aphrodite-engine:latest bash

STEP 2: Run the following commands in the container to convert the GGUF model to PyTorch:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python 
wget https://github.com/PygmalionAI/aphrodite-engine/raw/main/examples/gguf_to_torch.py
python gguf_to_torch.py --input /models/miqu-1-70b/miqu-1-70b.q5_K_M.gguf --output /models/miqu-1-70b-pt
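
Optional: before exiting the container, you can sanity-check that the conversion produced output in the target directory. The exact file names depend on the converter version, so treat this as a quick check rather than an authoritative listing:

ls -lh /models/miqu-1-70b-pt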

STEP 3: Start the serving container (tested on 2x A6000 GPUs):

docker run -it -d \
  --name=aphrodite-miqu \
  --restart=always \
  --shm-size=15g \
  --ulimit memlock=-1 \
  --gpus='"device=0,1"' \
  --publish=8000:8000 \
  --volume=/models:/models:ro \
  alignmentlabai/aphrodite-engine:latest \
    python -m aphrodite.endpoints.openai.api_server \
    --served-model-name miqu \
    --model /models/miqu-1-70b-pt \
    --quantization gguf \
    --load-format auto \
    --tokenizer-mode auto \
    --dtype auto \
    --tensor-parallel-size 2 \
    --worker-use-ray \
    --gpu-memory-utilization 0.8 \
    --response-role gpt \
    --port 8000 \
    --host 0.0.0.0
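
Once the container is up, you can smoke-test the OpenAI-compatible endpoint. The prompt and max_tokens below are just example values:

curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "miqu",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64
  }'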
@theobjectivedad (Author)

Thanks to AlpinDale for your suggestion to simplify step 2!

@AlpinDale

I'd probably omit --worker-use-ray, set --dtype to half, and remove --response-role. Otherwise, this LGTM. Probably --gpu-memory-utilization 0.95 if the GPU can take it, but the default 0.9 is fine too.
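
For reference, applying those suggestions to the launch command from step 3 would look roughly like this (only the relevant flags are shown; a sketch, not a verified configuration):

    python -m aphrodite.endpoints.openai.api_server \
    --served-model-name miqu \
    --model /models/miqu-1-70b-pt \
    --quantization gguf \
    --dtype half \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.95 \
    --port 8000 \
    --host 0.0.0.0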
