Quick instructions on how to run miqu with aphrodite-engine.
STEP 1: Convert the GGUF model to PyTorch format per the documentation. I used the aphrodite-engine container to avoid setting up dependencies locally.
docker run --gpus=all -it --rm -v /models:/models -v `pwd`:/workspace alignmentlabai/aphrodite-engine:latest bash
STEP 2: Run the following commands in the container to convert the GGUF model to PyTorch:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
wget https://github.com/PygmalionAI/aphrodite-engine/raw/main/examples/gguf_to_torch.py
python gguf_to_torch.py --input /models/miqu-1-70b/miqu-1-70b.q5_K_M.gguf --output /models/miqu-1-70b-pt
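Optional sanity check after conversion. The exact output layout is an assumption (a HuggingFace-style directory with a config.json, tokenizer files, and weight shards is typical for these converters); adjust the path if yours differs:

```shell
# Assumed output layout: HuggingFace-style directory. Path matches the
# --output argument used above; change MODEL_DIR if you converted elsewhere.
MODEL_DIR=/models/miqu-1-70b-pt
ls -lh "$MODEL_DIR" 2>/dev/null || echo "not found: $MODEL_DIR"
# config.json is what the serving step reads to pick the architecture.
[ -f "$MODEL_DIR/config.json" ] && echo "config.json present" || echo "config.json missing"
```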
STEP 3: Start the inference container (tested on 2x A6000s):
docker run -it -d \
--name=aphrodite-miqu \
--restart=always \
--shm-size=15g \
--ulimit memlock=-1 \
--gpus='"device=0,1"' \
--publish=8000:8000 \
--volume=/models:/models:ro \
alignmentlabai/aphrodite-engine:latest \
python -m aphrodite.endpoints.openai.api_server \
--served-model-name miqu \
--model /models/miqu-1-70b-pt \
--quantization gguf \
--load-format auto \
--tokenizer-mode auto \
--dtype auto \
--tensor-parallel-size 2 \
--worker-use-ray \
--gpu-memory-utilization 0.8 \
--response-role gpt \
--port 8000 \
--host 0.0.0.0
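Once the server is up, it speaks the OpenAI-compatible chat API. A minimal sketch of a request, assuming the container from step 3 is reachable on localhost:8000 (the model name "miqu" matches --served-model-name above; the prompt and sampling parameters are just examples):

```shell
# Build the request body first so it can be inspected or reused.
PAYLOAD='{"model": "miqu", "messages": [{"role": "user", "content": "Hello, who are you?"}], "max_tokens": 128, "temperature": 0.7}'
# POST it to the OpenAI-compatible chat completions endpoint.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD"
```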
Thanks to AlpinDale for the suggestion to simplify step 2!