Skip to content

Instantly share code, notes, and snippets.

@fpaupier
Last active February 17, 2025 13:07
Show Gist options
  • Save fpaupier/395f25bb19d3d80bdd0d2ce6c5806d45 to your computer and use it in GitHub Desktop.
Save fpaupier/395f25bb19d3d80bdd0d2ce6c5806d45 to your computer and use it in GitHub Desktop.
vLLM setup
# Start vLLM OpenAI compatible server
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--api-key your-secret-key \
--tokenizer-mode "mistral" \
--model mistralai/Mistral-Small-24B-Instruct-2501
# Test query
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secret-key" \
-d '{
"model": "mistralai/Mistral-Small-24B-Instruct-2501",
"prompt": "Raconte moi une histoire sur l exploration spatiale",
"max_tokens": 128,
"temperature": 0.7
}'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment