Last active
February 17, 2025 13:07
-
-
Save fpaupier/395f25bb19d3d80bdd0d2ce6c5806d45 to your computer and use it in GitHub Desktop.
vLLM setup
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # Start vLLM OpenAI compatible server | |
| docker run --runtime nvidia --gpus all \ | |
| -v ~/.cache/huggingface:/root/.cache/huggingface \ | |
| --env "HUGGING_FACE_HUB_TOKEN=<secret>" \ | |
| -p 8000:8000 \ | |
| --ipc=host \ | |
| vllm/vllm-openai:latest \ | |
| --api-key your-secret-key \ | |
| --tokenizer-mode "mistral" \ | |
| --model mistralai/Mistral-Small-24B-Instruct-2501 | |
| # Test query | |
| curl http://localhost:8000/v1/completions \ | |
| -H "Content-Type: application/json" \ | |
| -H "Authorization: Bearer your-secret-key" \ | |
| -d '{ | |
| "model": "mistralai/Mistral-Small-24B-Instruct-2501", | |
| "prompt": "Raconte moi une histoire sur l exploration spatiale", | |
| "max_tokens": 128, | |
| "temperature": 0.7 | |
| }' |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment