@michaelgold
Created March 18, 2026 01:48
docker-compose.yml for serving NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 with vLLM
services:
  vllm:
    image: vllm/vllm-openai:v0.17.1-cu130
    restart: unless-stopped
    container_name: vllm-nemotron
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - VLLM_WORKER_MULTIPROC_METHOD=spawn
    volumes:
      - /home/mg/.cache/huggingface:/root/.cache/huggingface
    ports:
      - "8000:8000"
    ipc: host
    command: >
      --model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
      --port 8000
      --trust-remote-code
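Once the container is up (`docker compose up -d`), vLLM serves an OpenAI-compatible API on the published port 8000. A minimal sketch of a client request using only the Python standard library — the `/v1/chat/completions` path is vLLM's standard OpenAI-compatible route, and the model name matches the one in the compose file above; host, prompt, and `max_tokens` are illustrative choices:

```python
import json
from urllib import request

# Chat-completions payload for the model started by the compose file.
payload = {
    "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

body = json.dumps(payload).encode()
req = request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server has finished loading the model:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The server can take several minutes to become ready while it downloads and loads the 120B NVFP4 weights into the Hugging Face cache volume; until then, requests will fail with a connection error.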