@Jeffwan
Created June 17, 2025 12:56
deepseek-r1-aws.yaml
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: RayClusterFleet
metadata:
  labels:
    app.kubernetes.io/name: aibrix
    model.aibrix.ai/name: deepseek-r1-671b
    model.aibrix.ai/port: "8000"
  name: deepseek-r1-671b
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-671b
      model.aibrix.ai/port: "8000"
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        model.aibrix.ai/name: deepseek-r1-671b
        model.aibrix.ai/port: "8000"
      annotations:
        ray.io/overwrite-container-cmd: "true"
    spec:
      rayVersion: '2.40.0'
      headGroupSpec:
        rayStartParams:
          dashboard-host: '0.0.0.0'
          block: 'false'
        template:
          metadata:
            labels:
              model.aibrix.ai/name: deepseek-r1-671b
              model.aibrix.ai/port: "8000"
          spec:
            containers:
            - name: ray-head
              image: aibrix/vllm-openai:v0.7.3.self.post1
              ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: service
              command: ["/bin/bash", "-lc", "--"]
              # Start Ray on the head, then launch vLLM.
              # TP 8 x PP 2 = 16 GPUs: 8 on this pod plus 8 on the single worker pod.
              args: ["ulimit -n 65536; echo head; $KUBERAY_GEN_RAY_START_CMD; vllm serve /models/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --tensor-parallel-size 8 --pipeline-parallel-size 2 --distributed-executor-backend ray --uvicorn-log-level warning"]
              env:
              - name: GLOO_SOCKET_IFNAME
                value: eth0
              - name: NCCL_SOCKET_IFNAME
                value: eth0
              resources:
                limits:
                  nvidia.com/gpu: 8
                requests:
                  nvidia.com/gpu: 8
              securityContext:
                capabilities:
                  add:
                  - IPC_LOCK
              # Allows up to 180s + 150 x 10s for the model to load before the pod is restarted.
              startupProbe:
                httpGet:
                  path: /metrics
                  port: service
                initialDelaySeconds: 180
                failureThreshold: 150
                periodSeconds: 10
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /dev/shm
                name: shared-mem
              - mountPath: /models
                name: models
            volumes:
            - name: shared-mem
              emptyDir:
                medium: Memory
            # hostPath with type Directory: the path must already exist on the node.
            - name: models
              hostPath:
                path: /mnt/nvme0/models
                type: Directory
      workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 1
        groupName: worker-group
        rayStartParams: {}
        template:
          metadata:
            labels:
              model.aibrix.ai/name: deepseek-r1-671b
              model.aibrix.ai/port: "8000"
          spec:
            containers:
            - name: ray-worker
              image: aibrix/vllm-openai:v0.7.3.self.post1
              command: ["/bin/bash", "-lc", "--"]
              # The worker only joins the Ray cluster; vLLM is launched from the head.
              args: ["ulimit -n 65536; echo worker; $KUBERAY_GEN_RAY_START_CMD;"]
              env:
              - name: GLOO_SOCKET_IFNAME
                value: eth0
              - name: NCCL_SOCKET_IFNAME
                value: eth0
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh", "-c", "ray stop"]
              resources:
                limits:
                  nvidia.com/gpu: 8
                requests:
                  nvidia.com/gpu: 8
              securityContext:
                capabilities:
                  add:
                  - IPC_LOCK
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /dev/shm
                name: shared-mem
              - mountPath: /models
                name: models
            volumes:
            - name: shared-mem
              emptyDir:
                medium: Memory
            - name: models
              hostPath:
                path: /mnt/nvme0/models
                type: Directory
Jeffwan commented Jun 17, 2025

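Make sure you download the model files to /mnt/nvme0/models/deepseek-r1 on every node that will run a head or worker pod. Both pods mount /mnt/nvme0/models from the host as a hostPath volume of type Directory, so the directory must exist before the pods are scheduled.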
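
A quick way to confirm the download is complete before applying the manifest is a small preflight check run on each node. The sketch below is only an illustration: it assumes the directory follows the usual Hugging Face layout for a sharded safetensors checkpoint (a `config.json` plus a `model.safetensors.index.json` whose `weight_map` names every shard file). The function name `missing_model_files` is hypothetical and not part of AIBrix, KubeRay, or vLLM.

```python
import json
from pathlib import Path


def missing_model_files(model_dir):
    """Return a sorted list of expected files that are absent from model_dir.

    Assumes a Hugging Face style sharded safetensors checkpoint layout.
    """
    root = Path(model_dir)
    missing = []

    if not (root / "config.json").is_file():
        missing.append("config.json")

    index_path = root / "model.safetensors.index.json"
    if not index_path.is_file():
        # Without the index we cannot tell which shards are expected.
        missing.append(index_path.name)
        return sorted(missing)

    # weight_map maps each tensor name to the shard file that stores it.
    weight_map = json.loads(index_path.read_text())["weight_map"]
    for shard in set(weight_map.values()):
        if not (root / shard).is_file():
            missing.append(shard)

    return sorted(missing)
```

Calling `missing_model_files("/mnt/nvme0/models/deepseek-r1")` on a node returns an empty list when every file named in the index is present. It checks only that the files exist, not their sizes or checksums, so a truncated shard would still pass.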
