@agam
Created July 30, 2024 23:28
MPIJob using DWS
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  namespace: some-name
  name: mpi-hello-world-14
  labels:
    # Kueue LocalQueue this job is submitted to.
    kueue.x-k8s.io/queue-name: dws-local-queue
  annotations:
    # Annotations with the provreq.kueue.x-k8s.io/ prefix are passed by Kueue
    # to the ProvisioningRequest as parameters.
    provreq.kueue.x-k8s.io/provisioning-class-name: "queued-provisioning.gke.io"
    provreq.kueue.x-k8s.io/maxRunDurationSeconds: "600"
spec:
  slotsPerWorker: 1
  runPolicy:
    # Created suspended; Kueue unsuspends the job once it is admitted.
    suspend: true
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-nodepool: manual-t4
            cloud.google.com/gke-accelerator: nvidia-tesla-t4
          tolerations:
          # Taint on GPU nodes.
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
          # Taint on node pools with queued provisioning enabled.
          - key: "cloud.google.com/gke-queued"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          containers:
          - image: agam/mpi-hello-world:latest
            name: mpi-launcher
            command:
            - mpirun
            - --allow-run-as-root
            - -np
            # 3 ranks are requested, while Worker replicas (2) x slotsPerWorker (1)
            # gives 2 worker slots; check that the hostfile has room for all 3.
            - "3"
            - ./mpi_hello_world
            resources:
              requests:
                cpu: "100m"
                memory: "100Mi"
                nvidia.com/gpu: "1"
              limits:
                cpu: "100m"
                memory: "100Mi"
                nvidia.com/gpu: "1"
    Worker:
      replicas: 2
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-nodepool: some-name
            cloud.google.com/gke-accelerator: nvidia-tesla-t4
          tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
          - key: "cloud.google.com/gke-queued"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          containers:
          - image: agam/mpi-hello-world:latest
            name: mpi-worker
            env:
            # Let Open MPI run as root inside the worker containers.
            - name: OMPI_ALLOW_RUN_AS_ROOT
              value: "1"
            - name: OMPI_ALLOW_RUN_AS_ROOT_CONFIRM
              value: "1"
            resources:
              requests:
                cpu: "100m"
                memory: "100Mi"
                nvidia.com/gpu: "1"
              limits:
                cpu: "100m"
                memory: "100Mi"
                nvidia.com/gpu: "1"
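---
# The MPIJob above references a LocalQueue named dws-local-queue but does not
# define it. What follows is a minimal sketch of the Kueue objects such a queue
# could be backed by. It is not part of the original manifest. Assumptions:
# every name other than dws-local-queue (default-flavor, dws-prov, dws-config,
# dws-cluster-queue) is hypothetical, the quota values are placeholders, and
# the kueue.x-k8s.io/v1beta1 API is assumed to be installed in the cluster.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor            # hypothetical name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config                # hypothetical name
spec:
  # Same provisioning class the MPIJob annotation names.
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov                  # hypothetical name
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: dws-cluster-queue         # hypothetical name
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      # Placeholder quotas; size these for the real cluster.
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 100Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 100
  admissionChecks:
  - dws-prov
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  # Must be in the same namespace as the MPIJob.
  namespace: some-name
  name: dws-local-queue
spec:
  clusterQueue: dws-cluster-queue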