# Gist by @klueska (last active May 7, 2021 12:56)
# Export the current MIG configuration with nvidia-mig-parted
nvidia-mig-parted export
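# As a rough sketch, on a node where MIG has not yet been enabled the export
# typically looks something like this (exact contents depend on the
# mig-parted version and hardware):
#
#   version: v1
#   mig-configs:
#     current:
#       - devices: all
#         mig-enabled: false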
# Show nvidia-smi listing the 8 full GPUs on the host
nvidia-smi -L
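# Illustrative output, assuming an 8x A100-SXM4-40GB node (UUIDs elided):
#
#   GPU 0: A100-SXM4-40GB (UUID: GPU-...)
#   ...
#   GPU 7: A100-SXM4-40GB (UUID: GPU-...)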
# List the installed NVIDIA container toolkit packages
dpkg -l | grep nvidia-container; dpkg -l | grep nvidia-docker
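# On a Debian/Ubuntu host the relevant packages are typically along the lines
# of (names and versions vary by release):
#   libnvidia-container1, libnvidia-container-tools,
#   nvidia-container-toolkit, nvidia-container-runtime, nvidia-docker2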
# Show that docker is configured to run with the nvidia container toolkit
cat /etc/docker/daemon.json
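# A daemon.json configured for the NVIDIA runtime commonly looks like the
# sketch below; the exact file on this host may differ:
#
#   {
#       "default-runtime": "nvidia",
#       "runtimes": {
#           "nvidia": {
#               "path": "nvidia-container-runtime",
#               "runtimeArgs": []
#           }
#       }
#   }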
# Show that a minimal 1-node Kubernetes cluster is up and running
kubectl get pod --all-namespaces
# Show that there are currently no GPUs being advertised by the node
kubectl get node -o json | jq .items[0].status.allocatable
# Add helm repos for the k8s-device-plugin and gpu-feature-discovery
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
helm repo update
# Helm install the k8s-device-plugin with the mixed MIG strategy
helm install \
--version=0.9.0 \
--set migStrategy=mixed \
nvidia-device-plugin \
nvdp/nvidia-device-plugin
# Helm install gpu-feature-discovery with the mixed MIG strategy
helm install \
--version=0.4.1 \
--set migStrategy=mixed \
gpu-feature-discovery \
nvgfd/gpu-feature-discovery
# Show the installed helm releases
helm list
# Show the device-plugin and gpu-feature-discovery daemonset pods running
kubectl get pod --all-namespaces
# Show the set of GPUs being advertised on the node
kubectl get node -o json | jq '.items[0].status.allocatable | with_entries(select(.key | contains("nvidia")))'
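# With MIG still disabled and the mixed strategy, only full GPUs are
# advertised, so the output is expected to look roughly like:
#
#   {
#     "nvidia.com/gpu": "8"
#   }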
# Show the set of nvidia labels on the node
kubectl get node -o json | jq '.items[0].metadata.labels | with_entries(select(.key | contains("nvidia")))'
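# gpu-feature-discovery applies labels along these lines (illustrative
# subset; values depend on the driver and hardware):
#
#   nvidia.com/gpu.count=8
#   nvidia.com/gpu.product=A100-SXM4-40GB
#   nvidia.com/mig.strategy=mixed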
# Run a pod to consume a GPU
kubectl run -it --rm \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/gpu=1 \
gpu-pod -- nvidia-smi -L
# Export the current MIG configuration again with nvidia-mig-parted
nvidia-mig-parted export
# Show the config file for nvidia-mig-parted
cat /etc/nvidia-mig-manager/config.yaml
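# The config file defines named MIG layouts; a minimal sketch containing the
# all-1g.5gb profile used below might look like this (the real file can
# define many more profiles):
#
#   version: v1
#   mig-configs:
#     all-disabled:
#       - devices: all
#         mig-enabled: false
#     all-1g.5gb:
#       - devices: all
#         mig-enabled: true
#         mig-devices:
#           "1g.5gb": 7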
# Run nvidia-mig-parted to reconfigure MIG on the node, timing how long it takes
time nvidia-mig-parted apply -c all-1g.5gb
# Export again to confirm the new MIG configuration was applied
nvidia-mig-parted export
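# After the apply, the export is expected to reflect the new layout, roughly:
#
#   version: v1
#   mig-configs:
#     current:
#       - devices: all
#         mig-enabled: true
#         mig-devices:
#           "1g.5gb": 7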
# Show the set of GPUs being advertised on the node and the node labels
kubectl get node -o json | jq '.items[0].status.allocatable | with_entries(select(.key | contains("nvidia")))'
kubectl get node -o json | jq '.items[0].metadata.labels | with_entries(select(.key | contains("nvidia")))'
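# Under the mixed strategy each MIG profile is exposed as its own extended
# resource, so with 7 x 1g.5gb slices on each of the 8 GPUs the allocatable
# list should show something close to:
#
#   "nvidia.com/mig-1g.5gb": "56"
#
# while the full-GPU resource nvidia.com/gpu drops to 0, since every GPU is
# now partitioned.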
# Run a pod to consume a MIG device
kubectl run -it --rm \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/mig-1g.5gb=1 \
mig-pod -- bash -x -c "
nvidia-smi -L; echo \"\";
ls -la /dev/nvidia*; echo \"\";
find /proc/driver/nvidia/capabilities;
"
# Helm upgrade the k8s-device-plugin with the single MIG strategy
helm upgrade \
--version=0.9.0 \
--set migStrategy=single \
nvidia-device-plugin \
nvdp/nvidia-device-plugin
# Helm upgrade gpu-feature-discovery with the single MIG strategy
helm upgrade \
--version=0.4.1 \
--set migStrategy=single \
gpu-feature-discovery \
nvgfd/gpu-feature-discovery
# Show the set of GPUs being advertised on the node and the node labels
kubectl get node -o json | jq '.items[0].status.allocatable | with_entries(select(.key | contains("nvidia")))'
kubectl get node -o json | jq '.items[0].metadata.labels | with_entries(select(.key | contains("nvidia")))'
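# With the single strategy all MIG devices are advertised under the plain
# nvidia.com/gpu resource, so expect roughly:
#
#   "nvidia.com/gpu": "56"
#
# together with labels such as nvidia.com/mig.strategy=single and a product
# label that carries the MIG profile (e.g. A100-SXM4-40GB-MIG-1g.5gb).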
# Run a pod to consume a MIG device (requested as nvidia.com/gpu under the single strategy)
kubectl run -it --rm \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/gpu=1 \
gpu-pod -- bash -x -c "
nvidia-smi -L; echo \"\";
ls -la /dev/nvidia*; echo \"\";
find /proc/driver/nvidia/capabilities;
"