// ref: https://minikube.sigs.k8s.io/docs/tutorials/nvidia_gpu/
// os: ubuntu 18.04 LTS
// hardware: GCE instance with GPU (nvidia-tesla-k80)
1. install nvidia driver
```
https://gist.github.com/hsinhoyeh/495752aaf252bebdd2f3b51011dc060f
```
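The linked gist carries the author's exact driver steps. As a rough sketch (an assumption, not necessarily what the gist does), the driver can also be installed on ubuntu 18.04 via the ubuntu-drivers tool:
```
# assumption: ubuntu-drivers picks a driver compatible with the tesla k80
sudo apt-get update && sudo apt-get install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
# after the reboot, the driver should enumerate the gpu
nvidia-smi
```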
2. install docker 19.03 or later
```
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo usermod -aG docker $USER
# log out and back in for the docker group membership to take effect
```
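A quick sanity check that the installed version meets the 19.03 minimum required by the `--gpus` flag used in step 6:
```
docker --version
sudo docker run --rm hello-world
```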
3. install nvidia docker plugin
```
// ref: https://github.com/NVIDIA/nvidia-docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
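To confirm the toolkit package landed before wiring it into the daemon in step 4:
```
dpkg -l nvidia-container-toolkit
```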
4. configure nvidia as the default docker runtime
```
sudo apt-get install -y nvidia-container-runtime
sudo vim /etc/docker/daemon.json
```
set /etc/docker/daemon.json to:
```
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```
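A malformed daemon.json will keep dockerd from starting on the restart in step 5, so it is worth validating the file first:
```
# parses the file and fails loudly on a json syntax error
python3 -m json.tool /etc/docker/daemon.json
```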
5. restart docker daemon
```
sudo systemctl restart docker
```
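After the restart, `docker info` should list nvidia both as an available runtime and as the default:
```
docker info | grep -i runtime
```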
6. run a test in a gpu-enabled container
```
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
```
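On a multi-gpu box the same check can be pinned to a single device; docker 19.03 accepts a device selector in `--gpus`:
```
# run the test against gpu 0 only
docker run --rm --gpus device=0 nvidia/cuda:10.0-base nvidia-smi
```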
7. start minikube
```
https://gist.github.com/hsinhoyeh/c5f60b4cbe41a1e6478ae5ea10f47497
```
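The linked gist has the author's exact invocation. As a hedged sketch, the minikube tutorial referenced at the top runs minikube with the none driver so containers can reach the host's nvidia devices:
```
# assumption: none driver per the upstream tutorial; runs as root on the host
sudo minikube start --vm-driver=none
```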
8. install the nvidia k8s device plugin
```
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
```
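The manifest deploys a daemonset into kube-system; one pod per gpu node should reach Running before the capacity check below:
```
kubectl get pods -n kube-system | grep nvidia-device-plugin
```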
9. check that the node reports gpu capacity
```
kubectl get nodes -ojson | jq .items[].status.capacity
```
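jq can also index the capacity map directly to isolate the gpu resource; expect a non-zero count once the device plugin is up:
```
kubectl get nodes -ojson | jq '.items[].status.capacity["nvidia.com/gpu"]'
```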
9-1. test by running a gpu pod (usage shown after the spec)
```
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
```
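Assuming the spec is saved as cuda-vector-add.yaml (a hypothetical filename), apply it and read the logs; the image runs a cuda vector-add and reports "Test PASSED" on success:
```
kubectl apply -f cuda-vector-add.yaml    # assumed filename
kubectl get pod cuda-vector-add          # wait for STATUS Completed
kubectl logs cuda-vector-add             # should end with "Test PASSED"
```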