Like a lot of people, I prototype things in notebooks. When it comes to working with GPUs, a recurring problem I've had to deal with is keeping my environment in delicate balance: driver, CUDA, and framework versions all have to agree.
A stable arrangement I ended up using:
- Ubuntu 18.04
- Docker 19.03
- nvidia-docker https://github.com/NVIDIA/nvidia-docker
- Running notebooks in containers instead of locally, keeping my files and data mounted in volumes.
Add the GPU driver PPA and update the package lists:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
Check which driver you should install (depends on your card); the first command lists recommendations. Here I install driver 410 along with the matching xorg package:
ubuntu-drivers devices
sudo apt-get install xserver-xorg-video-nvidia-410 nvidia-driver-410
sudo reboot
(Optional, may not be needed) If Secure Boot is enabled, the MOK management blue screen may appear on reboot: Enroll MOK -> Continue -> enter the Secure Boot password -> reboot.
Verify the card is recognised:
nvidia-smi
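To just list the detected devices (a quick sanity check; nvidia-smi -L prints each GPU with its UUID):
nvidia-smi -L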
If the machine has multiple graphics cards, check which one is active and select the NVIDIA one (a reboot may be needed for this to take effect):
prime-select query
prime-select nvidia
Next, Docker itself. Prerequisites, keys, and repositories:
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update
For Docker Hub login you may need additional packages (if you hit the "Cannot create an item in a locked collection" error):
sudo apt-get install gnupg2 pass
Make sure docker-ce is coming from the right place (the candidate version should come from download.docker.com):
apt-cache policy docker-ce
Install Docker:
sudo apt-get install docker-ce
Check that it’s running:
sudo systemctl status docker
Add your user to the docker group (replace igor with your username; the change takes effect after logging out and back in):
sudo usermod -aG docker igor
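After re-logging, sanity-check that Docker runs without sudo, using Docker's standard hello-world test image:
docker run --rm hello-world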
Now install NVIDIA Docker (the NVIDIA Container Toolkit):
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit
sudo systemctl restart docker
Verify the GPU is visible from inside a container (pinning a CUDA image tag, since the untagged nvidia/cuda image may not resolve):
docker run --gpus all --rm nvidia/cuda:10.0-base nvidia-smi
Enable docker-compose to use GPUs by registering the nvidia runtime. Edit /etc/docker/daemon.json:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "experimental": true
}
Then restart Docker so the change takes effect:
sudo systemctl restart docker
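With the runtime registered, a compose file can request it. A minimal sketch, assuming the same TensorFlow image and notebooks folder used below (the runtime key needs compose file format 2.3 or newer):
version: "2.4"
services:
  notebooks:
    image: tensorflow/tensorflow:latest-gpu-py3-jupyter
    runtime: nvidia
    volumes:
      - /home/notebooks:/tf/notebooks
    ports:
      - "8888:8888"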
After everything is configured, to work on something I launch notebooks in a Docker container, mounting my local notebooks folder /home/notebooks as a volume. The -u flag runs Jupyter as my own user so created files aren't owned by root; change the folder to wherever you keep your notebooks and data:
docker run --runtime=nvidia --gpus all -u $(id -u):$(id -g) -it --rm -v $(realpath /home/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
If you run that on a server and want to connect to it remotely, tunnel the port over SSH, then open localhost:8888 in your local browser (using the token the container prints at startup):
ssh -N -f -L localhost:8888:localhost:8888 [email protected]
I use this TensorFlow image as a general-purpose one; even when I'm not actually using TensorFlow, it's just convenient.
For installing dependencies, I use the notebook itself, with
!pip install ...
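For example (hypothetical packages; substitute whatever the project needs):
!pip install pandas scikit-learn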
For heavier or more complex things, I build my own Docker image based on the NVIDIA CUDA images, with everything already in place. For example:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
ARG DEBIAN_FRONTEND=noninteractive
RUN ...
This starts you off with mostly everything; you can install Miniconda here and reuse the image later (a sketch of that follows below). Other tags are available at https://hub.docker.com/r/nvidia/cuda/tags for different CUDA versions.
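A minimal sketch of such a Dockerfile with Miniconda baked in (the installer URL is Anaconda's standard one; the install path and package choices are my assumptions, adjust to taste):
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
ARG DEBIAN_FRONTEND=noninteractive
# curl and certificates are needed to fetch the Miniconda installer
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && rm -rf /var/lib/apt/lists/*
# Install Miniconda into /opt/conda and put it on PATH
RUN curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh && \
    bash /tmp/miniconda.sh -b -p /opt/conda && \
    rm /tmp/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH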
More useful Docker commands:
When running docker build, set DOCKER_BUILDKIT=1 for faster, more efficient builds:
DOCKER_BUILDKIT=1 docker build -t mycontainer:latest .
Cleaning up after builds (cached Docker layers take up a lot of disk space after a while):
docker system prune
Removing tagged images:
docker rmi name:tag
Retagging images (docker tag adds the new tag without removing the old one):
docker tag name:oldtag name:newtag