@kratsg
Last active December 14, 2021 20:25
Instructions for migrating from cuda10.x/cudnn8.0.x to cuda11.2/cudnn8.2
# CUDA
export CUDA_VERSION="11.2"
export CUDA_HOME="/usr/local/cuda-${CUDA_VERSION}"
# only lib64 holds shared libraries; the expansion avoids a trailing ':' when LD_LIBRARY_PATH is unset
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export PATH="/usr/local/cuda-${CUDA_VERSION}/bin:$PATH"
export XLA_FLAGS=--xla_gpu_cuda_data_dir=${CUDA_HOME}
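
After sourcing the variables above, it can be worth confirming that the prefix they point at actually contains a toolkit. The following is only a minimal sketch: the helper name `cuda_env_ok` is hypothetical (not part of the original gist), and the fallback path simply mirrors the `CUDA_HOME` value set above.

```shell
# Hypothetical helper (an assumption, not from the original gist):
# succeeds only if the given CUDA prefix contains an executable
# bin/nvcc and a lib64 directory.
cuda_env_ok() {
  [ -x "$1/bin/nvcc" ] && [ -d "$1/lib64" ]
}

# Check the prefix CUDA_HOME points at; fall back to the path used above.
if cuda_env_ok "${CUDA_HOME:-/usr/local/cuda-11.2}"; then
  echo "CUDA prefix looks complete"
else
  echo "CUDA prefix is missing bin/nvcc or lib64"
fi
```

If the check fails, the likely causes are a typo in `CUDA_VERSION` or a toolkit that was installed somewhere other than `/usr/local`.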

Instructions for updating slugpu GPU drivers

We had a previous installation of CUDA 10.2 / cuDNN 8.0.1, so I followed the instructions found on TensorFlow's website, modified for the fact that we run Ubuntu 20.04:

$ uname -a
Linux slugpu 5.8.0-41-generic #46~20.04.1-Ubuntu SMP Mon Jan 18 17:52:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Apart from the instructions, I also had to remove our older CUDA installation and nvcc binary:

sudo rm /usr/bin/nvcc
sudo rm -rf /usr/lib/cuda

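which pointed at the 10.2 versions. After everything was done, I also noticed that nvidia-smi reported CUDA 11.5 while nvcc --version reported 11.2, which is OK: nvidia-smi shows the highest CUDA version the installed driver supports, while nvcc shows the version of the installed toolkit, and a newer driver can run an older toolkit.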

$ nvidia-smi
Tue Dec 14 12:22:54 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:26:00.0 Off |                  N/A |
| 41%   27C    P8     1W / 260W |     15MiB / 11016MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      6613      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      6945      G   /usr/bin/gnome-shell                4MiB |
+-----------------------------------------------------------------------------+

and

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
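
The comparison between the two version numbers above can be automated. This is a rough sketch, not part of the original instructions: the helper names `nvcc_release`, `smi_cuda_version`, and `version_ge` are made up for illustration, the patterns are based only on the output format shown above, and `sort -V` is assumed to be available (it is in GNU coreutils on Ubuntu).

```shell
# Hypothetical helpers (assumptions, not from the original gist).

# Print the toolkit release (e.g. 11.2) from `nvcc --version` output on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print the CUDA version from the `nvidia-smi` header on stdin.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed if version $1 >= version $2 (version-aware comparison).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Only run the live check when both tools are installed.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
  toolkit="$(nvcc --version | nvcc_release)"
  driver="$(nvidia-smi | smi_cuda_version)"
  if version_ge "$driver" "$toolkit"; then
    echo "OK: driver supports CUDA ${driver}, toolkit is ${toolkit}"
  else
    echo "Mismatch: driver supports CUDA ${driver}, toolkit is ${toolkit}"
  fi
fi
```

With the outputs shown above, the driver value would be 11.5 and the toolkit value 11.2, so the check would report OK.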

Testing with jax

Following the instructions in the jax README, I installed the GPU option:

$ pip install jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_releases.html

and then checked whether the GPU was used:

$ python
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from jax.lib import xla_bridge
>>> print(xla_bridge.get_backend().platform)
gpu

so all good to go.
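
The same check can be run non-interactively, for example from a login script. This is a sketch under a few assumptions: it calls `python3` rather than `python`, it uses the public `jax.default_backend()` function instead of the `xla_bridge` call shown above, and the `jax_backend` helper name is hypothetical.

```shell
# Hypothetical helper (an assumption, not from the original gist):
# print the platform of jax's default backend (for example gpu or cpu),
# or "unavailable" if python3 or jax cannot be run or imported.
# stderr is discarded so that only the platform name is printed.
jax_backend() {
  python3 -c 'import jax; print(jax.default_backend())' 2>/dev/null || echo "unavailable"
}

backend="$(jax_backend)"
echo "jax backend: ${backend}"
```

If this prints `cpu` on a machine with a GPU, jax most likely could not find the CUDA libraries, and re-checking the environment variables is a reasonable first step.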

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
rm nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
sudo apt-get update
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/libnccl2_2.8.3-1+cuda11.2_amd64.deb
sudo apt install ./libnccl2_2.8.3-1+cuda11.2_amd64.deb
rm libnccl2_2.8.3-1+cuda11.2_amd64.deb
sudo apt-get update
sudo apt-get purge nvidia-driver-440
sudo apt install nvidia-driver-450
# versions come from https://developer.nvidia.com/rdp/cudnn-archive
sudo apt-get install --no-install-recommends cuda-11-2 libcudnn8=8.2.0.53-1+cuda11.3 libcudnn8-dev=8.2.0.53-1+cuda11.3
# clean up
sudo apt-get update
sudo apt autoremove
# reboot
sudo shutdown -r now
# check drivers
nvidia-smi