State as of 2017-07-31.
You can also check a guide to upgrade CUDA on a [PC with GTX 980 Ti and Ubuntu 16.04](https://gist.github.com/bzamecnik/61b293a3891e166797491f38d579d060).
- NVIDIA driver 375.66
- latest is 384.59 (2017-07-28) - I haven't tried it yet
- CUDA Toolkit 8.0
- cuDNN 5.1
- latest is 6.0, but not supported by TensorFlow 1.2.1
We'll see how to install the individual components, and also how to install everything with just a single reboot. In total it takes around 3 GB of disk space.
- https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup
- https://askubuntu.com/questions/886445/how-do-i-properly-install-cuda-8-on-an-azure-vm-running-ubuntu-14-04-lts
Tested on Azure NC6 with 1x Tesla K80.
$ lspci | grep -i NVIDIA
a450:00:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d] (rev a1)
NOTE: Removing the nouveau driver is not necessary; installing cuda-drivers does that automatically:
Setting up nvidia-375 (375.66-0ubuntu1) ...
update-alternatives: using /usr/lib/nvidia-375/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-375/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_EGL.conf (x86_64-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-375/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_GL.conf (i386-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-375/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_EGL.conf (i386-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/share/nvidia-375/glamor.conf to provide /usr/share/X11/xorg.conf.d/glamoregl.conf (glamor_conf) in auto mode
update-initramfs: deferring update (trigger activated)
A modprobe blacklist file has been created at /etc/modprobe.d to prevent Nouveau from loading. This can be reverted by deleting /etc/modprobe.d/nvidia-graphics-drivers.conf.
A new initrd image has also been created. To revert, please replace /boot/initrd-4.4.0-87-generic with /boot/initrd-$(uname -r)-backup.
*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can ***
*** be loaded. ***
*****************************************************************************
We will install the NVIDIA Tesla driver via a deb package.
wget http://us.download.nvidia.com/tesla/375.66/nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb
sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-drivers
sudo reboot
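After the reboot we can do a quick sanity check that the NVIDIA kernel module actually loaded (just a sketch; nvidia-smi below gives the full picture):
cat /proc/driver/nvidia/version
lsmod | grep nvidia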
Next, install the CUDA Toolkit from https://developer.nvidia.com/cuda-downloads:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
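To verify the toolkit we can ask the CUDA compiler for its version (nvcc is not on the PATH by default, hence the full path):
/usr/local/cuda/bin/nvcc --version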
TensorFlow 1.2.1 needs cuDNN 5.1 (not 6.0).
It has to be downloaded with a registered NVIDIA account from https://developer.nvidia.com/rdp/cudnn-download.
Download the deb in a browser and copy it to the target machine, for example via SCP:
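A hypothetical copy command (the user name, host name and local path are placeholders - adjust them to your VM):
scp libcudnn5_5.1.10-1+cuda8.0_amd64.deb azureuser@my-azure-vm:~/
Then install it: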
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb
Add to ~/.profile:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
. ~/.profile
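A quick way to check that the CUDA runtime and cuDNN libraries can now be resolved is to load them via Python's ctypes (a sketch; the library names correspond to the CUDA 8.0 and libcudnn5 packages installed above):
python -c "import ctypes; ctypes.CDLL('libcudart.so.8.0'); ctypes.CDLL('libcudnn.so.5'); print('CUDA and cuDNN libraries found')"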
Note that cuda-drivers installs a lot of unnecessary X11 stuff (3.5 GB in total!). We can skip the dependency on lightdm to save some space if we don't use a GUI. To install everything with just one reboot:
sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb
sudo apt-get update
# this installs about 3.5 GB of dependencies
sudo apt-get install cuda-drivers cuda
# alternative: skip lightdm to save about 0.5 GB
sudo apt-get install cuda-drivers cuda lightdm-
sudo reboot
We should see the GPU information:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | A450:00:00.0 Off | 0 |
| N/A 40C P0 70W / 149W | 0MiB / 11439MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
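Besides nvidia-smi, we can also check the CUDA toolkit and cuDNN versions (a sketch; version.txt is where the deb-based toolkit install records its version):
cat /usr/local/cuda/version.txt
dpkg -l | grep libcudnn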
Let's run a simple "hello world" MNIST MLP in Keras/TensorFlow:
pip install tensorflow-gpu==1.2.1 keras==2.0.6
wget https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_mlp.py
python mnist_mlp.py
We should see that it uses the GPU and trains properly:
Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: a450:00:00.0)
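If you just want to confirm that TensorFlow sees the GPU without training anything, a small check like this works too (assuming the tensorflow-gpu 1.2.1 install from above):
python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"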
That's it. Happy training!