This guide contains steps for installing CUDA. It also shows how to install TensorFlow and PyTorch and use GPUs with them. Our goal is to get output like the one in the image below,
where nvidia-smi, nvcc --version, torch.cuda.is_available(), and tf.test.gpu_device_name() all work. Let's go then. I will specifically install CUDA 10.1, with which both TensorFlow and PyTorch work fine.
1.1.a Verify You Have a CUDA-Capable GPU
*lspci | grep -i nvidia*
It lists the NVIDIA GPUs on your system. Check yours against the list at https://developer.nvidia.com/cuda-gpus. If it is listed, your GPU is CUDA-capable.
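If you prefer to script this check, here is a minimal Python sketch of the same idea. It only filters the lspci output the way grep -i nvidia does. The function names find_nvidia_devices and list_nvidia_gpus are my own and are not part of any tool, and you still have to look the GPU up on the NVIDIA page yourself.

```python
import shutil
import subprocess
from typing import List


def find_nvidia_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention NVIDIA (case-insensitive, like grep -i)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def list_nvidia_gpus() -> List[str]:
    """Run lspci, if it is installed, and keep only the NVIDIA lines."""
    if shutil.which("lspci") is None:
        return []  # lspci is not available on this machine
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
    return find_nvidia_devices(result.stdout)


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        print("\n".join(gpus))
    else:
        print("No NVIDIA device found (or lspci is not installed).")
```

An empty result can also mean that lspci itself is missing, so treat it as "unknown" rather than "no GPU".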
1.1.b There are many versions of Linux, and CUDA is supported on only a few distributions. To determine which distribution and release number you're running, type the following at the command line:
uname -m && cat /etc/*release
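The same information can be read from Python. This is a small sketch that assumes your system has /etc/os-release, which is one of the files the /etc/*release glob matches on Ubuntu. The helper name parse_os_release is hypothetical.

```python
import platform
from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines (values may be double-quoted) into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    # platform.machine() reports the architecture, like `uname -m`
    print("Architecture:", platform.machine())
    os_release = Path("/etc/os-release")  # assumption: this file exists
    if os_release.exists():
        info = parse_os_release(os_release.read_text())
        print("Distribution:", info.get("NAME"), info.get("VERSION_ID"))
    else:
        print("/etc/os-release not found")
```

Compare the printed distribution and release with the supported list in the CUDA documentation for your version.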
1.1.c Verify that the system has gcc installed. Most of the time it comes preinstalled with the operating system; check with gcc --version. If it's not installed, use:
sudo apt-get update
sudo apt install build-essential
sudo apt-get install manpages-dev
1.1.d Install the correct kernel headers
sudo apt-get install linux-headers-$(uname -r)
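To double-check 1.1.c and 1.1.d in one go, a rough Python sketch like the one below can help. It assumes the Ubuntu convention that the headers package installs under /usr/src/linux-headers-<release>; other distributions may use a different location. The helper name headers_dir is made up for this example.

```python
import platform
import shutil
from pathlib import Path


def headers_dir(kernel_release: str) -> Path:
    """Expected headers directory for a kernel release.

    Assumption: Ubuntu-style layout under /usr/src.
    """
    return Path("/usr/src") / ("linux-headers-" + kernel_release)


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    print("gcc:", gcc if gcc else "not found")

    release = platform.release()  # same value as `uname -r`
    path = headers_dir(release)
    status = "present" if path.is_dir() else "missing"
    print("kernel headers:", path, "(" + status + ")")
```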
You can find the documentation for any CUDA version at https://developer.nvidia.com/cuda-toolkit-archive (I am specifically doing this for 10.1).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.105-1_amd64.deb
1. sudo dpkg -i cuda-repo-ubuntu1804_10.1.105-1_amd64.deb
2. sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
3. sudo apt-get update
The next line is tricky and confuses most people: it installs the most recent CUDA version, so it shouldn't be used if you don't want the latest version of CUDA.
sudo apt-get install cuda (don't use this one; use the one below)
4. sudo apt-get install cuda-10-1
If you run nvidia-smi or nvcc --version now, they will probably not work, because the CUDA paths have not yet been added to .bashrc. Add the following lines to ~/.bashrc now:
export PATH="/usr/local/cuda-10.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH"
After that, run:
source ~/.bashrc
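To confirm that the new shell actually picked up the two exports, you can inspect the environment from Python. This is only a sketch: the directory names are the ones used above for CUDA 10.1, and has_entry is a hypothetical helper.

```python
import os

CUDA_BIN = "/usr/local/cuda-10.1/bin"
CUDA_LIB = "/usr/local/cuda-10.1/lib64"


def has_entry(path_value: str, directory: str) -> bool:
    """True if `directory` is one of the colon-separated entries in `path_value`."""
    entries = [entry.rstrip("/") for entry in path_value.split(":") if entry]
    return directory.rstrip("/") in entries


if __name__ == "__main__":
    path_ok = has_entry(os.environ.get("PATH", ""), CUDA_BIN)
    lib_ok = has_entry(os.environ.get("LD_LIBRARY_PATH", ""), CUDA_LIB)
    print("PATH contains CUDA bin:", path_ok)
    print("LD_LIBRARY_PATH contains CUDA lib64:", lib_ok)
```

Run it from the same terminal in which you sourced .bashrc, since the exports only apply to shells that have read the file.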
If nvidia-smi or nvcc --version still don't work, try rebooting the system. They should work after that.
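Once nvcc runs, it is worth confirming that it really reports 10.1 and not a newer release. Here is a small sketch that pulls the release number out of the nvcc --version output. It assumes the output contains the text "release X.Y", and the function names are my own.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_output: str) -> Optional[str]:
    """Extract the 'X.Y' that follows the word 'release', or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+)", version_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run `nvcc --version` if nvcc is on PATH and parse its output."""
    if shutil.which("nvcc") is None:
        return None  # nvcc is not on PATH
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found, or its output could not be parsed")
    else:
        print("CUDA release reported by nvcc:", release)
```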
We need to install torch and tensorflow now. We can use pip or a conda environment. If you don't want to install cuDNN manually, it's better to use Anaconda. It automatically installs cuDNN and saves a lot of hassle.
Install Anaconda now. I specifically followed this link: https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart
Now create an environment using conda and activate it:
conda create --name my_env python=3
conda activate my_env
Inside this environment, now run:
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
conda install tensorflow-gpu
This installs PyTorch and TensorFlow. To check that they are working, open python in the terminal and import tensorflow (as tf) and torch. Then run:
torch.cuda.is_available()
If it returns True, torch is able to use the GPU. Check TensorFlow next. If the command below shows a GPU device name, then TensorFlow is working with the GPU as well.
tf.test.gpu_device_name()
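Both checks can be put into one small script so they are easy to rerun. This is a sketch, not the only way to do it: it uses the two calls above plus torch.cuda.get_device_name, and the function names check_torch and check_tensorflow are hypothetical. Each import is guarded so the script still runs if one of the libraries is missing.

```python
def check_torch() -> str:
    """Report whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch: not installed"
    if torch.cuda.is_available():
        return "torch: GPU available (" + torch.cuda.get_device_name(0) + ")"
    return "torch: installed, but CUDA is not available"


def check_tensorflow() -> str:
    """Report whether TensorFlow can see a GPU device."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow: not installed"
    name = tf.test.gpu_device_name()  # empty string when no GPU is found
    if name:
        return "tensorflow: GPU available (" + name + ")"
    return "tensorflow: installed, but no GPU device found"


if __name__ == "__main__":
    print(check_torch())
    print(check_tensorflow())
```

Run it inside the activated conda environment, otherwise both libraries will be reported as not installed.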