Instructions for CUDA v11.7 and cuDNN 8.5 installation on Ubuntu 22.04 for PyTorch 1.12.1
#!/bin/bash
### steps ####
# verify the system has a cuda-capable gpu
# download and install the nvidia cuda toolkit and cudnn
# setup environmental variables
# verify the installation
###
### to verify that your gpu is cuda-enabled, check
lspci | grep -i nvidia
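### If you have a previous installation, remove it first.
# quote the patterns so the shell does not expand them against files in the current directory
sudo apt-get purge 'nvidia*'
sudo apt remove 'nvidia-*'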
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt-get autoremove && sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
# system update
sudo apt-get update
sudo apt-get upgrade
# install other important packages
sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first add the graphics-drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# install nvidia driver with dependencies
sudo apt install libnvidia-common-515
sudo apt install libnvidia-gl-515
sudo apt install nvidia-driver-515
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
sudo apt-get update
# installing CUDA-11.7
sudo apt install cuda-11-7
# setup your paths
echo 'export PATH=/usr/local/cuda-11.7/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
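# install cuDNN v8.5 for CUDA 11.7
# First register here: https://developer.nvidia.com/developer-program/signup
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz"
# NOTE: without a logged-in NVIDIA session this URL redirects to the login page, so wget may
# save an HTML page instead of the archive. If so, download the file in a browser after logging in.
wget https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
# the archive is xz-compressed, so use -J (-z expects gzip and fails)
tar -xJvf "${CUDNN_TAR_FILE}"
# the archive extracts into a folder named after itself, not "cuda"
CUDNN_DIR="${CUDNN_TAR_FILE%.tar.xz}"
# copy the following files into the cuda toolkit directory.
sudo cp -P "${CUDNN_DIR}"/include/cudnn*.h /usr/local/cuda-11.7/include
sudo cp -P "${CUDNN_DIR}"/lib/libcudnn* /usr/local/cuda-11.7/lib64/
sudo chmod a+r /usr/local/cuda-11.7/include/cudnn*.h /usr/local/cuda-11.7/lib64/libcudnn*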
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install Pytorch (an open source machine learning framework)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
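
The "setup your paths" step appends to ~/.bashrc with >>, so running the script a second time adds the same export lines again. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper name and is not part of the original script.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line is already there.
# (hypothetical helper; not part of the original gist)
append_once() {
    line=$1
    file=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage, commented out so that running this snippet does not touch ~/.bashrc:
# append_once 'export PATH=/usr/local/cuda-11.7/bin:$PATH' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH' ~/.bashrc
```

The single quotes keep $PATH unexpanded, so the line written to the file is the same literal text the original echo commands produce.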

aryan-f commented May 19, 2023

This setup did not work for me. Whenever I try to use the GPU the following exception is raised:

RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable

This is while nvidia-smi is absolutely empty. I'm running everything on a headless server.

jaredquekjz commented May 21, 2023

The tar command didn't work for me. I used 'file' to show format: "cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz: XZ compressed data, checksum CRC64". The command "tar -xJvf ${CUDNN_TAR_FILE}" worked instead as it used XZ decompression.
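
The check described above can be sketched in shell without relying on the file utility, by reading the archive's first bytes. The function names archive_kind and extract_archive are hypothetical; the magic numbers are the standard xz and gzip stream headers.

```shell
# archive_kind FILE: print "xz", "gzip", or "unknown" based on the file's magic bytes.
# (hypothetical helper; the gist itself only calls tar)
archive_kind() {
    # First six bytes as lowercase hex with whitespace removed
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case "$magic" in
        fd377a585a00) echo xz ;;      # xz stream header
        1f8b*)        echo gzip ;;    # gzip header
        *)            echo unknown ;;
    esac
}

# extract_archive FILE: choose the tar decompression flag from the detected format.
extract_archive() {
    case "$(archive_kind "$1")" in
        xz)   tar -xJvf "$1" ;;
        gzip) tar -xzvf "$1" ;;
        *)    echo "unrecognized archive format: $1" >&2; return 1 ;;
    esac
}

# Example usage (commented out: needs the downloaded archive):
# extract_archive "${CUDNN_TAR_FILE}"
```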

Also, why isn't the copy command "sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.7/include", so that it catches all the header files? Some files were left behind for me, and compiling with cuDNN triggered errors that I had to fix with that command.

@monajalal

(base) mona@ard-gpu-01:~$ wget https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
--2023-05-26 18:02:27-- https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
Resolving developer.nvidia.com (developer.nvidia.com)... 152.195.19.142
Connecting to developer.nvidia.com (developer.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://developer.nvidia.com/downloads/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz [following]
--2023-05-26 18:02:27-- https://developer.nvidia.com/downloads/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
Reusing existing connection to developer.nvidia.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://developer.nvidia.com/login [following]
--2023-05-26 18:02:27-- https://developer.nvidia.com/login
Reusing existing connection to developer.nvidia.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 34978 (34K) [text/html]
Saving to: ‘cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz’

cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz 100%[===============================================================================================================>] 34.16K --.-KB/s in 0.001s

2023-05-26 18:02:27 (33.7 MB/s) - ‘cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz’ saved [34978/34978]

(base) mona@ard-gpu-01:~$ tar -xzvf ${CUDNN_TAR_FILE}

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
(base) mona@ard-gpu-01:~$ tar -xvf ${CUDNN_TAR_FILE}
xz: (stdin): File format not recognized
tar: Child returned status 1
tar: Error is not recoverable: exiting now
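
In the log above, wget is redirected to https://developer.nvidia.com/login and saves 34978 bytes of text/html under the archive's name, which is why both tar invocations fail. A small sketch of a check that could run before extraction is shown below; looks_like_html is a hypothetical helper and the 512-byte window is an arbitrary choice.

```shell
# looks_like_html FILE: succeed if the start of FILE looks like an HTML page.
# (hypothetical helper; only inspects the first 512 bytes)
looks_like_html() {
    head -c 512 "$1" | LC_ALL=C grep -Eqi '<!doctype html|<html'
}

# Example usage (commented out: needs the downloaded file):
# if looks_like_html "${CUDNN_TAR_FILE}"; then
#     echo "got a web page, not the archive: log in and download it in a browser" >&2
# fi
```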

@monajalal

download after logging in


ichxw commented Jul 8, 2023

I followed every step of these instructions, but nvidia-smi shows the following information:
NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2

What could possibly be wrong?
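
For context: the "CUDA Version" field in the nvidia-smi header is the newest CUDA version the installed driver supports, not the version of the installed toolkit. The toolkit version is the one nvcc -V reports. The sketch below extracts both numbers so they can be compared side by side; the function names are hypothetical, and the nvcc pattern assumes the usual "release X.Y," wording of its version line.

```shell
# smi_cuda_version: read nvidia-smi output on stdin, print the "CUDA Version" value.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# toolkit_cuda_version: read `nvcc -V` output on stdin, print the release number.
# Assumes a line of the form "Cuda compilation tools, release 11.7, V11.7.99".
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only query the tools when both are installed
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    echo "driver supports up to CUDA $(nvidia-smi | smi_cuda_version)"
    echo "installed toolkit is CUDA $(nvcc -V | toolkit_cuda_version)"
fi
```

If the second line prints 11.7, the toolkit from this script is in place even though nvidia-smi shows 12.2.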

fatbringer commented Jul 14, 2023

tar -xzvf ${CUDNN_TAR_FILE}
 tar -xzvf cudnn-linux-x86_64-8.9.3.28_cuda11-archive.tar.xz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

That didn't work for me, so, in line with the other comments, I went to NVIDIA's website to download it.

I reached this part, but it is confusing: the package is installed, yet the file can't be found.

$ sudo apt install ./cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'cudnn-local-repo-ubuntu2204-8.9.3.28' instead of './cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb'
cudnn-local-repo-ubuntu2204-8.9.3.28 is already the newest version (1.0-1).
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.


sudo cp -P cuda/include/cudnn.h /usr/local/cuda-11.7/include
cp: cannot stat 'cuda/include/cudnn.h': No such file or directory

any idea how to continue?

fatbringer commented Jul 17, 2023

@fatbringer to link back to my previous comment, this stackoverflow link helped in solving the cudnn tar file problem
https://stackoverflow.com/questions/39643013/gzip-stdin-not-in-gzip-format-tar-child-returned-status-1-tar-error-is-not-r

I used tar xf name/of/file instead.
Then, because the extracted folder was not called cuda (its name was literally the archive name), you have to copy from the correct place.
In my case, it was "/home/username/cudnn-linux-x86_64-8.9.3.28_cuda11-archive/include/cudnn.h".
Change the directory accordingly, or just rename your extracted folder to "cuda" and follow the original instructions.
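
The folder name can also be derived from the archive name instead of being typed by hand. This is a sketch under the assumption reported in this comment, namely that the archive extracts into a folder named after itself without the .tar.xz suffix; cudnn_dir_from_archive is a hypothetical helper.

```shell
# cudnn_dir_from_archive ARCHIVE: print the folder the archive is expected to extract to,
# assuming it equals the archive's base name without ".tar.xz".
cudnn_dir_from_archive() {
    name=$(basename "$1")
    printf '%s\n' "${name%.tar.xz}"
}

# Example usage (commented out: needs the extracted archive and sudo):
# dir=$(cudnn_dir_from_archive cudnn-linux-x86_64-8.9.3.28_cuda11-archive.tar.xz)
# sudo cp -P "$dir"/include/cudnn*.h /usr/local/cuda-11.7/include
# sudo cp -P "$dir"/lib/libcudnn* /usr/local/cuda-11.7/lib64/
```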

navono commented Aug 29, 2023

After many tries on Windows and Linux, I recommend using conda and then installing cuda in conda like this:

conda install -c conda-forge cudatoolkit=11.8 cudnn=8.9.2

@fatbringer

Reinstalling CUDA after some update messed up the previous installation. These instructions worked for me when installing driver-535, but now I get this bunch of errors:

sudo apt install nvidia-driver-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-driver-535 : Depends: libnvidia-compute-535 (= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.3 is to be installed
                     Depends: libnvidia-decode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
                     Depends: libnvidia-encode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
                     Recommends: nvidia-settings but it is not going to be installed
                     Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
                     Recommends: libnvidia-compute-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-decode-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-encode-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-fbc1-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-gl-535:i386 (= 535.104.12-0ubuntu1)
E: Unable to correct problems, you have held broken packages.

Purging also didn't fix it. What options do I have?
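
The apt output above already names the conflict: nvidia-driver-535 requires libnvidia-compute-535 at exactly 535.104.12-0ubuntu1, while apt's candidate for that package is 535.113.01-0ubuntu0.22.04.3. The sketch below pulls such mismatches out of a saved apt log; version_conflicts is a hypothetical helper, and the pattern only covers the "Depends: NAME (= VERSION) but VERSION is to be installed" form seen here.

```shell
# version_conflicts: read apt error output on stdin and print one line per exact-version
# conflict in the form "package wanted-version candidate-version".
# (hypothetical helper; ignores "but it is not going to be installed" lines)
version_conflicts() {
    sed -n 's/.*Depends: \([^ ]*\) (= \([^)]*\)) but \([^ ]*\) is to be installed.*/\1 \2 \3/p'
}

# Example usage (commented out: needs apt and sudo):
# sudo apt install nvidia-driver-535 2>&1 | version_conflicts
```

Running apt-cache policy on each package it prints shows which configured repository provides which version.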

ichxw commented Oct 10, 2023

You may want to look at this solution: https://github.com/ichxw/cuda_11.7_installation_on_Ubuntu_22.04

fatbringer commented Oct 11, 2023

@ichxw hey, thanks for sharing your answer. I still get stuck at the step sudo apt install nvidia-driver-515, or in my case sudo apt install nvidia-driver-535, since I'm going for that version.

 sudo apt-get install nvidia-driver-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-driver-535 : Depends: libnvidia-compute-535 (= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.3 is to be installed
                     Depends: libnvidia-decode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
                     Depends: libnvidia-encode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
                     Recommends: nvidia-settings but it is not going to be installed
                     Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
                     Recommends: libnvidia-compute-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-decode-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-encode-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-fbc1-535:i386 (= 535.104.12-0ubuntu1)
                     Recommends: libnvidia-gl-535:i386 (= 535.104.12-0ubuntu1)
E: Unable to correct problems, you have held broken packages.
~$ sudo apt install -f libnvidia-decode-535 libnvidia-encode-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libnvidia-decode-535 : Depends: libnvidia-compute-535 (= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.3 is to be installed
E: Unable to correct problems, you have held broken packages.
~$ sudo apt install -f libnvidia-compute-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnvidia-compute-535 is already the newest version (535.113.01-0ubuntu0.22.04.3).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

I tried installing the dependencies manually too, but it didn't work.
Tracing the error message, would it help if I tried to install libnvidia-compute-535 version 535.104.12-0ubuntu1?
I am not sure how to specify in my terminal that I want that specific version only.
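
On the version question: apt accepts package=version to request an exact version, and apt-cache policy PACKAGE lists the versions apt can currently see. Whether 535.104.12-0ubuntu1 is still offered by the configured repositories is not known from this thread, so the following is only a sketch; with_version is a hypothetical helper that builds the package=version arguments.

```shell
# with_version VERSION PKG...: print "PKG=VERSION" for each package, space-separated.
# (hypothetical helper for building an apt install command line)
with_version() {
    version=$1
    shift
    out=""
    for pkg in "$@"; do
        out="${out:+$out }${pkg}=${version}"
    done
    printf '%s\n' "$out"
}

# Example usage (commented out: needs apt and sudo; check availability first):
# apt-cache policy libnvidia-compute-535
# sudo apt install $(with_version 535.104.12-0ubuntu1 \
#     nvidia-driver-535 libnvidia-compute-535 libnvidia-decode-535 libnvidia-encode-535)
```

Requesting the driver and its libraries at the same version in one command gives apt a consistent set to resolve, rather than mixing 535.104.12 and 535.113.01 packages.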

@qburst-fidha

Do this: sudo apt install libnvidia-compute-535

@bdzyubak-aiq

Thanks a lot for making the shell script. I am also running into a dependency issue, however. Any fix for this? sudo apt install libnvidia-compute-515 is already part of the script, so that doesn't help. Tried 535 as well
The following packages have unmet dependencies:
cuda-11-7 : Depends: cuda-runtime-11-7 (>= 11.7.1) but it is not going to be installed
Depends: cuda-demo-suite-11-7 (>= 11.7.91) but it is not going to be installed

@bdzyubak-aiq

@primus852 @fatbringer @ichxw @qburst-fidha Did you guys get this figured out?

bdzyubak-aiq commented Jan 8, 2024

Although the above suggestions did not work for me, I found that the runfile installation from the NVIDIA website, rather than the apt-get installation, does work. Maybe this helps someone else!

sudo apt-get purge nvidia*
sudo apt remove nvidia-*
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt-get autoremove && sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*

Reboot
