#!/bin/bash
### steps ####
# verify the system has a CUDA-capable GPU
# download and install the NVIDIA CUDA toolkit and cuDNN
# set up environment variables
# verify the installation
###
### to verify your GPU is CUDA-capable, check
lspci | grep -i nvidia
### If you have a previous installation, remove it first.
sudo apt-get purge 'nvidia*'
sudo apt remove 'nvidia-*'
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt-get autoremove && sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
# system update
sudo apt-get update
sudo apt-get upgrade
# install other important packages
sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first get the PPA repository driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# install NVIDIA driver with dependencies
sudo apt install libnvidia-common-515
sudo apt install libnvidia-gl-515
sudo apt install nvidia-driver-515
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
sudo apt-get update
# installing CUDA-11.7
sudo apt install cuda-11-7
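# set up your paths
echo 'export PATH=/usr/local/cuda-11.7/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# Ubuntu's default ~/.bashrc returns early in non-interactive shells, so sourcing it from a script
# may not change PATH here; export directly for the rest of this script, and open a new terminal afterwards
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
sudo ldconfig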
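# install cuDNN 8.5.0 for CUDA 11.7
# First register here: https://developer.nvidia.com/developer-program/signup
# The download requires a logged-in session: plain wget saves NVIDIA's login page (HTML) under the archive name,
# so if extraction fails, download the archive in a browser and place it in this directory instead.
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz"
wget https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
# the archive is XZ-compressed, so use -J (xz), not -z (gzip)
tar -xJvf "${CUDNN_TAR_FILE}"
# it extracts to a folder named after the archive, not to "cuda"
CUDNN_DIR="${CUDNN_TAR_FILE%.tar.xz}"
# copy all cuDNN headers and libraries into the cuda toolkit directory.
sudo cp -P "${CUDNN_DIR}"/include/cudnn*.h /usr/local/cuda-11.7/include
sudo cp -P "${CUDNN_DIR}"/lib/libcudnn* /usr/local/cuda-11.7/lib64/
sudo chmod a+r /usr/local/cuda-11.7/include/cudnn*.h /usr/local/cuda-11.7/lib64/libcudnn*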
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install PyTorch (an open source machine learning framework)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
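The two echo ... >> ~/.bashrc lines in the paths step append a fresh copy every time the script is run again, which is easy to do given how often the thread below ends up re-running it. A minimal guard, assuming bash and GNU grep; append_once is a hypothetical helper, not part of the original script:

```shell
#!/usr/bin/env bash
# Append LINE to FILE only if that exact line is not already present,
# so re-running the installer does not stack duplicate exports in ~/.bashrc.
append_once() {
  local line="$1"
  local file="$2"
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (single quotes keep $PATH literal, exactly like the original echo lines):
# append_once 'export PATH=/usr/local/cuda-11.7/bin:$PATH' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH' ~/.bashrc
```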
The tar command didn't work for me. I used 'file' to show the format: "cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz: XZ compressed data, checksum CRC64". The command "tar -xJvf ${CUDNN_TAR_FILE}" worked instead, since it uses XZ decompression.
Also, why aren't the copy commands written as "sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.7/include" to catch all the files? Some files were left behind, and when compiling with cuDNN I triggered errors that I had to debug with the last command.
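As found above, the cuDNN archive is XZ-compressed, so tar's -z (gzip) flag fails. A small sketch that picks the flag from the file suffix, assuming GNU tar (-J = xz, -z = gzip, -j = bzip2); tar_flags_for and extract_archive are hypothetical names:

```shell
#!/usr/bin/env bash
# Choose tar's extract flags from the archive suffix.
# -J = xz, -z = gzip, -j = bzip2; anything else falls back to plain -xvf,
# which lets GNU tar detect the compression itself.
tar_flags_for() {
  case "$1" in
    *.tar.xz|*.txz)   printf '%s\n' "-xJvf" ;;
    *.tar.gz|*.tgz)   printf '%s\n' "-xzvf" ;;
    *.tar.bz2|*.tbz2) printf '%s\n' "-xjvf" ;;
    *)                printf '%s\n' "-xvf"  ;;
  esac
}

# Extract ARCHIVE into the current directory with the matching flags.
extract_archive() {
  tar "$(tar_flags_for "$1")" "$1"
}

# Usage:
# extract_archive cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
```

A plain tar -xf (as suggested further down the thread) also works with GNU tar when reading from a file; the explicit flag mainly makes the intent visible in the script.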
(base) mona@ard-gpu-01:~$ wget https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
--2023-05-26 18:02:27-- https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
Resolving developer.nvidia.com (developer.nvidia.com)... 152.195.19.142
Connecting to developer.nvidia.com (developer.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://developer.nvidia.com/downloads/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz [following]
--2023-05-26 18:02:27-- https://developer.nvidia.com/downloads/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
Reusing existing connection to developer.nvidia.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://developer.nvidia.com/login [following]
--2023-05-26 18:02:27-- https://developer.nvidia.com/login
Reusing existing connection to developer.nvidia.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 34978 (34K) [text/html]
Saving to: ‘cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz’
cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz 100%[===============================================================================================================>] 34.16K --.-KB/s in 0.001s
2023-05-26 18:02:27 (33.7 MB/s) - ‘cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz’ saved [34978/34978]
(base) mona@ard-gpu-01:~$ tar -xzvf ${CUDNN_TAR_FILE}
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
(base) mona@ard-gpu-01:~$ tar -xvf ${CUDNN_TAR_FILE}
xz: (stdin): File format not recognized
tar: Child returned status 1
tar: Error is not recoverable: exiting now
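The log shows why neither tar command could work: wget followed the redirects to https://developer.nvidia.com/login and saved a 34 KB text/html page under the .tar.xz name, so there was no archive to extract. A quick check before extracting, assuming od from coreutils; is_xz_archive is a hypothetical helper. Every XZ file starts with the six magic bytes FD 37 7A 58 5A 00:

```shell
#!/usr/bin/env bash
# Succeed only if FILE starts with the XZ magic bytes (FD 37 7A 58 5A 00).
# An HTML login page saved under a .tar.xz name fails this check.
is_xz_archive() {
  local magic
  # -N6: read 6 bytes, -tx1: print them as hex, -An: no address column
  magic=$(od -An -tx1 -N6 "$1" 2>/dev/null | tr -d ' \n') || return 1
  [ "$magic" = "fd377a585a00" ]
}

# Usage, before the tar step (CUDNN_TAR_FILE as in the script):
# is_xz_archive "$CUDNN_TAR_FILE" || echo "Not an XZ archive; download it from a logged-in browser session." >&2
```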
I followed every step of these instructions, but nvidia-smi shows the following information:
NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
What could possibly be wrong?
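As I understand it, the "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, not the toolkit that was installed; the toolkit version is what nvcc -V reports. So 12.2 there does not by itself mean CUDA 11.7 is missing, although "Driver Version: 535.54.03" does show that driver 535 rather than 515 ended up installed. A sketch that pulls both numbers out of the text output and checks the toolkit is not newer than the driver allows; the function names are hypothetical and sort -V assumes GNU coreutils:

```shell
#!/usr/bin/env bash
# Extract "12.2" from the nvidia-smi header ("... CUDA Version: 12.2 ...").
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Extract "11.7" from nvcc -V output ("Cuda compilation tools, release 11.7, V11.7.99").
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Succeed if toolkit version $1 <= driver-supported version $2 (version-aware sort).
toolkit_supported() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on the GPU machine:
# drv=$(nvidia-smi | driver_cuda_version)
# tk=$(nvcc -V | toolkit_cuda_version)
# toolkit_supported "$tk" "$drv" && echo "toolkit $tk is within what the driver supports (up to $drv)"
```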
tar -xzvf ${CUDNN_TAR_FILE}
tar -xzvf cudnn-linux-x86_64-8.9.3.28_cuda11-archive.tar.xz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
This didn't work for me, so, following the other comments, I went to NVIDIA's website to download it.
I reached this part, but it is confusing: the package is installed, yet the file can't be found for some reason.
$ sudo apt install ./cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'cudnn-local-repo-ubuntu2204-8.9.3.28' instead of './cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb'
cudnn-local-repo-ubuntu2204-8.9.3.28 is already the newest version (1.0-1).
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
sudo cp -P cuda/include/cudnn.h /usr/local/cuda-11.7/include
cp: cannot stat 'cuda/include/cudnn.h': No such file or directory
Any idea how to continue?
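The cudnn-local-repo-*.deb package only sets up a local apt repository. As I understand NVIDIA's cuDNN 8.x installation guide, the libcudnn8 and libcudnn8-dev packages then have to be installed from that repository, and they land in system include and library directories rather than in a cuda/ folder in the current directory, which is why cp cannot find cuda/include/cudnn.h. A sketch to locate whatever cuDNN headers are present; find_cudnn_headers and its default search roots are assumptions:

```shell
#!/usr/bin/env bash
# List cuDNN headers under the given directories (default: common install roots).
# Directories that do not exist are skipped silently.
find_cudnn_headers() {
  local root
  if [ "$#" -eq 0 ]; then
    set -- /usr/include /usr/local/cuda/include /usr/local/cuda-11.7/include
  fi
  for root in "$@"; do
    if [ -d "$root" ]; then
      # depth 2 also covers multiarch folders such as /usr/include/x86_64-linux-gnu
      find "$root" -maxdepth 2 -name 'cudnn*.h'
    fi
  done | sort
}

# Usage:
# find_cudnn_headers
```

If it prints nothing, the cuDNN packages are most likely not installed yet.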
@fatbringer To link back to my previous comment, this Stack Overflow link helped in solving the cuDNN tar file problem:
https://stackoverflow.com/questions/39643013/gzip-stdin-not-in-gzip-format-tar-child-returned-status-1-tar-error-is-not-r
I used tar xf name/of/file instead.
Then, the extracted files were not in a folder called cuda; the folder name was literally the archive name, so you have to copy from the correct place.
In my case, it was /home/username/cudnn-linux-x86_64-8.9.3.28_cuda11-archive/include/cudnn.h
Change the directory accordingly, or just rename your extracted folder to "cuda" and follow the original instructions.
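Instead of renaming the folder, its name can be derived from the archive name by stripping the .tar.xz suffix, and the copy globs widened to cudnn*.h so no header is left behind (as asked earlier in the thread). A sketch assuming the layout described above (<archive-name>/include and <archive-name>/lib); install_cudnn_from is a hypothetical helper, and writing to /usr/local/cuda-11.7 needs root, so run it from a root shell:

```shell
#!/usr/bin/env bash
# Copy cuDNN headers and libraries from an extracted archive into a CUDA tree.
#   $1 = archive file name, e.g. cudnn-linux-x86_64-8.9.3.28_cuda11-archive.tar.xz
#   $2 = CUDA install dir, e.g. /usr/local/cuda-11.7
install_cudnn_from() {
  local src dest
  src="${1%.tar.xz}"   # the archive extracts into a folder with the same base name
  dest="$2"
  if [ ! -d "$src/include" ] || [ ! -d "$src/lib" ]; then
    echo "expected $src/include and $src/lib; was the archive extracted here?" >&2
    return 1
  fi
  mkdir -p "$dest/include" "$dest/lib64"
  cp -P "$src"/include/cudnn*.h "$dest/include/"   # every header, not only cudnn.h
  cp -P "$src"/lib/libcudnn* "$dest/lib64/"         # -P keeps the .so symlinks as symlinks
  chmod a+r "$dest"/include/cudnn*.h "$dest"/lib64/libcudnn*
}

# Usage (as root, from the directory holding the extracted folder):
# install_cudnn_from cudnn-linux-x86_64-8.9.3.28_cuda11-archive.tar.xz /usr/local/cuda-11.7
```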
Before https://gist.github.com/primus852/b6bac167509e6f352efb8a462dcf1854#file-cuda_11-7_installation_on_ubuntu_22-04-L32, run:
sudo apt full-upgrade
After many tries on Windows and Linux, I recommend using conda and then installing CUDA inside conda like this:
conda install -c conda-forge cudatoolkit=11.8 cudnn=8.9.2
Reinstalling CUDA after some update messed up the previous installation. These instructions worked for me when installing driver-535,
but now I get this bunch of errors:
sudo apt install nvidia-driver-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-driver-535 : Depends: libnvidia-compute-535 (= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.3 is to be installed
Depends: libnvidia-decode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
Depends: libnvidia-encode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
Recommends: nvidia-settings but it is not going to be installed
Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
Recommends: libnvidia-compute-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-decode-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-encode-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-fbc1-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-gl-535:i386 (= 535.104.12-0ubuntu1)
E: Unable to correct problems, you have held broken packages.
Purging also didn't fix it. What options do I have?
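The first Depends line carries the key detail: nvidia-driver-535 requires libnvidia-compute-535 at exactly 535.104.12-0ubuntu1, while apt wants to install the newer 535.113.01-0ubuntu0.22.04.3. The differing version styles suggest the two come from different repositories (the 22.04 suffix looks like Ubuntu's own archive); apt-cache policy libnvidia-compute-535 shows which repository offers which version. A sketch that pulls the exact-version requirements out of apt's error text; unmet_pins is a hypothetical name, and it expects apt's normal one-requirement-per-line output:

```shell
#!/usr/bin/env bash
# Read apt's "unmet dependencies" text on stdin and print "package version"
# for every exact-version (=) Depends: requirement. Recommends: lines are ignored.
unmet_pins() {
  sed -n 's/.*Depends: \([^ ]*\) (= \([^)]*\)).*/\1 \2/p'
}

# Usage: apt-get -s only simulates, so it runs without root and changes nothing.
# apt-get -s install nvidia-driver-535 2>&1 | unmet_pins
```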
You may want to look at this solution: https://github.com/ichxw/cuda_11.7_installation_on_Ubuntu_22.04
@ichxw Hey, thanks for sharing your answer. I still get stuck at the step sudo apt install nvidia-driver-515,
or in my case sudo apt install nvidia-driver-535,
since I'm going for that version.
sudo apt-get install nvidia-driver-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-driver-535 : Depends: libnvidia-compute-535 (= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.3 is to be installed
Depends: libnvidia-decode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
Depends: libnvidia-encode-535 (= 535.104.12-0ubuntu1) but it is not going to be installed
Recommends: nvidia-settings but it is not going to be installed
Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
Recommends: libnvidia-compute-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-decode-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-encode-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-fbc1-535:i386 (= 535.104.12-0ubuntu1)
Recommends: libnvidia-gl-535:i386 (= 535.104.12-0ubuntu1)
E: Unable to correct problems, you have held broken packages.:
~$ sudo apt install -f libnvidia-decode-535 libnvidia-encode-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
libnvidia-decode-535 : Depends: libnvidia-compute-535 (= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.3 is to be installed
E: Unable to correct problems, you have held broken packages.
~$ sudo apt install -f libnvidia-compute-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libnvidia-compute-535 is already the newest version (535.113.01-0ubuntu0.22.04.3).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
I tried installing the dependencies manually too, but it didn't work.
If I trace the error message, would it help to install libnvidia-compute-535
version 535.104.12-0ubuntu1?
But I am not sure how to specify in my terminal that I want only that specific version.
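apt accepts an exact version written as package=version. A sketch that builds that argument list so every package in the set is held to the one version nvidia-driver-535 asks for; pin_version is hypothetical, and whether 535.104.12-0ubuntu1 is still downloadable depends on which repositories are configured (apt-cache policy libnvidia-compute-535 lists the available versions):

```shell
#!/usr/bin/env bash
# Print "pkg=VERSION" for each package so apt installs exactly that version.
#   $1 = version, remaining args = package names
pin_version() {
  local version="$1" pkg
  shift
  for pkg in "$@"; do
    printf '%s=%s\n' "$pkg" "$version"
  done
}

# Usage (apt will ask to downgrade the already-installed 535.113.01 packages):
# sudo apt install $(pin_version 535.104.12-0ubuntu1 \
#   nvidia-driver-535 libnvidia-compute-535 libnvidia-decode-535 libnvidia-encode-535)
```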
Do this: sudo apt install libnvidia-compute-535
Thanks a lot for making the shell script. However, I am also running into a dependency issue. Any fix for this? sudo apt install libnvidia-compute-515 is already part of the script, so that doesn't help. I tried 535 as well.
The following packages have unmet dependencies:
cuda-11-7 : Depends: cuda-runtime-11-7 (>= 11.7.1) but it is not going to be installed
Depends: cuda-demo-suite-11-7 (>= 11.7.91) but it is not going to be installed
@primus852 @fatbringer @ichxw @qburst-fidha Did you guys get this figured out?
Although the above suggestions did not work for me, I found that the runfile installation from the NVIDIA website does work, whereas the apt-get installation did not. Maybe this helps someone else!
sudo apt-get purge nvidia*
sudo apt remove nvidia-*
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt-get autoremove && sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
Reboot
This setup did not work for me. Whenever I try to use the GPU the following exception is raised:
This is while nvidia-smi is absolutely empty. I'm running everything on a headless server.
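When nvidia-smi shows nothing, one thing worth ruling out on a headless server is that the nvidia kernel module never loaded (for example after a driver change without a reboot). A sketch that checks /proc/modules, which lists loaded modules one per line with the name in the first column (the same data lsmod prints); the function name and its optional file argument, used here for testing, are assumptions:

```shell
#!/usr/bin/env bash
# Succeed if a module named exactly "nvidia" is loaded.
# Reads /proc/modules by default; a different file can be passed for testing.
nvidia_module_loaded() {
  local modules_file="${1:-/proc/modules}"
  awk '$1 == "nvidia" { found = 1 } END { exit !found }' "$modules_file"
}

# Usage:
# nvidia_module_loaded || echo "nvidia module not loaded; check dmesg, or try: sudo modprobe nvidia" >&2
```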