#!/bin/bash
### steps ####
# verify the system has a CUDA-capable GPU
# download and install the NVIDIA CUDA toolkit and cuDNN
# set up environment variables
# verify the installation
###
### to verify that your GPU is CUDA-capable, check
lspci | grep -i nvidia
### If you have a previous installation, remove it first.
# (the globs are quoted so the shell passes them to apt unexpanded)
sudo apt purge 'nvidia*' -y
sudo apt remove 'nvidia-*' -y
sudo rm -f /etc/apt/sources.list.d/cuda*
sudo apt autoremove -y && sudo apt autoclean -y
sudo rm -rf /usr/local/cuda*
# system update
sudo apt update && sudo apt upgrade -y
# install other important packages
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
# first add the graphics-drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# find the recommended driver versions for your GPU
ubuntu-drivers devices
# install the NVIDIA driver with its dependencies
sudo apt install libnvidia-common-515 libnvidia-gl-515 nvidia-driver-515 -y
# reboot (the script stops here; run the remaining steps after the reboot)
sudo reboot now
# verify that the following command works
nvidia-smi
# add the NVIDIA CUDA repository for Ubuntu 22.04
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
# note: apt-key is deprecated on Ubuntu 22.04 and prints a warning, but still works
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
# update and upgrade
sudo apt update && sudo apt upgrade -y
# install CUDA 11.8
sudo apt install cuda-11-8 -y
# set up your paths
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
# install cuDNN v8.7.0 (the build for CUDA 11.8)
# First register here: https://developer.nvidia.com/developer-program/signup
CUDNN_TAR_FILE="cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz"
sudo wget "https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/${CUDNN_TAR_FILE}"
sudo tar -xvf "${CUDNN_TAR_FILE}"
sudo mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive cuda
# copy the cuDNN headers and libraries into the CUDA toolkit directory
# (cuDNN 8 splits its API across several cudnn*.h headers, so copy all of them)
sudo cp -P cuda/include/cudnn*.h /usr/local/cuda-11.8/include
sudo cp -P cuda/lib/libcudnn* /usr/local/cuda-11.8/lib64/
sudo chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*
# Finally, to verify the installation, check
nvidia-smi
nvcc -V
# install PyTorch (an open source machine learning framework)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
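One more check that may be useful after the cuDNN copy step is reading the installed cuDNN version back from its header. The sketch below assumes the cuDNN headers, including cudnn_version.h, ended up in /usr/local/cuda-11.8/include; the function name cudnn_version_from_header is made up for this example.

```shell
#!/bin/bash
# Print "MAJOR.MINOR.PATCH" from the text of cudnn_version.h read on stdin.
cudnn_version_from_header() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") print major "." minor "." patch }
  '
}

# Assumed header location; only read it if it is actually there.
header=/usr/local/cuda-11.8/include/cudnn_version.h
if [ -r "$header" ]; then
  cudnn_version_from_header < "$header"
else
  echo "cudnn_version.h not found at $header" >&2
fi
```

If the header is missing, the copy step probably only copied cudnn.h and the other cudnn*.h files still need to be copied.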
@filmo I got it working with the following step:
# Need to sudo apt-get upgrade or the next step won't work
sudo apt upgrade
# installing CUDA-11.8
sudo apt install cuda-11-8
I think this is the proper way to install it.
If this step fails:
sudo cp -P cuda/lib/libcudnn* /usr/local/cuda-11.8/lib64/
cp: target '/usr/local/cuda-11.8/lib64/' is not a directory
Fix: sudo mkdir -p /usr/local/cuda-11.8/lib64
And if nvidia-smi then reports:
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.86
do a sudo reboot. This fixes nvidia-smi, but nvcc -V is still broken.
On trying sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc it says:
The following packages have unmet dependencies:
libcuinj64-11.5 : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
libnvidia-compute-495-server (>= 495) but it is not installable or
libcuda.so.1 (>= 495) or
libcuda-11.5-1
libnvidia-ml-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
libnvidia-compute-495-server (>= 495) but it is not installable or
libnvidia-ml.so.1 (>= 495)
nvidia-cuda-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
libnvidia-compute-495-server (>= 495) but it is not installable or
libcuda.so.1 (>= 495) or
libcuda-11.5-1
Recommends: libnvcuvid1 but it is not installable
Follow this link: https://stackoverflow.com/questions/66380789/nvidia-driver-installation-unmet-dependencies
(Unchecking the CUDA repo in Software & Updates did the trick.)
Then try sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc again.
This should fix nvcc -V.
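The "Driver/library version mismatch" message usually means the kernel module that is still loaded is older than the driver files just installed, which is why a reboot clears it. A minimal sketch of that check follows, assuming the loaded module exposes its version at /sys/module/nvidia/version and that modinfo reports the version installed on disk; driver_versions_match is a hypothetical helper name.

```shell
#!/bin/bash
# Succeeds only when both versions are non-empty and identical.
# $1: version of the kernel module currently loaded
# $2: version of the kernel module installed on disk
driver_versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

if [ -r /sys/module/nvidia/version ] && command -v modinfo >/dev/null 2>&1; then
  loaded=$(cat /sys/module/nvidia/version)
  on_disk=$(modinfo -F version nvidia 2>/dev/null)
  if driver_versions_match "$loaded" "$on_disk"; then
    echo "loaded and installed driver agree: $loaded"
  else
    echo "loaded=$loaded installed=$on_disk: a reboot should reload the new module"
  fi
else
  echo "NVIDIA kernel module is not loaded (or modinfo is unavailable)"
fi
```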
Thank you for this thread, I would have never figured this out.
Fantastic! Thanks so much! Had to do it a couple of times. Ended up just replacing 515 with 535 for the NVIDIA drivers and it worked!
Avoided some of the other elaborate schemes, including NVIDIA's own very confusing and lengthy guide.
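Perhaps this regex would work better, since it also catches libnvidia, kernel modules, etc. (note the single quotes; backticks would make the shell try to run the pattern as a command):
sudo apt-get purge '.*nvidia.*'
sudo apt remove '.*nvidia.*'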
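Before purging with a broad pattern, it can help to see which installed packages the pattern would actually hit. The sketch below filters `dpkg -l` output for installed packages (status "ii") whose name contains "nvidia"; installed_nvidia_packages is a name invented for this example.

```shell
#!/bin/bash
# Read `dpkg -l` output on stdin and print the names of installed
# packages (status "ii") whose name contains "nvidia".
installed_nvidia_packages() {
  awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Only query dpkg where it exists (Debian/Ubuntu systems).
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | installed_nvidia_packages
fi
```

The printed names can then be reviewed, and passed to apt purge explicitly if the list looks right.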
I just used the 535 NVIDIA drivers mentioned by @toebee82. Running nvidia-smi after the installation showed "Failed to initialize NVML: Driver/library version mismatch".
After I rebooted the machine everything worked, but with driver version 520 (not 535). I guess that is to align with the CUDA 11.8 runtime toolkit.
By the way, about the different CUDA versions shown by nvidia-smi and nvcc, there's an answer: https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi
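To see the two numbers side by side, one option is to pull the version out of each tool's output. This is only a sketch: the helper names are hypothetical, and the sample strings are illustrative lines in the format the two tools print (the nvidia-smi one matches output quoted elsewhere in this thread).

```shell
#!/bin/bash
# Extract "11.8" from `nvcc -V` output read on stdin.
nvcc_cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract "12.2" from the `nvidia-smi` header read on stdin.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Sample lines for illustration; on a real machine pipe `nvcc -V`
# and `nvidia-smi` into the functions instead.
echo "Cuda compilation tools, release 11.8, V11.8.89" | nvcc_cuda_release
echo "Driver Version: 535.54.03 CUDA Version: 12.2" | smi_cuda_version
```

The first value is the toolkit that nvcc belongs to; the second is the highest CUDA version the installed driver supports, so the two do not have to be equal.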
One should never do this
sudo rm -rf /usr/local/cuda*
Apt gets confused about what it expects to be there and what is actually there. If something needs to be removed, use apt purge, similar to pip uninstall.
Does not work with the 545 drivers. I just used the 515 drivers in the command (which show up as 525 in nvidia-smi?) and it seems to be working now. Thanks for the thread. I've been through every tutorial and this is the only one that's been successful.
@wbreslin951, curious, do your nvidia-smi and nvcc --version show the same cuda version being used? If so, which version is it?
SOLVED: https://forums.developer.nvidia.com/t/ubuntu-cuda-11-8-package-wrong-dependency-on-cuda-drivers/238891
When running sudo apt install cuda -y you can specify the current NVIDIA driver version, which prevents the installer from upgrading it:
sudo apt install cuda-11-8 cuda-drivers=535.129.03-1
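A sketch of building that pinned package argument from the driver that is currently running. It assumes the package revision suffix is "-1", as in the command above, which may not hold for every driver release; `apt-cache madison cuda-drivers` lists the versions the repository actually offers. cuda_drivers_pin is a hypothetical helper, and the script only prints the install command instead of running it.

```shell
#!/bin/bash
# Build the apt version pin for a given driver version.
# Assumption: the package revision is "-1", as in the command above.
cuda_drivers_pin() {
  printf 'cuda-drivers=%s-1\n' "$1"
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Ask the running driver for its version (first GPU only).
  current=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
  if [ -n "$current" ]; then
    echo "sudo apt install cuda-11-8 $(cuda_drivers_pin "$current")"
  else
    echo "nvidia-smi did not report a driver version" >&2
  fi
else
  echo "nvidia-smi not found; install the driver first" >&2
fi
```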
I need to run the 535 drivers, but sudo apt install cuda-11-8 -y automatically switches over to 545, which then causes:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 545.23
But when I go into "Software and Updates" and try to switch back, it complains about unmet dependencies, and all files in /usr/local/cuda-11.8/ except for ./targets/ are automatically deleted at this stage!?
Re: the suggestion above to run sudo apt install nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc (after unchecking the CUDA repo in Software & Updates) to fix nvcc -V:
I am not sure this is a good solution: you have just installed the CUDA toolkit, and doing this risks dependency problems. The problem could just be, as it was for me, that you didn't successfully add /usr/local/cuda-11.8/bin to $PATH. First take a look in /usr/local/cuda-11.8/bin. If nvcc is in there, just try to add it again, i.e. run
export PATH=/usr/local/cuda-11.8/bin:$PATH
and check with echo $PATH that it's in there. If this works, simply add the export line at the bottom of your ~/.bashrc to make it permanent.
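A small sketch of that PATH fix written so it can be run repeatedly without stacking duplicate entries. path_prepend is a name made up here, and the directory is the one used throughout this thread.

```shell
#!/bin/bash
# Prepend a directory to PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;             # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend /usr/local/cuda-11.8/bin
export PATH

# Confirm that the shell can now find nvcc.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found at $(command -v nvcc)"
else
  echo "nvcc still not on PATH; check that /usr/local/cuda-11.8/bin exists"
fi
```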
Thanks for such great tutorial, made my own referencing yours
https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22
Thank you! After lots of days, it works! Instead of 515, I put 525.
I get this error even after trying the fixes given by @filmo and @mkabatek.
inp@inp-Z790-GAMING-X:~$ sudo apt install cuda-11-8
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda-11-8 : Depends: cuda-runtime-11-8 (>= 11.8.0) but it is not going to be installed
Depends: cuda-demo-suite-11-8 (>= 11.8.86) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
I have even tried installing 515, 525 and 535.
I am running Ubuntu 20.04.
Can anyone please help? @MihailCosmin @filmo @mkabatek
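One thing worth checking here: the gist hard-codes ubuntu2204 in the CUDA repository URLs, while this report is from Ubuntu 20.04. Whether that is the cause of the unmet dependencies is not confirmed in this thread, but a sketch like the following derives the repository path from the running release instead. cuda_repo_slug is a hypothetical helper.

```shell
#!/bin/bash
# Turn an Ubuntu VERSION_ID such as "20.04" into the directory name
# used in NVIDIA's repository URLs, e.g. "ubuntu2004".
cuda_repo_slug() {
  printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

if [ -r /etc/os-release ]; then
  # Read the release info in a subshell so its variables do not leak.
  (
    . /etc/os-release
    if [ "${ID:-}" = "ubuntu" ]; then
      slug=$(cuda_repo_slug "${VERSION_ID:-}")
      echo "https://developer.download.nvidia.com/compute/cuda/repos/${slug}/x86_64/"
    else
      echo "not an Ubuntu system (ID=${ID:-unknown})"
    fi
  )
fi
```

On 20.04 this prints the ubuntu2004 repository path, which would replace the ubuntu2204 one in the pin, key and repository lines of the script.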
I ran into the same error very early on. Try my recipe at the link below on a fresh Ubuntu copy. The recipe was tested on different PC, but using the same GPU. Good luck!
https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22
Thank you so much, this script is marvelous !
Gosh, you saved my day. I finally solved my computing env. Thank you very much.
Regarding https://github.com/Kidney-Science/install_RTXA4000_Driver_CUDA_cudNN_Ubuntu_22: your link doesn't work. It takes me to a 404 page.
I've been STRUGGLING with my QEMU/KVM VMs, which seemingly out of nowhere refused to see my GPUs.
After a few days of fiddling around I eventually figured out a solution.
- Turn off Secure Boot in the VM bios
- Purge all nvidia related stuff
- Install version 520 of the graphics driver
- Install CUDA 11.8
I created a script to make my life easier:
IMPORTANT!
Script:
#!/bin/bash
# install graphics driver 520 specifically
sudo apt purge 'nvidia*' -y
sudo apt autoremove -y && sudo apt autoclean -y
sudo apt update && sudo apt upgrade -y
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install libnvidia-common-520 libnvidia-gl-520 nvidia-driver-520 -y
# install CUDA 11.8 from NVIDIA's local repository package
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
# clean up the downloaded installer (the .pin file was already moved above)
rm -f cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
echo "Install completed! Run: 'sudo reboot now' and then after reboot run 'nvidia-smi' and 'nvtop' to confirm that the GPU is recognized."
- Copy and paste the above into a file called tryfixgpu.sh
- Then make it executable:
sudo chmod +x tryfixgpu.sh
- Then run the script:
sudo ./tryfixgpu.sh
"Install version 520 of the graphics driver"
Your script is installing 535:
The following NEW packages will be installed:
dctrl-tools dkms libnvidia-cfg1-535 libnvidia-common-520 libnvidia-common-535 libnvidia-decode-535
libnvidia-encode-535 libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-gl-520 libnvidia-gl-535
nvidia-compute-utils-535 nvidia-dkms-535 nvidia-driver-520 nvidia-driver-535 nvidia-firmware-535-535.183.01
nvidia-kernel-common-535 nvidia-kernel-source-535 nvidia-prime nvidia-settings nvidia-utils-535 pkg-config
python3-xkit screen-resolution-extra xserver-xorg-video-nvidia-535
Weird. For me it’s installing 520
Hi,
I was following this exactly and got the following when I got to sudo apt install cuda-11-8:
I think libnvidia-extra-525 is added when running the install nvidia-driver-515 command. To fix this I had to insert a sudo apt upgrade into the gist, after which running sudo apt install cuda-11-8 seems to have worked. At the end I get:
But when I run nvidia-smi it shows:
Driver Version: 535.54.03 CUDA Version: 12.2
Not sure why it says CUDA 12.2 instead of 11.8 in nvidia-smi. Perhaps this is only related to the graphics driver?
In /usr/local I see:
In the /usr/local/cuda/version.json file it lists:
So perhaps running 'apt upgrade' was the wrong thing to do? Do I need to downgrade my driver to 520.61.05 in order to make all this work correctly?
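On the 12.2 versus 11.8 question: the "CUDA Version" in nvidia-smi is the highest CUDA version the installed driver supports, not the toolkit that is installed, and a newer driver can normally run an older toolkit. A minimal sketch of that comparison, using the two numbers from this comment; version_ge is a hypothetical helper and relies on GNU sort -V.

```shell
#!/bin/bash
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver_cuda=12.2    # "CUDA Version" reported by nvidia-smi in this comment
toolkit_cuda=11.8   # toolkit installed by the gist

if version_ge "$driver_cuda" "$toolkit_cuda"; then
  echo "driver (CUDA $driver_cuda) should be able to run toolkit $toolkit_cuda"
else
  echo "driver (CUDA $driver_cuda) is older than toolkit $toolkit_cuda; a newer driver is likely needed"
fi
```

By that reasoning, a downgrade to 520.61.05 should not be required just because nvidia-smi shows 12.2.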