@wangruohui
Last active September 15, 2024 18:49
Install NVIDIA Driver and CUDA on Ubuntu / CentOS / Fedora Linux OS

In this article, I will share some of my experience with installing the NVIDIA driver and CUDA on Linux. I mainly use Ubuntu as the example; comments for CentOS/Fedora are provided where I can.

Install NVIDIA Graphics Driver via apt-get

On Ubuntu, drivers for NVIDIA graphics cards are already provided in the official repository, so installation takes a single command.

For Ubuntu 14.04.5 LTS, the latest version is 352. To install the driver, execute sudo apt-get install nvidia-352 nvidia-modprobe, and then reboot the machine.

For Ubuntu 16.04.3 LTS, the latest version is 375. To install the driver, execute sudo apt-get install nvidia-375 nvidia-modprobe, and then reboot the machine.

The nvidia-modprobe utility loads the NVIDIA kernel modules and creates the NVIDIA character device files automatically every time your machine boots up.

This method is recommended for new users because it is simple. However, it has some drawbacks:

  1. The driver included in the official Ubuntu repository is usually not the latest.
  2. There can be naming conflicts when other repositories (e.g. the ones from CUDA) are added to the system.
  3. One has to reinstall the driver after the Linux kernel is updated.

Install NVIDIA Graphics Driver via runfile

For advanced users who want the latest version of the driver, want to get rid of the reinstallation issue caused by dkms, or use a Linux distribution that does not provide NVIDIA drivers in its repositories, installing from the runfile is recommended.

Remove Previous Installations (Important)

You might have installed the driver via apt-get before, so previous installations must be removed before installing the driver from the runfile. Execute the following commands carefully, one by one.

# Quote the pattern so the shell passes it to apt-get unexpanded
sudo apt-get purge 'nvidia*'

# Note this might remove your cuda installation as well
sudo apt-get autoremove 

# Recommended if .deb files from NVIDIA were installed
# Change 1404 to the exact system version or use tab autocompletion
# After executing this command, /etc/apt/sources.list.d should contain no files related to nvidia or cuda
sudo dpkg -P cuda-repo-ubuntu1404
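
To confirm that the cleanup worked, one can list what is left in the apt sources directory. The sketch below is a hypothetical helper (the name list_nvidia_sources is mine and not part of any tool); it only reads file names and changes nothing.

```shell
# Hypothetical helper: print files in an apt sources directory whose
# names mention nvidia or cuda. Prints nothing if the directory is clean.
list_nvidia_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    # -i: case-insensitive match; "|| true" keeps the exit status at 0
    # when grep finds nothing.
    ls -1 "$dir" 2>/dev/null | grep -i -E 'nvidia|cuda' || true
}

# Usage: list_nvidia_sources        # checks /etc/apt/sources.list.d
```

If the function prints any file names, those repository entries are still present and can be removed by hand.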

Download the Driver

The latest NVIDIA driver for Linux can be fetched from NVIDIA's official website. The first one in the list, i.e. the Latest Long Lived Branch version for Linux x86_64/AMD64/EM64T, is suitable for most cases.

If you want to download the driver directly from a Linux shell, the commands below are useful.

cd ~
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/384.69/NVIDIA-Linux-x86_64-384.69.run

Detailed installation instructions can be found on the download page via the README hyperlink in the ADDITIONAL INFORMATION tab. I have also summarized the key steps below.

Install Dependencies

The software required by the runfile is officially listed here, but that page seems to be stale and is not easy to follow.

For Ubuntu, installing the following dependencies is enough.

  1. build-essential -- For building the driver
  2. (Optional) gcc-multilib -- For providing 32-bit support
  3. dkms -- For providing dkms support
  4. (Optional) xorg and xorg-dev. On a workstation with a GUI, these are required but have usually been installed already, since you already have a graphical display. On headless servers without a GUI, they are not a must.

In summary, execute sudo apt-get install build-essential gcc-multilib dkms to install all dependencies.

Required packages for CentOS are epel-release dkms libstdc++.i686. Execute yum install epel-release dkms libstdc++.i686.

Required packages for Fedora are dkms libstdc++.i686 kernel-devel. Execute dnf install dkms libstdc++.i686 kernel-devel.
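
The three per-distribution commands above can be selected automatically. The following is only a sketch: the function name deps_command is hypothetical, it assumes the distribution identifies itself through the ID field of /etc/os-release, and it prefixes the CentOS/Fedora commands with sudo, which the text above leaves implicit. It prints the command rather than running it.

```shell
# Hypothetical helper: print the dependency-install command for the
# distribution named in an os-release file (default: /etc/os-release).
deps_command() {
    os_release="${1:-/etc/os-release}"
    # Read ID inside the command substitution so that the variables
    # defined by os-release do not leak into the calling shell.
    id=$( . "$os_release" && printf '%s' "$ID" )
    case "$id" in
        ubuntu) echo "sudo apt-get install build-essential gcc-multilib dkms" ;;
        centos) echo "sudo yum install epel-release dkms libstdc++.i686" ;;
        fedora) echo "sudo dnf install dkms libstdc++.i686 kernel-devel" ;;
        *)      echo "unsupported distribution: $id" >&2; return 1 ;;
    esac
}
```

Printing instead of executing lets you review the command before running it with root privileges.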

Create a Blacklist for the Nouveau Driver

Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau
options nouveau modeset=0

Note: The NVIDIA installation runfile can also create this blacklist file automatically. Execute the runfile and follow the instructions when an error related to Nouveau appears.

Then,

  1. for Ubuntu 14.04 LTS, reboot the computer;
  2. for Ubuntu 16.04 LTS, execute sudo update-initramfs -u and reboot the computer;
  3. for CentOS/Fedora, execute sudo dracut --force and reboot the computer.
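
The blacklist step can be scripted as well. This is a minimal sketch with hypothetical function names: one helper writes the two-line file shown above, the other checks a /proc/modules-style listing to see whether nouveau is still loaded after the reboot. The target path is a parameter so the helper can be tried on a scratch file first; writing to /etc/modprobe.d itself requires root.

```shell
# Hypothetical helper: write the two-line blacklist file.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$target"
}

# Hypothetical helper: succeed if the nouveau module appears in a
# /proc/modules-style listing (module name is the first field).
nouveau_loaded() {
    modules="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules"
}

# Usage after the reboot:
#   nouveau_loaded && echo "nouveau is still loaded"
```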

Stop lightdm/gdm/kdm

After the computer has rebooted, we need to stop the display manager before executing the runfile to install the driver. lightdm is the default display manager in Ubuntu. If the GNOME or KDE desktop environment is used, the installed display manager will be gdm or kdm instead.

  1. For Ubuntu 14.04 / 16.04, execute sudo service lightdm stop (or use gdm or kdm instead of lightdm).
  2. For Ubuntu 16.04 / Fedora / CentOS, execute sudo systemctl stop lightdm (or use gdm or kdm instead of lightdm).
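
If you are unsure which display manager is installed, the sketch below may help. It assumes a Debian/Ubuntu system, where /etc/X11/default-display-manager holds the path of the default display manager; the file is not expected on CentOS/Fedora, and the function name is hypothetical.

```shell
# Hypothetical helper: print the name of the default display manager
# on Debian/Ubuntu, or fail if the file cannot be read.
detect_display_manager() {
    dm_file="${1:-/etc/X11/default-display-manager}"
    [ -r "$dm_file" ] || return 1
    # The file holds a path such as /usr/sbin/lightdm; keep the last component.
    basename "$(head -n 1 "$dm_file")"
}

# Usage:
#   sudo systemctl stop "$(detect_display_manager)"
```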

Executing the Runfile

After the above batch of preparation, we can finally execute the runfile. This is why I recommended, from the very beginning, that new users install the driver via apt-get.

cd ~
chmod +x NVIDIA-Linux-x86_64-384.69.run
sudo ./NVIDIA-Linux-x86_64-384.69.run --dkms -s

Note:

  1. The --dkms option registers the module with dkms so that a kernel update will not require reinstalling the driver. It is worth turning this option on routinely.
  2. The -s option requests a silent installation, which is meant for batch installation. For installation on a single computer, leave it off to get more installation information.
  3. The --no-opengl-files option can also be added if non-NVIDIA (AMD or Intel) graphics are used for display while the NVIDIA graphics are used for computation.
  4. The installer may print the warning below on a system without X.Org installed. In my experience it is safe to ignore.
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.

Check the Installation

After a successful installation, the nvidia-smi command will report all the CUDA-capable devices in the system.
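
For scripted checks, a small wrapper can be handy. This is a sketch, not part of the driver: the function name check_driver is hypothetical, while --query-gpu and --format are standard nvidia-smi options.

```shell
# Hypothetical helper: report GPUs if nvidia-smi is on PATH.
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver is probably not installed" >&2
        return 1
    fi
    # One CSV line per GPU: product name and driver version.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

A non-zero exit status from check_driver means the tool is missing or could not talk to the driver.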

Common Errors and Solutions

  1. ERROR: Unable to load the 'nvidia-drm' kernel module.
  • One probable reason is that the system boots via UEFI with the Secure Boot option turned on in the BIOS settings. Turn it off and the problem will be solved.

Additional Notes

nvidia-smi -pm 1 enables persistence mode, which saves some of the time otherwise spent loading the driver. The effect is significant on machines with more than 4 GPUs.

nvidia-smi -e 0 disables ECC on Tesla products, which provides about 1/15 more video memory. A reboot is required for the change to take effect. nvidia-smi -e 1 enables ECC again.

nvidia-smi -pl <some power value> raises or lowers the power limit (TDP) of the GPU. Raising it encourages higher GPU Boost frequencies, but can be DANGEROUS and HARMFUL to the GPU. Lowering it saves some power, which is useful for machines whose power supply is insufficient and that shut down unexpectedly when all GPUs are pulled to their maximum load.

-i <GPUID> can be appended to the above commands to target an individual GPU.

These commands can be added to /etc/rc.local so that they are executed at system boot.
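
A boot-time script built from these commands might look like the sketch below. The helper names, the DRY_RUN switch, and any wattage you pass in are illustrative assumptions; only the nvidia-smi options themselves (-pm, -pl, -i) come from the text above. With DRY_RUN=1 the commands are printed instead of executed, which makes it easy to review them first.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: enable persistence mode and set a power limit.
apply_gpu_settings() {
    power_limit_watts="$1"   # value chosen by the caller
    gpu_id="${2:-}"          # optional: index of a single GPU
    run nvidia-smi -pm 1
    if [ -n "$gpu_id" ]; then
        run nvidia-smi -pl "$power_limit_watts" -i "$gpu_id"
    else
        run nvidia-smi -pl "$power_limit_watts"
    fi
}

# Usage (preview only): DRY_RUN=1; apply_gpu_settings 200 0
```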

Install CUDA

Installing CUDA from the runfile is much simpler and smoother than installing the NVIDIA driver. It just copies files to system directories and has nothing to do with the system kernel or online compilation. Removing CUDA is simply a matter of removing the installation directory. So I personally do not recommend adding NVIDIA's repositories and installing CUDA via apt-get or other package managers: it does not reduce the complexity of installation or uninstallation, but it does increase the risk of messing up the repository configuration.

The CUDA runfile installer can be downloaded from NVIDIA's website. What you download is a package containing the following three components:

  1. an NVIDIA driver installer, but usually of a stale version;
  2. the actual CUDA installer;
  3. the CUDA samples installer.

To extract these three components, execute the runfile installer with the --extract option. Then, executing the second one completes the CUDA installation. Installing the samples is also recommended because they provide useful tools such as deviceQuery and p2pBandwidthLatencyTest.

Scripts for installing CUDA Toolkit are summarized below.

cd ~
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
chmod +x cuda_7.5.18_linux.run
./cuda_7.5.18_linux.run --extract=$HOME
sudo ./cuda-linux64-rel-7.5.18-19867135.run

After the installation finishes, configure the runtime library path.

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

It is also recommended for Ubuntu users to append /usr/local/cuda/bin to the PATH entry in the system file /etc/environment so that nvcc will be included in $PATH. This will take effect after reboot.
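
To try the PATH change in the current shell before editing a system file, something like the sketch below can be used. The function name path_with_cuda is hypothetical; it appends the CUDA bin directory only if it is not already present, so running it twice is harmless.

```shell
# Hypothetical helper: print a PATH value with the CUDA bin directory
# appended, unless it is already there.
path_with_cuda() {
    current="$1"
    cuda_bin="${2:-/usr/local/cuda/bin}"
    case ":$current:" in
        *":$cuda_bin:"*) printf '%s\n' "$current" ;;
        *)               printf '%s\n' "$current:$cuda_bin" ;;
    esac
}

# Usage for the current shell only:
#   export PATH="$(path_with_cuda "$PATH")"
#   nvcc --version
```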

Install cuDNN

The recommended way to install cuDNN is to copy the tgz file to /usr/local, extract it there, and then remove the tgz file if desired. This method preserves symbolic links. Finally, execute sudo ldconfig to update the shared library cache.
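
As a sketch of that procedure, the helper below extracts the archive directly under a prefix with tar -C, which has the same effect as copying the file there first. It assumes the archive unpacks into a top-level cuda/ directory, as older cuDNN releases did; the function name and the archive name cudnn.tgz are placeholders.

```shell
# Hypothetical helper: unpack a cuDNN tgz under a prefix (default /usr/local).
# tar recreates symbolic links as links, which is why extracting in place
# is preferred over copying individual files by hand.
install_cudnn() {
    archive="$1"
    prefix="${2:-/usr/local}"
    tar -xzf "$archive" -C "$prefix"
}

# Usage (root is needed for /usr/local):
#   sudo tar -xzf cudnn.tgz -C /usr/local && sudo ldconfig
```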

@uzl

uzl commented Jun 3, 2019

For Ubuntu 18.04 LTS
https://gist.github.com/uzl/f0a2f0b126cf0b569de370c3db5e95d4

It also covers a solution for errors during module building and for the compiler mismatch problem.

@aninternetian

I only needed CUDA on 18.04 and I got a Can't locate InstallUtils.pm error in the last line
In case anyone encountered this as well, this solution worked for me:
https://devtalk.nvidia.com/default/topic/983777/cuda-setup-and-installation/can-t-locate-installutils-pm-in-inc/post/5048816/#5048816

@thirdeye-yusuf

Fantastic guide, got me out of a pickle :)

@CharlesB2

On Ubuntu, installing the NVIDIA graphics driver, you may have the following error in the install log:

Unable to load: nvidia-installer ncurses v6 user interface

My fix for it was to simply rm /usr/lib/nvidia/pre-install (not a great loss, as the script is just doing exit 1)

@kaaviyave

Well explained! Worked like a charm after two days of problems with booting to a blank screen. Thanks a lot.

@avyas

avyas commented Sep 8, 2019

CUDA 10.1 doesn't contain "the actual CUDA installer". Instead, the CUDA installer can be used with the options --samples --toolkit.

@hash2430

So far, this is the best explanation that I have ever seen. Thanks a lot!

@stiv-yakovenko

Can you cover how to remove nvidia modules from kernel?

@Guitaricet

I had some issues but this guide saved me an hour or so. Thank you!

@Amused-Goose

I only needed to install cuda which also installed the driver (via the .run installer). Installing the driver first gave issues and the driver included with the cuda install was a bit older.
Thanks for this guide, installing the driver was never easier!

@XM-WANG

XM-WANG commented May 20, 2020

Thanks for your amazing instructions! My computer broke last week when I first came back to the company. The IT staff assigned me a new computer, so all of my configuration was gone, and I needed to rebuild the whole environment including the NVIDIA driver, CUDA, and PyTorch. In the process, I noticed that the version of my driver was too old, which confused me a lot. Then I suddenly remembered that my greatest leader has a perfect driver installation guide on gist. So I followed your instructions and updated my driver. That works well!

I can't remember how many times I updated graphics drivers by following other instructions, which always ended with the graphical interface being lost. This guide is the best one I have ever come across! I wish it could be known to everyone who is troubled by graphics driver updates.

@mdgonzales1998

I got through all of the steps up to the restart. Once I tried restarting and going into the BIOS to make sure Secure Boot was disabled, I ran into an issue. My boot menu is a black screen with the mouse pointer; if I try moving it, it moves to the left of the screen and goes down until I can't see it anymore. I tried proceeding without disabling Secure Boot, as my machine doesn't have Windows loaded, only Ubuntu 16.04. I see the Ubuntu load screen, then it goes to a black screen stuck with the text

"/dev/sda2: clean, 962340/60956672 files, 14739229/243809024 blocks
[ 27.018846] rc.local[1481]: 0"

Any help or guidance to helpful forums would be extremely helpful, thanks!

@yuqli

yuqli commented Oct 1, 2020

Thanks! Particularly for the first point. Manually updating a driver when there is already one installed is a pain. Using apt-get is much easier! See this link to find available drivers on Ubuntu: https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/

@KasperSkytte

Incredibly, this is still relevant in 2021!

@sidneydemoraes

This is indeed the most incredible tutorial regarding NVidia drivers.
I got only one question: what if I need to uninstall all that? How to uninstall drivers installed through RUN files?

@sidneydemoraes

sidneydemoraes commented Apr 26, 2021

Just found the answer to my own question: just run the same RUN file with the --uninstall option.

Besides, it is worth mentioning that the new RUN files are much better now. We don't need to use the --extract option anymore. The installer already offers a menu with all installation options, including all three packages.

@buxizhizhoum

Thanks

@laudai

laudai commented Aug 18, 2021

Is "bby" a typo for "by" in the sentence "get rid of the reinstallation issue caused bby dkms,"?

@p0mad

p0mad commented Jul 1, 2023

Can you please provide more details about cuDNN?
I've extracted and copied the files to /usr/local/cuda-11-2/include and lib64,
but I'm not sure how to test and verify the cuDNN 8.1.0.77 installation!
Is just copying the files enough for cuDNN?

P.S. When I extracted the cuDNN tar file, it created a folder named "cuda". So I didn't extract the tar file in /usr/local, because a folder named cuda already existed there! Also, I'm using Conda environments.

Thanks
Best regards

@wangruohui
Author

Hi, this post is somewhat old. Newer versions of cuDNN changed the folder structure. The key point is to extract the header files and library files and place them into the corresponding folders of CUDA, so that compilers and linkers can find them. A hint: use cp -a to keep symbolic links when copying library files.

@p0mad

p0mad commented Jul 6, 2023

@wangruohui
Thanks for the reply!
I was wondering if I can install CUDA and cuDNN [let's say v11.2 and 8.0.1.77 respectively] in my base system (Ubuntu 20.04) and then use conda to create a virtual Python environment (using $conda create test python=3.10). Can I then have a different version of CUDA and cuDNN [let's say v12.2 and 8.9.2 respectively] for my new environment? If yes, please guide me through that; otherwise, please give me some hints about other ways to get a near bare-metal experience using other virtualization approaches.

Thanks
Best regards

@wangruohui
Author

I am not very clear about the details, but it should be possible by configuring some sort of "profile script" (I don't know the exact name) in conda, so that when you activate your environment, the script is executed and environment variables such as CUDA_HOME (or PATH, INCLUDE_PATH, LIBRARY_PATH and LD_LIBRARY_PATH) are set to point to your new version of CUDA and cuDNN. That way you can use different versions of CUDA in different conda environments.
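
One way this could be sketched: conda sources any *.sh files found in etc/conda/activate.d inside an environment when that environment is activated. The helper below writes such a script; the function name, the file name cuda.sh, and the example CUDA path are assumptions for illustration, and no matching deactivate script is created, so the variables stay set after deactivation.

```shell
# Hypothetical helper: create an activation script inside a conda
# environment so that activating it points the shell at one CUDA install.
write_cuda_activate_script() {
    env_prefix="$1"   # directory of the conda environment
    cuda_home="$2"    # e.g. /usr/local/cuda-12.2 (illustrative path)
    mkdir -p "$env_prefix/etc/conda/activate.d"
    cat > "$env_prefix/etc/conda/activate.d/cuda.sh" <<EOF
export CUDA_HOME="$cuda_home"
export PATH="\$CUDA_HOME/bin:\$PATH"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64:\${LD_LIBRARY_PATH:-}"
EOF
}

# Usage: write_cuda_activate_script "$CONDA_PREFIX" /usr/local/cuda-12.2
```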

@BwandoWando

BwandoWando commented Aug 28, 2023

Hello

I just want to let you know that this guide helped me. I am a novice in Linux; I've only been using Ubuntu for a few years, and it's always been nerve-wracking for me to update drivers and configurations. For some context, I built a personal deep learning machine not long ago, and coming from Windows, it was painful for me to configure a brand new system, but I was able to.

Just a week ago, I was trying to set up some LLM environments locally and I installed CUDA 11.8 on top of my existing NVIDIA installation. That's when I was notified that I have a "MANUALLY INSTALLED DRIVER" and that I can't update it using the terminal because of "held packages". I knew I had done something wrong and that the best way was to do everything from scratch.

This guide made me

  1. clean all cuda and nvidia driver traces
  2. install the necessary dependencies to build #3 below
  3. install the nvidia, cuda, and cudnn libraries
  4. setup configs, paths, and how to verify them

I did encounter a minor issue when trying to invoke nvcc --version. I may have overlooked something in your guide, but here's what I used on top of it: https://askubuntu.com/a/885627/1657364

But still, amazing readme and thank you very much

@ny2292000

ny2292000 commented Aug 28, 2023 via email
