zhanwenchen/Install NVIDIA Driver and CUDA.md

Forked from wangruohui/Install NVIDIA Driver and CUDA.md

Last active March 13, 2024 23:42


Install NVIDIA CUDA 9.0 on Ubuntu 16.04.4 LTS


Updated 4/11/2018

Here's my experience of installing the NVIDIA CUDA Toolkit 9.0 on a fresh install of Ubuntu Desktop 16.04.4 LTS.

Install NVIDIA Graphics Driver via apt-get
Install CUDA
Install cuDNN

Table of contents generated with markdown-toc

1. Install NVIDIA Graphics Driver via apt-get

Do not use the CUDA runfile to install your driver. Use apt-get instead. This way you do not need to worry about the Nouveau (open-source driver) workarounds you read about on StackOverflow.

As of 04/11/2018, the latest NVIDIA driver for Ubuntu 16.04.4 LTS is version 384. To install the driver, execute

sudo apt-get install nvidia-384 nvidia-modprobe

You will then be prompted to disable Secure Boot. Select Disable.

Reboot the machine, but enter the BIOS setup to disable Secure Boot there as well. Typically you can enter the BIOS by hitting F12 rapidly as soon as the system restarts (the key varies between vendors).

Afterwards, you can check the installation with the nvidia-smi command, which reports all the CUDA-capable devices in the system.
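If you'd rather script this check, here is a minimal bash sketch. It assumes 384.81 is the minimum driver for CUDA 9.0 (that is the driver bundled in the CUDA 9.0 runfile used in section 2; check NVIDIA's release notes for other toolkit versions) and relies on GNU sort -V for the version comparison. MIN_DRIVER and version_ge are my own hypothetical names, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed NVIDIA driver is new enough for CUDA 9.0.
# Assumption: 384.81 (the driver bundled in the CUDA 9.0 runfile) is the minimum.
MIN_DRIVER="384.81"

# version_ge A B: succeed if version A >= version B (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # nvidia-smi prints one line per GPU; all GPUs share one driver, so take the first.
  driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "Driver $driver is new enough for CUDA 9.0"
  else
    echo "Driver $driver is older than $MIN_DRIVER: expect 'CUDA driver version is insufficient' errors"
  fi
else
  echo "nvidia-smi not found: the driver does not seem to be installed"
fi
```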

Common Errors and Solutions

ERROR: Unable to load the 'nvidia-drm' kernel module.

One probable reason is that the system boots via UEFI with the Secure Boot option turned on in the BIOS settings. Turn it off and the problem should be solved.
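To see which case you are in before rebooting into the BIOS, a small bash sketch like the following can help. It assumes the mokutil package is installed (sudo apt-get install mokutil) and that mokutil --sb-state prints "SecureBoot enabled" or "SecureBoot disabled"; sb_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report the boot mode and, on UEFI machines, the Secure Boot state.
# Assumes the mokutil package is installed.

# sb_state TEXT: map `mokutil --sb-state` output to enabled/disabled/unknown.
sb_state() {
  case "$1" in
    *"SecureBoot enabled"*)  echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *)                       echo unknown ;;
  esac
}

# The kernel only exposes /sys/firmware/efi when the system was booted via UEFI.
if [ -d /sys/firmware/efi ]; then
  echo "Boot mode: UEFI"
  if command -v mokutil >/dev/null 2>&1; then
    echo "Secure Boot: $(sb_state "$(mokutil --sb-state 2>&1)")"
  else
    echo "Secure Boot: unknown (install mokutil to check)"
  fi
else
  echo "Boot mode: legacy BIOS (Secure Boot does not apply)"
fi
```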

Additional Notes

nvidia-smi -pm 1 enables persistence mode, which keeps the driver loaded even when no application is using the GPU and so saves the driver loading time at each program start. The effect is significant on machines with more than 4 GPUs.

nvidia-smi -e 0 disables ECC on Tesla products, which frees up about 1/15 more video memory. A reboot is required for this to take effect. nvidia-smi -e 1 enables ECC again.

nvidia-smi -pl <power limit in watts> raises or lowers the power (TDP) limit of the GPU. Raising it encourages higher GPU Boost frequencies, but is somewhat DANGEROUS and HARMFUL to the GPU. Lowering it saves some power, which is useful for machines whose power supply is insufficient and that shut down unexpectedly when all GPUs run at their maximum load.

Add -i <GPUID> to any of the above commands to target an individual GPU.

These commands can be added to /etc/rc.local to execute them at system boot.
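For example, lines like the ones below could go before the final exit 0 in /etc/rc.local. This is only a sketch: the GPU IDs 0 and 1 and the 150 W limit are placeholders (nvidia-smi -q -d POWER shows the allowed range for your card), and gpu_tuning_cmds is a hypothetical helper that just prints the nvidia-smi commands, so you can review them before piping them to sh.

```shell
# Sketch of lines for /etc/rc.local (place them before its final `exit 0`).
# Placeholders: GPU IDs 0 and 1, power limit 150 W.

# gpu_tuning_cmds WATTS ID...: print the nvidia-smi commands for each GPU ID.
gpu_tuning_cmds() {
  watts=$1
  shift
  for id in "$@"; do
    echo "nvidia-smi -i $id -pm 1"        # enable persistence mode
    echo "nvidia-smi -i $id -pl $watts"   # set the power limit in watts
  done
}

# Only run the commands when the driver is installed, so boot is unaffected otherwise.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_tuning_cmds 150 0 1 | sh
fi
```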

2. Install CUDA 9.0

Installing CUDA from the runfile is much simpler and smoother than installing the NVIDIA driver. It only copies files to system directories and does not involve the kernel or compiling kernel modules. Removing CUDA is simply a matter of removing the installation directory. So I personally do not recommend adding NVIDIA's repositories and installing CUDA via apt-get or other package managers: it does not reduce the complexity of installation or uninstallation, but it does increase the risk of messing up your repository configuration.

The CUDA runfile installer can be downloaded from NVIDIA's website, or with wget in case you can't find it easily there:

cd
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run

What you download is a package containing the following three components:

1. an NVIDIA driver installer, usually of a stale version;
2. the actual CUDA installer;
3. the CUDA samples installer.

I suggest extracting the above three components and executing 2 and 3 separately (remember, we already installed the driver ourselves). To extract them, execute the runfile installer with the --extract option:

cd
chmod +x cuda_9.0.176_384.81_linux-run
./cuda_9.0.176_384.81_linux-run --extract=$HOME

You should have unpacked three components: NVIDIA-Linux-x86_64-384.81.run (1. NVIDIA driver that we ignore), cuda-linux.9.0.176-22781540.run (2. CUDA 9.0 installer), and cuda-samples.9.0.176-22781540-linux.run (3. CUDA 9.0 Samples).
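Before going on, here is a quick bash sketch to confirm all three files are there. It assumes you extracted into $HOME as above; missing_files is a hypothetical helper that prints the name of every expected file it cannot find, so empty output means you're good (run it as missing_files "$HOME").

```shell
#!/usr/bin/env bash
# Sketch: confirm that --extract produced the three expected installers.
# File names are the ones listed above for the 9.0.176 runfile.
extracted_files=(
  "NVIDIA-Linux-x86_64-384.81.run"            # 1. NVIDIA driver (ignored)
  "cuda-linux.9.0.176-22781540.run"           # 2. CUDA 9.0 installer
  "cuda-samples.9.0.176-22781540-linux.run"   # 3. CUDA 9.0 Samples
)

# missing_files DIR: print each expected file that does not exist in DIR.
missing_files() {
  local dir=$1 f
  for f in "${extracted_files[@]}"; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}
```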

Execute the second one to install the CUDA Toolkit 9.0:

sudo ./cuda-linux.9.0.176-22781540.run

You now have to accept the license by scrolling down to the bottom (hit the "d" key repeatedly to page down) and typing "accept". Then accept the defaults.

To verify the CUDA installation, install the samples with

sudo ./cuda-samples.9.0.176-22781540-linux.run

After the installation finishes, add the CUDA library directory to the dynamic linker's search path:

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
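To check that the linker actually picked up the new directory, you can look the CUDA runtime up in the ldconfig cache. A minimal bash sketch, assuming the runtime library is named libcudart.so.9.0 as in the CUDA 9.0 toolkit; cuda_lib_path is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: check that libcudart (the CUDA runtime) is now in the linker cache.

# cuda_lib_path NAME: read `ldconfig -p` output on stdin and print the path(s)
# that library NAME resolves to (nothing if it is not cached). Lines look like:
#   libcudart.so.9.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.9.0
cuda_lib_path() {
  awk -v lib="$1" '$1 == lib { print $NF }'
}

if command -v ldconfig >/dev/null 2>&1; then
  path=$(ldconfig -p | cuda_lib_path libcudart.so.9.0)
  echo "libcudart.so.9.0 -> ${path:-not found (did you run sudo ldconfig?)}"
fi
```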

It is also recommended for Ubuntu users to append /usr/local/cuda/bin to the PATH in the system file /etc/environment (no trailing "s") so that nvcc will be included in $PATH. This will take effect after a reboot. To do that, you just have to run

sudo vim /etc/environment

and then add :/usr/local/cuda/bin (including the ":") at the end of the PATH="/blah:/blah/blah" string (inside the quotes).
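If you prefer not to open an editor, the same edit can be scripted. This is a sketch that assumes /etc/environment contains exactly one PATH="..." line, as on a stock Ubuntu 16.04 install. add_cuda_bin is a hypothetical helper that shows what the edited line will look like; the sed one-liner in the comment applies the change (run it only once; it keeps a .bak backup).

```shell
#!/usr/bin/env bash
# Sketch: add :/usr/local/cuda/bin inside the closing quote of PATH="...".

# add_cuda_bin LINE: print LINE with the CUDA bin directory appended,
# or unchanged if it is already there.
add_cuda_bin() {
  case "$1" in
    *"/usr/local/cuda/bin"*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "$1" | sed 's|"$|:/usr/local/cuda/bin"|' ;;
  esac
}

# To apply it to the real file (once):
#   sudo sed -i.bak '/^PATH=/ s|"$|:/usr/local/cuda/bin"|' /etc/environment
# Until your next login, you can use nvcc in the current shell with:
#   export PATH=/usr/local/cuda/bin:$PATH
```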

After a reboot, let's test the installation by building and running the samples:

cd /usr/local/cuda-9.0/samples
sudo make
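Building the samples serially takes a long time. A sketch of a parallel build in bash, assuming GNU coreutils' nproc is available to count CPU threads; sample_jobs and build_samples are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: build the CUDA samples with one make job per available CPU thread.

# sample_jobs: number of parallel make jobs (falls back to 1 without nproc).
sample_jobs() {
  nproc 2>/dev/null || echo 1
}

# build_samples: run make in a subshell so the caller's directory is unchanged.
build_samples() {
  ( cd /usr/local/cuda-9.0/samples && sudo make -j"$(sample_jobs)" )
}

echo "build_samples would use $(sample_jobs) parallel jobs"
```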

It's a long process with many irrelevant warnings about deprecated architectures (sm_20 and such ancient GPUs). After it completes, run deviceQuery and p2pBandwidthLatencyTest:

cd /usr/local/cuda/samples/bin/x86_64/linux/release
./deviceQuery
./p2pBandwidthLatencyTest

The result of running deviceQuery should look something like this:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6073 MBytes (6367739904 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1671 MHz (1.67 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
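If you'd like a script to tell you whether this worked (handy after a driver update), you can key on the last "Result = ..." line. A minimal bash sketch; devicequery_result is a hypothetical helper, and the deviceQuery path is the release directory used above.

```shell
#!/usr/bin/env bash
# Sketch: reduce deviceQuery's output to a single PASS/FAIL word.

# devicequery_result: read deviceQuery output on stdin and print PASS or FAIL
# from the last "Result = ..." line (empty if there is none).
devicequery_result() {
  local line
  line=$(grep -E '^Result = (PASS|FAIL)' | tail -n 1)
  printf '%s\n' "${line#Result = }"
}

dq=/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery
if [ -x "$dq" ]; then
  echo "deviceQuery: $("$dq" | devicequery_result)"
fi
```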

Cleanup: if ./deviceQuery works, remember to rm the 4 files (1 downloaded and 3 extracted).
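A sketch of that cleanup in bash, assuming everything was downloaded and extracted into $HOME as above; cleanup_cuda_installers is a hypothetical name, and it only removes those four exact file names.

```shell
#!/usr/bin/env bash
# Sketch: remove the downloaded runfile and the three extracted installers.
# DIR defaults to $HOME; `rm -f` silently skips files that are already gone.
cleanup_cuda_installers() {
  local dir=${1:-$HOME}
  rm -f \
    "$dir/cuda_9.0.176_384.81_linux-run" \
    "$dir/NVIDIA-Linux-x86_64-384.81.run" \
    "$dir/cuda-linux.9.0.176-22781540.run" \
    "$dir/cuda-samples.9.0.176-22781540-linux.run"
}
```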

3. Install cuDNN 7.0

The recommended way to install cuDNN is:

Download the "cuDNN v7.0.5 Library for Linux" tgz file (you need to register for an NVIDIA account).
sudo mv the downloaded archive to /usr/local. This might seem silly at first, but when you extract it next you will see that the contents end up in various folders under /usr/local/cuda, which would be messy to move otherwise.
Then cd /usr/local and extract the tgz with

sudo tar -xvzf cudnn-9.0-linux-x64-v7.tgz

Finally, execute sudo ldconfig to update the shared library cache.
Clean up now or later by sudo rm cudnn-9.0-linux-x64-v7.tgz
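To confirm which cuDNN version ended up in /usr/local/cuda, you can read the version macros from cudnn.h. A minimal bash sketch, assuming cuDNN 7.x, whose cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (later releases moved them to cudnn_version.h); cudnn_version is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the installed cuDNN version as MAJOR.MINOR.PATCHLEVEL.

# cudnn_version [HEADER]: parse the version #defines from cudnn.h.
cudnn_version() {
  local header=${1:-/usr/local/cuda/include/cudnn.h}
  awk '/^#define CUDNN_MAJOR/      { major = $3 }
       /^#define CUDNN_MINOR/      { minor = $3 }
       /^#define CUDNN_PATCHLEVEL/ { patch = $3 }
       END { if (major != "") printf "%s.%s.%s\n", major, minor, patch }' "$header"
}

if [ -f /usr/local/cuda/include/cudnn.h ]; then
  echo "cuDNN $(cudnn_version)"
fi
```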

rabinnh commented May 31, 2018

To speed up compilation (tremendously) when building the samples use multiple threads:
make -j 6
-j 99 allows up to 99 parallel jobs, which in practice uses all available threads on the machine.

lkshrsch commented Jun 6, 2018

Worked perfectly

mohak1 commented Jun 10, 2018

Thank you so much!
I was finally able to install CUDA on my machine.

gjxdxh commented Jun 12, 2018

When I do sudo make in the cuda samples, it shows the following errors:

/usr/bin/ld: cannot find -lglut
collect2: error: ld returned 1 exit status
Makefile:293: recipe for target 'simpleTexture3D' failed
make[1]: *** [simpleTexture3D] Error 1
make[1]: Leaving directory '/usr/local/cuda-9.0/samples/2_Graphics/simpleTexture3D'
Makefile:52: recipe for target '2_Graphics/simpleTexture3D/Makefile.ph_build' failed
make: *** [2_Graphics/simpleTexture3D/Makefile.ph_build] Error 2

Do you know how I can solve it? Thank you in advance.

frozflame commented Jun 19, 2018

Minor error: sudo apt-get install nvidia-384 nvidia-modprobe.
You omitted the word install.

kunalchelani commented Jun 22, 2018

@gjxdxh I guess you do not have GLUT installed. You can install it using sudo apt-get install freeglut3-dev and retry making the samples.

jcolares commented Jun 22, 2018

Best tutorial ever! Thanks!

npatel37 commented Jul 16, 2018

Worked flawlessly!
I don't know why other guides are not as clear.
Thank you very very much!

Tixierae commented Jul 23, 2018

Thank you so much for sharing, this saved me lots of time!

anik-jha commented Jul 28, 2018 (edited)

This is a life saver!!
Now I am not afraid to install cuda ever again :)

Although I found the following working for me to install cudnn (from official website):
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

ghost commented Aug 3, 2018

Can you please give a suggestion on how to apply the CUDA patches that are provided alongside the CUDA runfile installer on NVIDIA's CUDA website?

srinivasbakki commented Aug 7, 2018

After following those steps I still get the error:
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Can you please suggest a fix?

ludennis commented Sep 11, 2018

Great guide!!!

Minor error as mentioned by @shizukanaskytree still needs to be corrected, though.

Thanks,

Dennis

Gneoma commented Sep 23, 2018

This is a nice description of the loading process that has apparently helped many people, but not me. On the reboot after the install of the graphics driver using the command you suggest, I have the infinite login-loop problem, even though I have set up my fresh install of Ubuntu 16.04 with secure login disabled and made sure that the secure login was disabled in the BIOS. I have been at this for several weeks now and have tried a number of different approaches proposed in other discussions with the same result. No matter what approach I have tried, once I install an Nvidia driver or cuda I have the login-loop problem. All the simple fixes that have been proposed, such as resetting permissions or ownership of .Xauthority and other files, do not fix the problem. However, once I purge the Nvidia driver, the login-loop problem goes away. I have tried to install Nvidia drivers on several vintages of 16.04 and 14.04 with the same result. I am doing this on a Supermicro workstation that was specifically designed to support GPUs. I am puzzled to say the least.

flydagger commented Sep 28, 2018

Many thanks. This guide works on my computer.

KittyLiou commented Oct 6, 2018

@gjxdxh I got a similar error too.
Here's the error message:

/usr/bin/ld: cannot find -lglut
collect2: error: ld returned 1 exit status
Makefile:296: recipe for target 'Mandelbrot' failed
make[1]: *** [Mandelbrot] Error 1
make[1]: Leaving directory '/usr/local/cuda-9.0/samples/2_Graphics/Mandelbrot'
Makefile:52: recipe for target '2_Graphics/Mandelbrot/Makefile.ph_build' failed
make: *** [2_Graphics/Mandelbrot/Makefile.ph_build] Error 2

According to the NVIDIA CUDA installation guide,

Some CUDA samples use third-party libraries which may not be installed by default
on your system. These samples attempt to detect any required libraries when building.
If a library is not detected, it waives itself and warns you which library is missing.

we just need to install those third-party libraries to make the CUDA samples run correctly.

For Ubuntu, we just need to install the below:
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev

Everything then works fine.

david90103 commented Oct 20, 2018

Thank you!!!
This is the only working way for me.

ThejanW commented Oct 26, 2018

This helped..thanks 💯

bibin-sebastian commented Oct 30, 2018

Thank you.. had to use this to set up a gcp vm

VhaijaiyanthishreeVenkataramanan commented Nov 11, 2018

One of the best guides for CUDA installation!! I've been struggling for the last two days and this for sure came to my rescue!! A million thanks to the author !! ^_^

HareshKarnan commented Dec 21, 2018

Thanks ! It worked on my GTX 960M

curtcorum commented Feb 15, 2019 (edited)

Yes, thank you!
Trouble free for equivalent installation of cuda 9.1 using ppa for nvidia-387

xjcl commented Mar 5, 2019

Lifesaver! Big thanks!

joellerena commented Mar 24, 2019 (edited)

Thank you very much, everything worked with the corrections:

sudo apt-get nvidia-384 nvidia-modprobe <- by: add install ->
sudo apt-get install nvidia-384 nvidia-modprobe

sudo vim /etc/environments <- by: remove 's' ->
sudo vim /etc/environment

I used:
sudo nano /etc/environment

For cuDNN, I agree with him and followed the explanation at
[https://peshmerge.io/how-to-install-cuda-9-0-on-with-cudnn-7-1-4-on-ubuntu-18-04/]
using the download from [https://developer.nvidia.com/rdp/cudnn-download] for cuDNN 7.5

..... deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
Cuda compilation tools, release 9.0, V9.0.176
cudnnGetVersion() : 7500 , CUDNN_VERSION from cudnn.h : 7500 (7.5.0) ......
Test passed!

Thank you very much for the good tutorial.

jman278 commented Sep 15, 2019

After following those steps I still get the error:
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Can you please suggest a fix?

Were you able to fix this?

zhanwenchen (Author) commented Sep 15, 2019

Thank you everyone for your suggestions. I never got notified of any comment until today.

zhanwenchen (Author) commented Sep 15, 2019

@jman278 What's the version of your CUDA (nvcc --version) and your current driver version (nvidia-smi)?

jesseorg commented Oct 17, 2019

Worked on a new install of Ubuntu 16.04.4 LTS. Thank you for making this a most painless and error free instruction set. Just missing a few commands for those who don't know vim.

s-niket commented Dec 10, 2019

Thank you so much, this works perfectly fine.

typoworx-de commented Jan 21, 2021

Thanks! Made my day
