@tranctan
Last active August 13, 2024 08:28

How to install CUDA and run PyTorch on Linux

In order to run PyTorch with CUDA successfully, we need to install two things:

  • The CUDA Toolkit from NVIDIA, which includes the NVIDIA driver and the CUDA compiler and libraries
  • The cudatoolkit package from Conda, which comes along with PyTorch and lets PyTorch interact with the CUDA driver

Check the version of CUDA

  • Check the current CUDA version that is supported by PyTorch. In the image below we can see that PyTorch currently supports CUDA 11.3
  • (screenshot of the PyTorch install selector)
  • Next, check that this CUDA version is supported by your GPU: see the "GPUs supported" section of the CUDA Wikipedia article.
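The compatibility check above comes down to comparing dotted version strings numerically. Here is a minimal sketch (the function name and the version values are illustrative, not part of PyTorch or CUDA):

```python
def cuda_at_least(installed: str, required: str) -> bool:
    """Compare dotted CUDA version strings numerically, e.g. '11.3' >= '11.1'.
    A plain string comparison would be wrong: '11.3' < '9.2' lexically."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

print(cuda_at_least("11.3", "11.1"))  # True
print(cuda_at_least("11.3", "9.2"))   # True, despite "11.3" < "9.2" as strings
```

Note the tuple conversion: comparing the raw strings would report CUDA 9.2 as newer than 11.3.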

Pre-installation

We need to perform certain steps to make sure the CUDA Toolkit installation completes without error. From the CUDA installation guide:

  • Verify You Have a CUDA-Capable GPU

    • To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter: lspci | grep -i nvidia
    • If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line and rerun the previous lspci command.
    • If your graphics card is from NVIDIA and it is listed in https://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable.
  • Verify You Have a Supported Version of Linux

    • The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes.
    • To determine which distribution and release number you're running, type the following at the command line: uname -m && cat /etc/*release
    • You should see output similar to the following, modified for your particular system:
x86_64
Red Hat Enterprise Linux Workstation release 6.0 (Santiago)
    • The x86_64 line indicates you are running on a 64-bit system. The remainder gives information about your distribution.
  • Verify the System Has gcc Installed

    • The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications.
    • To verify the version of gcc installed on your system, type the following on the command line: gcc --version
    • If an error message displays, you need to install the development tools from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.
  • Verify the System has the Correct Kernel Headers and Development Packages Installed

    • The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.
    • The version of the kernel your system is running can be found by running the following command: uname -r
    • For Ubuntu and Debian, the kernel headers and development packages for the currently running kernel can be installed with: sudo apt-get install linux-headers-$(uname -r)
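The four verification steps above can be collected into one script. This is a sketch assuming a standard Linux userland; any command that is missing on the system (lspci, for instance) is reported as None rather than crashing:

```python
import shutil
import subprocess

def run(cmd):
    """Run a command and return its stdout, or None if the binary is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

checks = {
    "arch": run(["uname", "-m"]),    # expect x86_64 on a 64-bit system
    "kernel": run(["uname", "-r"]),  # kernel headers must match this version
    "gcc": run(["gcc", "--version"]),  # None => install distro dev tools
    "nvidia_gpu": run(["lspci"]),    # search this output for 'NVIDIA'
}
print(checks["arch"], checks["kernel"])
```

The kernel value is exactly what the linux-headers package name must match in the apt-get command above.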

Install the CUDA Toolkit on Linux

After performing all the pre-installation steps, it's time to install the CUDA Toolkit from NVIDIA.

  • Pick the correct CUDA version and download it from NVIDIA's site, e.g. NVIDIA CUDA Toolkit 11.3

  • Select your OS and the installation method. There are currently three methods: the local .deb file, the network install, and the .run runfile. The local .deb file is the most popular, but the .run runfile is the easiest, so I recommend installing via the .run runfile.

  • Next, follow the commands for the method you chose, e.g. for the .run file:

    • wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
    • sudo sh cuda_11.3.0_465.19.01_linux.run
  • Then sudo reboot

  • Perform the post-installation steps:

    • The PATH variable needs to include the CUDA bin directory (adjust 11.6 to your installed version, e.g. 11.3): export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}
    • Note: Nsight Compute has moved to /opt/nvidia/nsight-compute/ only in the rpm/deb installation method. When using the .run installer it is still located under /usr/local/cuda-11.6/.
  • In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-11.6/lib64 on a 64-bit system, or /usr/local/cuda-11.6/lib on a 32-bit system

    • To change the environment variables for 64-bit operating systems:
      • export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    • To change the environment variables for 32-bit operating systems:
      • export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    • Note that the above paths change when using a custom install path with the runfile installation method.
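To see what the two shell exports above actually produce, here is a small sketch that builds the same values in Python (the /usr/local/cuda-11.6 path is an assumption; substitute your installed version):

```python
import os

cuda_home = "/usr/local/cuda-11.6"  # adjust to your installed version

# Mirror of: export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}
# The ${VAR:+:${VAR}} idiom appends ':$VAR' only when VAR is non-empty,
# so an unset PATH does not leave a trailing colon.
old_path = os.environ.get("PATH", "")
new_path = cuda_home + "/bin" + (":" + old_path if old_path else "")

# Mirror of: export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
old_ld = os.environ.get("LD_LIBRARY_PATH", "")
new_ld = cuda_home + "/lib64" + (":" + old_ld if old_ld else "")

print(new_path.split(":")[0])  # /usr/local/cuda-11.6/bin
print(new_ld.split(":")[0])    # /usr/local/cuda-11.6/lib64
```

The point of the idiom is that the CUDA directories are prepended, so they win over any older CUDA install already on the path.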

Install PyTorch along with cudatoolkit in Conda

Select the appropriate CUDA version and install PyTorch using the command generated on the PyTorch homepage (screenshot of the install selector).

Check if CUDA is installed successfully

Run the following commands:

  • nvcc --version
  • nvidia-smi
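If either command fails with "command not found", the PATH step above was likely skipped. A quick sketch to check that both tools are reachable from the current environment (the wording of the messages is mine, not CUDA's):

```python
import shutil

# Both tools should resolve on PATH after a successful install:
# nvcc comes from the CUDA Toolkit, nvidia-smi from the NVIDIA driver.
for tool in ("nvcc", "nvidia-smi"):
    location = shutil.which(tool)
    print(f"{tool}: {location or 'NOT found - revisit the PATH export step'}")
```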

To check that PyTorch can recognize CUDA, run the following Python code:

import torch

print(torch.cuda.is_available())
>>> True

Extra: How to uninstall CUDA and NVIDIA drivers
