@mantasu
Last active November 14, 2024 05:44
TensorFlow & PyTorch installation with CUDA (Linux and WSL2 for Windows 11)

Install TensorFlow & PyTorch with CUDA [Linux | WSL2]

Overview

This guide walks through installing TensorFlow and PyTorch with NVIDIA GPU support in a Linux environment, including WSL2 (Windows Subsystem for Linux). It focuses on Ubuntu 22.04 and WSL2 on Windows 11, but the steps should also work on other recent versions. I could not find a single consistent set of instructions, so hopefully this one clears things up (and serves as a reminder for me).

To install natively on Windows 10/11 (no WSL), I suggest following this tutorial.

GPU Setup

NVIDIA Driver

Please install the newest NVIDIA drivers. There are plenty of tutorials on how to do this; here are some examples:

  • Linux - guide by Shahriar Shovon on how to install & uninstall NVIDIA drivers on Ubuntu 22.04 LTS
  • Windows 11 (WSL2 users) - use GeForce Experience for automatic updates or update manually by following the official guide

CUDA

The CUDA Toolkit can be installed by following the official guides. CUDA is backwards compatible with previous versions, so please install the newest version.

  • Linux - install the deb/rpm package (preferably the network version) from the NVIDIA page. If needed, check the guide
  • WSL2 - install WSL2 on Windows 11 by following this guide, then install the CUDA Toolkit by following the subsequent guide

Notice for WSL2 users: as mentioned in the official Ubuntu guide, "the CUDA driver used is part of the Windows driver installed on the system", so make sure to follow those steps, since the installation is not the same as on a standalone Linux system.
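
If you are unsure whether a shell is running under WSL, the following is a minimal sketch of a common heuristic: it looks for the string "microsoft" in the kernel version text. The function name `looks_like_wsl` is hypothetical, and the check is an assumption about how WSL kernels identify themselves, not an official detection method.

```python
from pathlib import Path


def looks_like_wsl(version_text):
    """Heuristic (assumption): WSL kernels mention 'microsoft' in their version string."""
    return "microsoft" in version_text.lower()


if __name__ == "__main__":
    proc_version = Path("/proc/version")
    text = proc_version.read_text() if proc_version.exists() else ""
    print("Looks like WSL:", looks_like_wsl(text))
    # On WSL2 the driver libraries are expected under this directory.
    print("/usr/lib/wsl/lib present:", Path("/usr/lib/wsl/lib").is_dir())
```

If the first line prints True but the directory is missing, the Windows-side NVIDIA driver is the first thing to re-check.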

cuDNN

From here, the installation is the same for both Linux and WSL2 users. Again, all you need to do is follow the official guide. In short:

  1. Ensure you're registered for the NVIDIA Developer Program
  2. Install zlib as specified here
  3. Download the deb*/rpm package for the newest CUDA version here (complete the survey and accept the terms)
  4. Install the downloaded file as specified here (don't install libcudnn8-samples)

*For Debian releases, check the architecture type (to download the correct deb file):

$ dpkg --print-architecture

Path Setup & Version Management

Add the following lines to your ~/.profile or ~/.bash_profile file (alternatively, ~/.bashrc also works):

# Add CUDA binaries and libraries to the search paths
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/include

# UNCOMMENT if you use WSL2
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/wsl/lib

Note that /usr/local/ should contain the newest CUDA directory, e.g., cuda-11.7 at the time of writing. There may also be /usr/local/cuda-11 and /usr/local/cuda, which are simply links to the newest cuda-X.Y directory. The exports use /usr/local/cuda because, if the version changes, that link should point to the selected one.

You can switch between CUDA and cuDNN versions (if they were installed from deb/rpm) with the following commands:
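
To see which versioned toolkits are present and where /usr/local/cuda currently points, a small sketch like the one below may help. The function name `cuda_installations` is hypothetical, and the `cuda-X.Y` naming pattern is taken from the paragraph above; it only inspects the filesystem and does not query CUDA itself.

```python
import re
from pathlib import Path


def cuda_installations(root="/usr/local"):
    """Return (versioned_dir_names, resolved_target_of_<root>/cuda)."""
    root = Path(root)
    if not root.is_dir():
        return [], None
    # Versioned directories are assumed to look like cuda-11.7 or cuda-12.1.
    pattern = re.compile(r"^cuda-\d+\.\d+$")
    versions = sorted(
        p.name for p in root.iterdir() if p.is_dir() and pattern.match(p.name)
    )
    link = root / "cuda"
    # resolve() follows the whole link chain to the real directory.
    active = str(link.resolve()) if link.exists() else None
    return versions, active


if __name__ == "__main__":
    versions, active = cuda_installations()
    print("Installed toolkits:", versions)
    print("/usr/local/cuda resolves to:", active)
```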

  • $ sudo update-alternatives --config cuda # switch CUDA version
  • $ sudo update-alternatives --config libcudnn # switch cuDNN version

Finally, reboot to complete the GPU setup.

Package & Library Setup

Conda

The installation for Anaconda is as follows (official guide):

  1. Download the Anaconda installer for your Linux distribution
  2. Run the installer (replace 2022.05 and x86_64 with the downloaded version)
    $ bash Anaconda3-2022.05-Linux-x86_64.sh # type `yes` at the end to init conda
  3. You can disable automatic activation of the base environment:
    $ conda config --set auto_activate_base false

If you want to remove it later, just delete the entire directory:

$ rm -rf "$CONDA_PREFIX" # run with only the base environment active, so $CONDA_PREFIX is the Anaconda directory

You don't have to install Anaconda: you can simply create a virtual environment for every project. However, Anaconda or Miniconda makes environments easier to manage and alleviates some GPU setup issues, should any come up.

TensorFlow

Create a new conda environment and install TensorFlow (CUDA 11.7 or newer should be backwards compatible):

$ conda create -n tf-ws python=3.10 # currently Python 3.10.4 is the newest
$ conda activate tf-ws
$ pip install tensorflow

To test whether TensorFlow can see the GPU:

$ python
>>> import tensorflow
>>> tensorflow.test.is_gpu_available()
...
True
>>> exit()
$ conda deactivate

For WSL2 users: expect warnings about NUMA support. They can be ignored, as they are a minor side effect of WSL2.
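
Recent TensorFlow releases report `tensorflow.test.is_gpu_available()` as deprecated and point to `tf.config.list_physical_devices('GPU')` instead. The sketch below wraps that call in a hypothetical helper, `tensorflow_gpus`, which returns `None` when TensorFlow is not installed in the active environment rather than raising.

```python
def tensorflow_gpus():
    """Return names of GPUs visible to TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow loaded but found no usable GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print("TensorFlow GPUs:", gpus)
```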

PyTorch

Create a new conda environment and install PyTorch (CUDA 11.7 or newer should be backwards compatible):

$ conda create -n torch-ws python=3.10 # currently Python 3.10.4 is the newest
$ conda activate torch-ws
$ pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
$ pip install pytorch-lightning # optional; pytorch-ignite is an alternative

To test whether PyTorch can see the GPU:

$ python
>>> import torch
>>> torch.cuda.is_available()
True
>>> exit()
$ conda deactivate
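
For a slightly more detailed check than the interactive session above, the following sketch collects a few facts into a dictionary. The helper name `torch_cuda_summary` is hypothetical; the calls it uses (`torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name`, `torch.version.cuda`) are part of PyTorch.

```python
def torch_cuda_summary():
    """Return a small CUDA report for PyTorch, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    devices = []
    if available:
        devices = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return {
        "available": available,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "devices": devices,
    }


if __name__ == "__main__":
    print(torch_cuda_summary())
```

If `built_with_cuda` is `None`, a CPU-only wheel was installed, and no driver or path change will make `available` true.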
@relic-yuexi

Should the cuDNN and CUDA versions match the TensorFlow version?

 W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

@mantasu
Author

mantasu commented Apr 11, 2024

For me, the newest versions of CUDA and CuDNN have always worked. For TensorFlow, you may want to double-check the official guide, e.g., right now to install tensorflow with GPU support one has to run:

pip install tensorflow[and-cuda]

@relic-yuexi

For me, the newest versions of CUDA and CuDNN have always worked. For TensorFlow, you may want to double-check the official guide, e.g., right now to install tensorflow with GPU support one has to run:

pip install tensorflow[and-cuda]

Yes, I used the same command, but TensorFlow can't detect the GPU. PyTorch works fine.

@mantasu
Author

mantasu commented Apr 12, 2024

There could be issues with package paths not being detected. Make sure you've made the packages visible in ~/.bashrc:

# Add CUDA binaries and libraries to the search paths
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/include

# UNCOMMENT if you use WSL2
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/wsl/lib

You could also look for a solution to the same error in 34287. tensorflow/issues should also contain similar topics, since this is a very common problem.
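
One quick way to confirm that the exported paths actually exist is sketched below. The helper name `missing_path_entries` is hypothetical; it only checks that each entry of a path variable is a directory, not that the expected libraries are inside it.

```python
import os


def missing_path_entries(value):
    """Return entries of a path-style variable that are not existing directories."""
    entries = [entry for entry in value.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


if __name__ == "__main__":
    for var in ("PATH", "LD_LIBRARY_PATH"):
        missing = missing_path_entries(os.environ.get(var, ""))
        print(f"{var}: missing entries = {missing if missing else 'none'}")
```

Run it in the same shell (and the same conda environment) in which TensorFlow is started, since the variables are per-process.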

@dilberiqbal

I ran this code:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

but got the following output:

[]
2024-04-17 11:01:21.100389: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-17 11:01:21.354640: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

How can I solve this issue?

@mantasu
Author

mantasu commented Apr 18, 2024

@dilberiqbal If you're using WSL2, you have to uncomment this:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/wsl/lib

Alternatively, maybe a thread on NVIDIA could help you solve this.

@oxygenkun

oxygenkun commented Jun 4, 2024

If you follow NVIDIA's guide and install the cuDNN library from the APT repo automatically, you will only get the latest version (version 9). cuDNN's shared objects and header files are then under /usr/lib/x86_64-linux-gnu/ and /usr/include/ (check with whereis libcudnn.so and whereis cudnn.h) instead of /usr/local/cuda/.../, so you should add those paths to LD_LIBRARY_PATH.

Furthermore, if you need to use TensorFlow, it only supports cuDNN up to version 8. You can install the version 8 deb package locally and use update-alternatives to switch the cuDNN version. Another option is to install a specific CUDA and cuDNN from conda, in which case cuDNN's files will be under /home/<USER>/miniconda3/envs/<ENVNAME>/lib/ and /home/<USER>/miniconda3/envs/<ENVNAME>/include/. You can set LD_LIBRARY_PATH in the conda environment's scripts (following this answer).
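
As a rough equivalent of the whereis check, the sketch below looks for the main cuDNN shared library in a list of directories. The helper name `find_cudnn` and the default directory list are assumptions drawn from the paths mentioned in this thread; add your conda environment's lib directory if you installed cuDNN that way.

```python
import glob
import os

# Candidate locations mentioned in this thread; adjust for your system.
DEFAULT_DIRS = ("/usr/local/cuda/lib64", "/usr/lib/x86_64-linux-gnu")


def find_cudnn(dirs=DEFAULT_DIRS):
    """Map each directory to the libcudnn.so* file names found in it."""
    found = {}
    for directory in dirs:
        matches = sorted(glob.glob(os.path.join(directory, "libcudnn.so*")))
        if matches:
            found[directory] = [os.path.basename(match) for match in matches]
    return found


if __name__ == "__main__":
    results = find_cudnn()
    if not results:
        print("No libcudnn.so* found in the candidate directories")
    for directory, names in results.items():
        print(directory, "->", names)
```

The version suffix in the file name (for example `.so.8` versus `.so.9`) shows which major version the loader would pick up from that directory.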

@MOGRAINEREPORTS

MOGRAINEREPORTS commented Aug 24, 2024

 import tensorflow
2024-08-24 16:40:41.877400: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-24 16:40:41.888614: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-24 16:40:41.891914: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-24 16:40:41.900270: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-24 16:40:42.597083: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tensorflow.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1724532048.551074    6068 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1724532049.008893    6068 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1724532049.008993    6068 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1724532049.204154    6068 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1724532049.204265    6068 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-08-24 16:40:49.204296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2112] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
I0000 00:00:1724532049.204372    6068 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-08-24 16:40:49.204413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /device:GPU:0 with 8802 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4070 Ti, pci bus id: 0000:05:00.0, compute capability: 8.9
True

It seems to work fine, and the messages are apparently a WSL2 thing, but I can't find a way to suppress them, even with:

import os

# Set the log level before importing TensorFlow; it is read when the library loads
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # Suppress INFO, WARNING, and ERROR messages

import tensorflow as tf

Or a bash script with these lines:

export TF_CPP_MIN_LOG_LEVEL=3
python my_script.py

People suggested that this could be fixed with the right CUDA/cuDNN version combination, but after three days of installing and uninstalling environments and WSL, and trying an ungodly number of version combinations, I'm starting to think it's BS.

Also, a minor point that I can live with: when I install tensorrt, I get

RuntimeError: Tensorflow has not been built with TensorRT support.
but I have TensorFlow running almost perfectly (besides those NUMA errors).
