This guide provides steps to install TensorFlow and PyTorch on a Linux environment (including WSL2 - Windows Subsystem for Linux) with NVIDIA GPU support. I focus on Ubuntu 22.04 and WSL2 (Windows 11), but the steps should carry over to nearby versions. From what I've checked, there is no single consistent guide for this setup, so hopefully this one clears things up (and serves as a reminder for me).
To install purely on Windows 10/11 (no WSL), I suggest following this tutorial.
Please install the newest NVIDIA drivers. There are plenty of tutorials on how to do this, here are some examples:
- Linux - guide by Shahriar Shovon on how to install & uninstall NVIDIA drivers on Ubuntu 22.04 LTS
- Windows 11 (WSL2 users) - use GeForce Experience for automatic updates or update manually by following the official guide
The CUDA Toolkit can be installed by following the official guides. Since CUDA is backwards compatible with previous versions, please install the newest version.
- Linux - install the `deb`/`rpm` package (preferably the network version) from the NVIDIA page. If needed, check the guide
- WSL2 - install WSL2 on Windows 11 by following this guide, then install the CUDA Toolkit by following the subsequent guide
Notice for WSL2 users: as mentioned in the official Ubuntu guide, "the CUDA driver used is part of the Windows driver installed on the system", so make sure to follow those steps, since the installation is not the same as on a standalone Linux system.
From here the installation is the same for both Linux and WSL2 users. All there is to do is, again, to follow the official guide. To keep things simple:
- Ensure you're registered for the NVIDIA Developer Program
- Install Zlib as specified here
- Download the `deb`/`rpm`* package for the newest CUDA here (complete the survey and accept the terms)
- Install the downloaded file as specified here (don't install `libcudnn8-samples`)
*For Debian releases, check the architecture type (to download the correct `deb` file):
$ dpkg --print-architecture
Add the following lines to your `~/.profile` or `~/.bash_profile` file (alternatively, `~/.bashrc` also works):
# Add header locations to path variables
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/include
# UNCOMMENT if you use WSL2
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/wsl/lib
Note that `/usr/local/` should contain the newest CUDA directory, e.g., currently, `cuda-11.7`. There may also be `/usr/local/cuda-11` and `/usr/local/cuda`, which are simply symlinks to the newest `cuda-X.Y` directory. `cuda` is used in the export paths because, if the version ever changes, `/usr/local/cuda` should still point to the selected one.
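After reloading the shell configuration (e.g. `source ~/.profile`), you can sanity-check that the exported directories actually landed on the search paths. A minimal sketch in pure Python (no CUDA required; the two paths below are the ones exported above):

```python
import os

def on_search_path(directory: str, variable: str) -> bool:
    """Check whether `directory` appears in a colon-separated path variable."""
    entries = os.environ.get(variable, "").split(":")
    return directory in entries

# Directories exported in ~/.profile above
checks = [
    ("/usr/local/cuda/bin", "PATH"),
    ("/usr/local/cuda/lib64", "LD_LIBRARY_PATH"),
]
for directory, variable in checks:
    status = "ok" if on_search_path(directory, variable) else "MISSING"
    print(f"{variable}: {directory} -> {status}")
```

If either entry reports `MISSING`, the profile file was probably not sourced by your current shell.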
You can switch between CUDA and cuDNN versions (if they were installed from `deb`/`rpm` packages) with the following commands:
$ sudo update-alternatives --config cuda # switch CUDA version
$ sudo update-alternatives --config libcudnn # switch cuDNN version
Reboot as a final step of the GPU setup.
The installation for Anaconda is as follows (official guide):
- Download the Anaconda installer for your Linux distribution
- Run the installer (replace `2022.05` and `x86_64` with the downloaded version):
$ bash Anaconda-2022.05-Linux-x86_64.sh # type `yes` at the end to init conda
- You can disable automatic activation of the base environment:
$ conda config --set auto_activate_base false
In case you want to remove it later, just remove the entire directory:
$ rm -rf $CONDA_PREFIX # ensure conda environment is deactivated when running
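Deleting the directory removes the install itself, but `conda init` also leaves a managed block in `~/.bashrc` between `# >>> conda initialize >>>` and `# <<< conda initialize <<<` markers, which you should remove as well. A hedged sketch that strips that block from the file's text (the marker strings are the ones `conda init` writes; adjust if your version differs):

```python
# Markers that `conda init` writes around its shell setup in ~/.bashrc
BEGIN = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"

def strip_conda_block(text: str) -> str:
    """Drop every line between the conda initialize markers (inclusive)."""
    kept, skipping = [], False
    for line in text.splitlines():
        if line.strip() == BEGIN:
            skipping = True
            continue
        if line.strip() == END:
            skipping = False
            continue
        if not skipping:
            kept.append(line)
    return "\n".join(kept)
```

To use it, read `~/.bashrc`, pass the contents through `strip_conda_block`, and write the result back (keep a backup of the original first).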
You don't have to install Anaconda, since you can simply create a virtual environment for every project; however, Anaconda or Miniconda makes environments easier to manage and can alleviate some GPU setup issues, if any arise.
Create a new conda environment and install TensorFlow (CUDA 11.7 or newer should be backwards compatible):
$ conda create -n tf-ws python=3.10 # currently Python 3.10.4 is the newest
$ conda activate tf-ws
$ pip install tensorflow
To test if Tensorflow supports GPU:
$ python
>>> import tensorflow
>>> tensorflow.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> exit()
$ conda deactivate
For WSL2 users: you may see warnings about NUMA support, but they can be safely ignored as a minor side effect of WSL2.
Create a new conda environment and install PyTorch (CUDA 11.7 or newer should be backwards compatible):
$ conda create -n torch-ws python=3.10 # currently Python 3.10.4 is the newest
$ conda activate torch-ws
$ pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
$ pip install [pytorch-lightning | pytorch-ignite] # choose either (optional)
To test if Pytorch supports GPU:
$ python
>>> import torch
>>> torch.cuda.is_available()
True
>>> exit()
$ conda deactivate
There could be issues with package paths not being detected; make sure you've made the packages visible in `~/.bashrc`. Further, you could check for a solution to the same error in 34287, and tensorflow/issues should also contain similar topics, since this is a very common problem people face.
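When a framework complains that it cannot load a shared library (e.g. a `libcudnn` variant), it helps to check whether the file is actually reachable through `LD_LIBRARY_PATH`. A small diagnostic sketch (pure Python; `cudnn` below is just an example prefix):

```python
import glob
import os

def find_in_ld_library_path(prefix: str, env=os.environ) -> list:
    """Return files matching `lib<prefix>*` in each LD_LIBRARY_PATH entry."""
    hits = []
    for entry in env.get("LD_LIBRARY_PATH", "").split(":"):
        if entry:
            hits.extend(glob.glob(os.path.join(entry, f"lib{prefix}*")))
    return hits

print(find_in_ld_library_path("cudnn") or "libcudnn not found on LD_LIBRARY_PATH")
```

An empty result usually means the exports from `~/.profile` (or `~/.bashrc`) were not sourced, or the library was installed to a directory that is not listed there.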