
@kuang-da
Last active August 16, 2024 15:24
Install nvidia-docker2 In Pop!_OS #popos

Introduction

This gist is a note about installing nvidia-docker in Pop!_OS 20.10. nvidia-docker lets Docker containers use the GPU for computation.

The basic installation is covered in Nvidia's official documentation, but a few tweaks are needed to make it work on Pop!_OS 20.10.

Setting up Docker

No surprise here: following the official documentation should work.

Setting up NVIDIA Container Toolkit

Adding NVIDIA Source

Pop!_OS is an "unsupported distribution" for Nvidia's package source, and Ubuntu 20.10 is not supported by it yet either. So we need to change the distribution to ubuntu20.04 when adding the sources. For instance:

distribution="ubuntu20.04" \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
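
After adding the source list, refresh the package index so apt can see the new repository (this step comes from the official instructions):

sudo apt-get update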


Install nvidia-docker2

While installing nvidia-docker2, I got the following error:

(base) ➜  ~ sudo apt-get install -y nvidia-docker2              
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.5.0) but 3.4.0-1pop1~1601325114~20.10~2880fc6 is to be installed
E: Unable to correct problems, you have held broken packages.

This is because Pop!_OS's own source for the Nvidia driver has a higher priority than Nvidia's official source, but its nvidia-docker2 dependencies lag behind Nvidia's official source. To fix that, we can give the Nvidia Docker source a higher priority as follows.

vi /etc/apt/preferences.d/nvidia-docker-pin-1002
with the following content:
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
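
As a quick sanity check (optional; the exact version strings will differ on your machine), you can confirm the pin is picked up before installing:

apt-cache policy nvidia-docker2 nvidia-container-runtime

The candidate version of nvidia-container-runtime should now come from nvidia.github.io instead of the Pop!_OS repository.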

Then follow the official documentation by running the commands below: install nvidia-docker2, restart Docker, and launch a test container with GPU access.

sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
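
For reference, nvidia-docker2 registers the NVIDIA runtime with Docker through /etc/docker/daemon.json; the resulting file should look roughly like the sketch below (merge carefully if you already keep custom settings there):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}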


@kuang-da

kuang-da commented Apr 19, 2022

@VitalyyBezuglyj Thanks for the feedback. I have updated my gist accordingly. Happy coding!

@OxygenLiu

Thank you! It is much easier for Pop OS 22.04 as the following:

  • sudo apt install nvidia-docker2
  • set no-cgroups = true in /etc/nvidia-container-runtime/control.toml
  • run test docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
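
For the no-cgroups step, the option lives under the [nvidia-container-cli] section of that file; a rough sketch of the change (note that on newer packages the file is config.toml rather than control.toml, as pointed out in the comments below):

[nvidia-container-cli]
# the shipped default is a commented-out "#no-cgroups = false"; change it to:
no-cgroups = true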

@ecatanzani

> Thank you! It is much easier for Pop OS 22.04 as the following:
>
> * `sudo apt install nvidia-docker2`
>
> * set `no-cgroups = true` in `/etc/nvidia-container-runtime/control.toml`
>
> * run test `docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi`

This indeed works perfectly, but before running the container you need to restart Docker: sudo systemctl restart docker

The full chain should be the following:

  • sudo apt install nvidia-docker2
  • set no-cgroups = true in /etc/nvidia-container-runtime/control.toml
  • sudo systemctl restart docker
  • run test docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

@warrenmwang

Hi, the file /etc/nvidia-container-runtime/control.toml doesn't exist for me after running sudo apt install nvidia-docker2 which installed ok, and I am on Pop OS 22.04. So, I can't run my container with GPU. Can someone help?

@jbartolozzi

having the same issue as @Warren-Wang-OG

@hctsantos

> Hi, the file /etc/nvidia-container-runtime/control.toml doesn't exist for me after running sudo apt install nvidia-docker2 which installed ok, and I am on Pop OS 22.04. So, I can't run my container with GPU. Can someone help?

I am using Pop OS 22.04 and made the suggested change in /etc/nvidia-container-runtime/config.toml instead. It worked correctly for me.

@Linux-cpp-lisp

Linux-cpp-lisp commented Sep 19, 2022

Worth noting that this also worked for me with a much more restrictive glob:

Package: *nvidia*
Pin: origin nvidia.github.io
Pin-Priority: 1002

(Pop!_OS 22.04)

@afiaka87

afiaka87 commented Dec 6, 2022

Thanks, this is the correct filename now.

@fxtentacle

Thanks @kuang-da and @Linux-cpp-lisp for so clearly pointing out that the issue is the modified Pop!_OS packages.

Searching the internet for my error "failed to set inheritable capabilities", I found various suggestions, from mounting /dev into the container with -v, to using --privileged, to setting no-cgroups = true, and none of them could make Docker work the same as on my Ubuntu machine ...

But also for Pop!_OS 22.04, the cleanest solution is to just install the official NVIDIA packages 😅

@jacobalarcon

> Worth noting that this also worked for me with a much more restrictive glob:
>
> Package: *nvidia*
> Pin: origin nvidia.github.io
> Pin-Priority: 1002
>
> (Pop!_OS 22.04)

This worked. Thank you so much.

@danpaldev

Same here! It worked using config instead of control.

@austiecodes

Tried and worked, thx a lot!

@illtellyoulater

Hey everyone, I have noticed that there are two main methods for installing nvidia-docker2 on Pop!_OS 22.04. One is described in the System76 support article (updated in March 2023), and the other is outlined in this gist. As a non-expert, I was curious about the differences between these two methods, and what the advantages and disadvantages of each might be. So I asked ChatGPT-4.0 to explain the differences and here's the comprehensive response it provided:

The first method, as outlined in the System76 support article, involves using the nvidia-container-toolkit package and executes the following instructions:

sudo apt update
sudo apt full-upgrade
sudo apt install nvidia-container-toolkit docker.io
sudo usermod -aG docker $USER
sudo kernelstub --add-options "systemd.unified_cgroup_hierarchy=0"
[reboot...]
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
[done]

This approach appears straightforward and may be easier for novice users to follow. Each command updates the system, installs the necessary packages, adds the current user to the Docker group (allowing Docker commands to be run without sudo), modifies a kernel parameter to disable the unified cgroup hierarchy (a feature of systemd), reboots the system, configures Docker to use the NVIDIA libraries when running containers, and restarts Docker. However, it might not support the most recent versions of CUDA if the nvidia-container-toolkit package hasn't been updated recently. Notably, this method involves disabling the unified cgroup hierarchy feature of systemd, a significant system component, which could potentially lead to compatibility issues with future software that expects this feature to be enabled.
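
As an optional sanity check (not part of the System76 article), you can verify after the reboot that the kernelstub change took effect by looking at the filesystem type mounted at /sys/fs/cgroup:

stat -fc %T /sys/fs/cgroup/

With the unified hierarchy disabled this typically prints tmpfs (cgroup v1); on an unmodified Pop!_OS 22.04 install it prints cgroup2fs.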

The second method, as described in this gist, involves using the nvidia-docker2 package and executes the following instructions:

sudo apt install nvidia-docker2
set no-cgroups = true in /etc/nvidia-container-runtime/control.toml
sudo systemctl restart docker
vi /etc/apt/preferences.d/nvidia-docker-pin-1002
  Package: *nvidia*
  Pin: origin nvidia.github.io
  Pin-Priority: 1002

This method seems more complex and may require a deeper understanding of Docker and Linux systems. However, it can be considered less invasive as it avoids modifying system components, and more flexible as it might support more recent versions of CUDA if the nvidia-docker2 package has been recently updated.

When considering the benefits and drawbacks of the two methods, the first option may be simpler to execute and more reliable due to its direct support from System76, the developers of Pop!_OS. However, it may not be compatible with the most current versions of CUDA. In contrast, the second option may support newer versions of CUDA, but it may be more challenging to implement and less stable since it lacks direct support from System76.

In order to determine the most suitable method for your needs, it is necessary to consider various factors such as your technical proficiency, the version of CUDA you plan to use, and your specific requirements. Which of the two methods you choose will ultimately depend on these considerations.

@kuang-da

kuang-da commented Jun 7, 2023

Hi @illtellyoulater,

The primary distinction between my gist and system76's tutorial is the inclusion of source from NVIDIA in my gist. This ensures that the nvidia-container-toolkit is the most up-to-date version. On the other hand, system76's tutorial installs nvidia-container-toolkit from their own channel. Both approaches are likely to be effective.

PS: Prior to creating this gist, I encountered issues with system76's tutorial, although I can't recall the exact reasons now. I'm pleasantly surprised that this gist continues to be helpful to others even after two years. 😆

@MRo47

MRo47 commented Aug 16, 2024

Thank you!

@MRo47

MRo47 commented Aug 16, 2024

@illtellyoulater @kuang-da
I tried both; @kuang-da's version works for me. For the other one, my only issue is that I want to install the newer Docker from their official page rather than docker.io, which is old.
