- Install all required software: docker, nvidia-docker, gitlab-ci-multi-runner
- Execute: curl -s http://localhost:3476/docker/cli
- Use that output (example shown below) to fill the devices/volumes/volume_driver fields in /etc/gitlab-runner/config.toml
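For reference, on a host where the v1 nvidia-docker-plugin is running, that endpoint returns the docker CLI flags to translate into the runner configuration. The output looks roughly like the following (a sketch only; the driver version and device list will differ on your machine):
$ curl -s http://localhost:3476/docker/cli
--volume-driver=nvidia-docker --volume=nvidia_driver_384.81:/usr/local/nvidia:ro --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools --device=/dev/nvidiactl --device=/dev/nvidia0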
concurrent = 1
check_interval = 0

[[runners]]
  name = "Docker runner <---complete-me--->"
  url = "https://<---complete-me--->"
  token = "28ce17edc8ea7437f3e49969c86341"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "nvidia/cuda"
    privileged = false
    disable_cache = false
    devices = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-uvm-tools", "/dev/nvidia3", "/dev/nvidia2", "/dev/nvidia1", "/dev/nvidia0"]
    volumes = ["/cache", "nvidia_driver_384.81:/usr/local/nvidia:ro"]
    volume_driver = "nvidia-docker"
    shm_size = 0
  [runners.cache]
Is there a newer method? I have tried installing nvidia-docker, docker, and the runner itself, then setting only the runner's runtime parameter to "nvidia" and the executor to "docker", but TensorFlow, for example, does not detect the GPUs at all.
The following config.toml provides GPU support (notice the runtime parameter).
concurrent = 1
check_interval = 0

[[runners]]
  name = "Docker runner <---complete-me--->"
  url = "https://<---complete-me--->"
  token = "28ce17edc8ea7437f3e49969c86341"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "nvidia/cuda"
    privileged = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    runtime = "nvidia"
  [runners.cache]
Yet it is not clear to me how to restrict the GPUs assigned to a runner on a multi-GPU server. This functionality is called "GPU isolation". The docker run command for GPU isolation follows (notice the -e NVIDIA_VISIBLE_DEVICES=0). How can this be set for the runner in config.toml?
docker run --runtime=nvidia --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:9.0-base nvidia-smi
In the [[runners]] section there's an environment keyword to define environment variables, but I guess it won't work because you have to pass that environment variable to docker. So the only way I see is to specify NVIDIA_VISIBLE_DEVICES directly in the Dockerfile: https://github.com/NVIDIA/nvidia-docker/wiki/Usage#dockerfiles
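For example, a minimal Dockerfile along those lines could look like this (just a sketch; the base image and GPU index are placeholders to adapt):
FROM nvidia/cuda:9.0-base
# Expose only GPU 0 to containers built from this image (read by the nvidia runtime at container start)
ENV NVIDIA_VISIBLE_DEVICES 0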
It seems that environment in the [[runners]] section is exactly what we were looking for. Actually, any environment variable set before the script section of the .gitlab-ci.yml configuration file runs will do. See the following two examples; both of them worked for me.
Example 1: using gitlab-runner configuration only
In /etc/gitlab-runner/config.toml:
[[runners]]
  name = "runner-gpu0-test"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=0"]   # <== Notice this
  [runners.docker]
    runtime = "nvidia"                         # <== Notice this
    tls_verify = false
    image = "nvidia/cuda:9.0-base"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

[[runners]]
  name = "runner-gpu1-test"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=1"]   # <== Notice this
  [runners.docker]
    runtime = "nvidia"                         # <== Notice this
    tls_verify = false
    image = "nvidia/cuda:9.0-base"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
The .gitlab-ci.yml file:
image: nvidia/cuda:9.0-base

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
The two runners have been tagged with docker, gpu0 and docker, gpu1 respectively.
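For reference, the tags can be assigned when registering each runner, roughly like this (a sketch; URL, registration token and tag list are placeholders):
gitlab-runner register \
  --non-interactive \
  --url "https://<---complete-me--->" \
  --registration-token "<registration-token>" \
  --executor "docker" \
  --docker-image "nvidia/cuda:9.0-base" \
  --tag-list "docker,gpu0"
The runtime and environment entries can then be added to the generated [[runners]] block in /etc/gitlab-runner/config.toml as shown above.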
Example 2: using GitLab CI custom environment variables
/etc/gitlab-runner/config.toml is the same as in Example 1.
The .gitlab-ci.yml file:
image: nvidia/cuda:9.0-base

variables:
  NVIDIA_VISIBLE_DEVICES: "3"   # This is going to override definition(s) in /etc/gitlab-runner/config.toml

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
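Note that variables can also be set per job in .gitlab-ci.yml, and job-level values override the global variables block, so a single pipeline can pin different jobs to different GPUs. A sketch:
test:run_on_gpu1:
  stage: test
  variables:
    NVIDIA_VISIBLE_DEVICES: "1"   # job-level value overrides the global variables block
  script:
    - nvidia-smi
  tags:
    - docker
    - gpu1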
Do you guys know how to make this work with Docker v19.03.2, which integrates native support for NVIDIA GPUs? The runtime = "nvidia" setting does not work anymore; containers should now be executed with the --gpus flag.
docker run -it --rm --gpus all ubuntu nvidia-smi
It is an open issue and, looking at the comments, it does not seem like it will be fixed soon.
I am using Docker 19.03 together with nvidia-docker2. This provides the new --gpus switch while keeping compatibility with the old --runtime switch (refer to https://github.com/NVIDIA/nvidia-docker/tree/master#upgrading-with-nvidia-docker2-deprecated).
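For context, nvidia-docker2 keeps runtime = "nvidia" working because it registers the NVIDIA runtime with the Docker daemon. The relevant /etc/docker/daemon.json entry looks roughly like this (a sketch; the binary path may differ per distribution):
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Adding "default-runtime": "nvidia" to the same file is a common workaround when the tool launching the container (such as gitlab-runner) cannot pass --gpus or --runtime itself.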
This method is outdated.