- Install all required software: docker, nvidia-docker, gitlab-ci-multi-runner
- Execute: curl -s http://localhost:3476/docker/cli
- Use that data to fill the devices/volumes/volume_driver fields in /etc/gitlab-runner/config.toml (a sample of the output is shown below)
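On a host where nvidia-docker-plugin is running, that call prints the extra docker CLI arguments for the machine. The exact output depends on the host; the driver version and the number of /dev/nvidiaN devices below are only an example:

curl -s http://localhost:3476/docker/cli
--volume-driver=nvidia-docker --volume=nvidia_driver_384.81:/usr/local/nvidia:ro --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --device=/dev/nvidia1

Each --device entry goes into devices, the driver volume goes into volumes, and --volume-driver becomes volume_driver in the runner configuration below.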
concurrent = 1
check_interval = 0

[[runners]]
name = "Docker runner <---complete-me--->"
url = "https://<---complete-me---->"
token = "28ce17edc8ea7437f3e49969c86341"
executor = "docker"
[runners.docker]
tls_verify = false
image = "nvidia/cuda"
privileged = false
disable_cache = false
devices = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-uvm-tools", "/dev/nvidia3", "/dev/nvidia2", "/dev/nvidia1", "/dev/nvidia0"]
volumes = ["/cache", "nvidia_driver_384.81:/usr/local/nvidia:ro"]
volume_driver = "nvidia-docker"
shm_size = 0
[runners.cache]
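As a sanity check, the same devices, volume and volume driver can be tried with a plain docker run before wiring them into the runner. The driver volume name and device list below are illustrative and must match the curl output on your host:

docker run --rm \
  --volume-driver=nvidia-docker \
  --volume=nvidia_driver_384.81:/usr/local/nvidia:ro \
  --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 \
  nvidia/cuda nvidia-smi

If nvidia-smi lists the expected GPUs here, the runner configured as above should see them as well.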
In the [[runners]] section there's an environment keyword to define environment variables. But I guess that it won't work, because you have to pass that environment variable to docker itself.
So the only way I see is to specify NVIDIA_VISIBLE_DEVICES directly in the Dockerfile: https://github.com/NVIDIA/nvidia-docker/wiki/Usage#dockerfiles
It seems that environment in the [[runners]] section is exactly what we were looking for.
Actually, any way of setting the environment variable before the script section of the .gitlab-ci.yml configuration file runs is fine. See the following two examples: both of them worked for me.
Example 1: using gitlab-runner configuration only
In /etc/gitlab-runner/config.toml:
[[runners]]
name = "runner-gpu0-test"
url = "<url>"
token = "<token>"
executor = "docker"
environment = ["NVIDIA_VISIBLE_DEVICES=0"] # <== Notice this
[runners.docker]
runtime = "nvidia" # <== Notice this
tls_verify = false
image = "nvidia/cuda:9.0-base"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[[runners]]
name = "runner-gpu1-test"
url = "<url>"
token = "<token>"
executor = "docker"
environment = ["NVIDIA_VISIBLE_DEVICES=1"] # <== Notice this
[runners.docker]
runtime = "nvidia" # <== Notice this
tls_verify = false
image = "nvidia/cuda:9.0-base"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
The .gitlab-ci.yml file:
image: nvidia/cuda:9.0-base

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
The two runners have been tagged with docker, gpu0 and docker, gpu1 respectively.
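For completeness, the tags come from runner registration. A sketch of how such a runner might be registered non-interactively (URL, token and description are placeholders; the binary may still be called gitlab-ci-multi-runner on older installs):

sudo gitlab-runner register \
  --non-interactive \
  --url "https://<gitlab-host>/" \
  --registration-token "<token>" \
  --executor "docker" \
  --docker-image "nvidia/cuda:9.0-base" \
  --description "runner-gpu0-test" \
  --tag-list "docker,gpu0"

After registering, add the environment and runtime lines shown above to the generated [[runners]] entry in /etc/gitlab-runner/config.toml, and repeat with --tag-list "docker,gpu1" for the second runner.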
Example 2: using GitLab CI custom environment variables
/etc/gitlab-runner/config.toml: same as Example 1.
The .gitlab-ci.yml file:
image: nvidia/cuda:9.0-base

variables:
  NVIDIA_VISIBLE_DEVICES: "3" # This is going to override definition(s) in /etc/gitlab-runner/config.toml

test:run_on_gpu0:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 10s
  tags:
    - docker
    - gpu0

test:run_on_gpu1:
  stage: test
  script:
    - echo NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}
    - nvidia-smi
    - sleep 7s
  tags:
    - docker
    - gpu1
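Note that with a global variables: block like the one above, both jobs end up on GPU 3. If different jobs should target different GPUs while still overriding the runner configuration, GitLab CI also accepts a variables: block at job level, which takes precedence over the global one. A minimal sketch (job name and GPU id are illustrative):

test:run_on_gpu0:
  stage: test
  variables:
    NVIDIA_VISIBLE_DEVICES: "0" # job-level value overrides global and runner definitions
  script:
    - nvidia-smi
  tags:
    - docker
    - gpu0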
Do you guys know how to make it work with Docker v19.03.2, which integrates native support for NVIDIA GPUs?
The runtime = "nvidia" setting does not work anymore; containers should now be executed with the --gpus flag:
docker run -it --rm --gpus all ubuntu nvidia-smi
It is an open issue and, looking at the comments, it does not seem likely to be fixed soon.
I am using Docker 19.03 together with nvidia-docker2. This provides the new --gpus switch while keeping compatibility with the old --runtime switch (refer to https://github.com/NVIDIA/nvidia-docker/tree/master#upgrading-with-nvidia-docker2-deprecated).
The following config.toml provides GPU support (notice the runtime parameter). Yet, it is not clear to me how to restrict the GPUs assigned to the runner on a multi-GPU server. This functionality is called "GPU isolation".
The docker run command for GPU isolation follows; please notice the -e NVIDIA_VISIBLE_DEVICES=0. How can this be set for the runner in config.toml?
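Not an authoritative answer, but based on Example 1 above, the equivalent of -e NVIDIA_VISIBLE_DEVICES=0 should be achievable by letting the runner inject the variable via the environment keyword, as long as the nvidia runtime is still available through nvidia-docker2. A sketch (name, url and token are placeholders):

[[runners]]
  name = "runner-gpu0"
  url = "<url>"
  token = "<token>"
  executor = "docker"
  environment = ["NVIDIA_VISIBLE_DEVICES=0"] # same effect as -e NVIDIA_VISIBLE_DEVICES=0
  [runners.docker]
    runtime = "nvidia" # kept available by nvidia-docker2 under Docker 19.03
    image = "nvidia/cuda:9.0-base"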