When using LocalCUDACluster on a single node it is possible to scale your work out on a SLURM based HPC with a few small tweaks.
First install the Dask Runners package. (Note: this is a prototype and will be merged into dask-jobqueue in the future)
pip install git+https://github.com/jacobtomlinson/dask-hpc-runner.gitThen replace LocalCUDACluster with the SLURMRunner class.
from dask_hpc_runner import SlurmRunner
# Tell the SLURM Runner to use the Dask CUDA worker class
cluster = SlurmRunner(worker_class="dask_cuda.CUDAWorker")Run your code on a SLURM system.
srun -n4 python code.pyThat's it!
If you run this script outside of SLURM it will raise a RuntimeError, so if you want to make your code more flexible you could catch this and fall back to creating a LocalCUDACluster.
See example.py for a complete example.