Allocate 3 nodes, with 1 task of 24 CPUs each:
salloc -N 3 -n 3 -c 24
Within the allocation, make sure Dask and its dependencies are available (e.g. by activating a virtual env or conda env).
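A quick sanity check from a Python prompt inside the allocation (a minimal sketch, assuming dask and distributed are installed in the activated environment):

# run in a Python prompt inside the allocation
import dask
import distributed
print(dask.__version__, distributed.__version__)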
Start a scheduler (on the first node) and three workers (one per node):
$ srun -n1 -N1 -r0 dask-scheduler --scheduler-file scheduler.json &>> scheduler.log &
$ srun -n3 -N3 --cpus-per-task=24 dask-worker --scheduler-file scheduler.json --nthreads=24 &>> worker.log &
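Before starting the computation, it can help to confirm that all three workers have registered with the scheduler. A minimal sketch, assuming scheduler.json has been written to the current working directory:

from dask.distributed import Client

# Connect via the shared scheduler file and block until three workers have joined
client = Client(scheduler_file='scheduler.json')
client.wait_for_workers(3)
print(list(client.scheduler_info()['workers']))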
Start a Python prompt and connect a client to the scheduler:
In [1]: from dask.distributed import Client, progress
In [2]: client = Client(scheduler_file='scheduler.json')
In [3]: from dask import array as darr
In [4]: xy = darr.random.uniform(0, 1, size=(1000e9 / 2 / 8, 2), chunks=(1e9 / 8 / 2, 2))
In [5]: pi = 4 * ((xy ** 2).sum(axis=-1) < 1.0).mean()
In [6]: pi = pi.persist()
In [7]: progress(pi)
[########################################] | 100% Completed | 1min 2.6s
In [8]: print(pi.compute())
3.141586099776
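The same estimate can also be run non-interactively from a script inside the allocation. A sketch, assuming the scheduler and workers above are already running and scheduler.json is in the working directory; the sizes mirror the interactive session (roughly 1 TB of float64 samples split into roughly 1 GB chunks):

from dask import array as darr
from dask.distributed import Client

client = Client(scheduler_file='scheduler.json')

# 1000e9 bytes total / 8 bytes per float64 / 2 columns ~ 62.5e9 (x, y) pairs,
# in chunks of ~62.5e6 pairs (~1 GB each, ~1000 chunks in total)
n_samples = int(1000e9 / 8 / 2)
chunk_rows = int(1e9 / 8 / 2)

xy = darr.random.uniform(0, 1, size=(n_samples, 2), chunks=(chunk_rows, 2))
# Fraction of points inside the unit quarter circle, times 4, estimates pi
pi = 4 * ((xy ** 2).sum(axis=-1) < 1.0).mean()
print(pi.compute())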
CC @kathoef