A "Hello, world!" Prefect flow running on an ephemeral Dask Cluster on Kubernetes.
The Prefect Core docs suggest running flows on a Dask cluster via Dask Distributed, and an article from the MET Office Informatics Lab demonstrates running an adaptive Dask cluster on k8s. This example is inspired by those sources, as well as the docs for each of the technologies used.
Start a pod on your k8s cluster on which to run your flow (I use microk8s):

```sh
kubectl run --generator run-pod/v1 my-prefect --env "EXTRA_PIP_PACKAGES=prefect dask-kubernetes" --rm -it --image daskdev/dask -- bash
```
Copy the two files in this gist (`flow.py` & `worker-spec.yml`) into the pod, and run the flow on the pod:

```sh
python -m flow
```
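The actual `worker-spec.yml` is the one in this gist; for reference, a typical worker pod template for dask-kubernetes looks roughly like this (the values here are illustrative, not the gist's):

```yaml
# worker-spec.yml (sketch) -- pod template that dask-kubernetes uses
# to spawn each Dask worker
kind: Pod
spec:
  restartPolicy: Never
  containers:
    - name: dask-worker
      image: daskdev/dask
      args: [dask-worker, --nthreads, "2", --memory-limit, 2GB, --death-timeout, "60"]
      env:
        # Install Prefect into each worker on startup
        - name: EXTRA_PIP_PACKAGES
          value: prefect
      resources:
        limits:
          cpu: "1"
          memory: 2G
```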
Running the flow will start up an adaptive Dask cluster on your k8s cluster, and run the Prefect flow on it forever.
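For reference, the actual `flow.py` is in this gist; a minimal sketch of the same idea, assuming the classic Prefect Core and dask-kubernetes APIs (the flow name, schedule interval, and adaptive bounds here are illustrative), might look like:

```python
# flow.py (sketch) -- run a "Hello, world!" Prefect flow on an adaptive
# Dask cluster spawned from the worker-spec.yml pod template
from datetime import timedelta

from dask_kubernetes import KubeCluster
from prefect import Flow, task
from prefect.engine.executors import DaskExecutor
from prefect.schedules import IntervalSchedule


@task
def say_hello():
    print("Hello, world!")


if __name__ == "__main__":
    # Adaptive Dask cluster: scales worker pods up and down with load
    cluster = KubeCluster.from_yaml("worker-spec.yml")
    cluster.adapt(minimum=1, maximum=4)

    # An interval schedule makes flow.run() keep running forever
    schedule = IntervalSchedule(interval=timedelta(minutes=1))
    with Flow("hello-k8s", schedule=schedule) as flow:
        say_hello()

    # Point Prefect's Dask executor at the cluster's scheduler
    flow.run(executor=DaskExecutor(address=cluster.scheduler_address))
```

The `IntervalSchedule` is what makes the flow run indefinitely rather than once; `flow.run()` blocks and re-executes the tasks on each tick.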
You can expose port `8787` on your Dask pod in order to get the Dask scheduler dashboard. Here I use a `NodePort`, but there are other options:

```sh
kubectl expose pods/my-prefect --port 8787 --type NodePort
```
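As one of those other options, if you only need the dashboard from your own machine, `kubectl port-forward` avoids creating a Service at all (not part of this gist, just an alternative):

```sh
# Forward local port 8787 to port 8787 on the pod; runs until interrupted,
# after which the dashboard is at localhost:8787
kubectl port-forward pods/my-prefect 8787:8787
```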
Then you can view the services on your cluster, and pick out the one you just created:

```sh
kubectl get svc
```

will give you something like this:

```
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
my-prefect   NodePort   10.152.183.79   <none>        8787:32588/TCP   11s
```
In this case, once the flow is running, I can navigate to `localhost:32588` to see the Dask dashboard.
Additionally, you can run the following to see the worker pods being created and destroyed by the Dask scheduler:

```sh
watch kubectl get pods
```
This is all very ad hoc and interactive in order to demonstrate the process from start to finish. In the real world, this whole process would be automated with a few steps:

- Creating our own image bundled with the correct requirements and the snippets in this gist, instead of using the presupplied `daskdev/dask` image (which installs packages afresh on each worker creation?)
- Creating a deployment with a replicaset in YAML and applying that to our cluster via gitops, instead of directly running a pod with a generator
- Defining the `NodePort`/`LoadBalancer` in YAML and applying that via gitops, instead of using `kubectl`
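For that last step, the `kubectl expose` call above could be replaced by a declarative Service manifest, something like this sketch (the label selector assumes the `run=my-prefect` label that `kubectl run` applies):

```yaml
# dashboard-service.yml (sketch) -- NodePort Service for the Dask dashboard,
# applied with `kubectl apply -f` or via a gitops pipeline
apiVersion: v1
kind: Service
metadata:
  name: my-prefect
spec:
  type: NodePort
  selector:
    run: my-prefect
  ports:
    - port: 8787
      targetPort: 8787
```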
However, I think this is a useful minimal working example to demonstrate the concept of using Dask on Kubernetes as a scheduling backend for Prefect.