How to setup kubeflow locally

This process worked for me after 5 days of hell trying to get everything running. Most of this was pulled from this excellent blog post. https://jacobtomlinson.dev/posts/2022/running-kubeflow-inside-kind-with-gpu-support/

Setup the kind cluster

ToDo: Find the patch to kind to allow GPU attachment... (kind of important, I know...)

Create a kind-gpu.yaml file

# kind-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kubeflow-gpu
nodes:
  - role: control-plane
    image: kindest/node:v1.21.2
    gpus: True
    extraPortMappings:
    - containerPort: 31080
      listenAddress: 127.0.0.1
      hostPort: 80
kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        "service-account-issuer": "kubernetes.default.svc"
        "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"

Fire up the cluster

$kind create cluster --config kind-gpu.yaml
Creating cluster "kubeflow-gpu" ...
 ✓ Ensuring node image (kindest/node:v1.21.2) 🖼
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kubeflow-gpu"
You can now use your cluster with:

kubectl cluster-info --context kind-kubeflow-gpu

Have a nice day! 👋

Switch to the proper context

kubectx kind-kubeflow-gpu

Make cluster GPU aware

Next we need to install the NVIDIA operator via helm. This will add the device plugins to the Kuberenetes API so it can detect GPUs and schedule them.

We want to avoid the operator trying to install drivers though as we already did that so we need to disable driver installs.

helm repo add nvidia https://nvidia.github.io/gpu-operator \
  && helm repo update

helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false

Install kubeflow

Firse we need to clone the manifests provided by the working group

git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.5-branch

Now it is time to run the install!

$ ./hack/setup-kubeflow.sh

Go grab some coffee and watch the pods spin up (roughly 74). I had to run the setup-kubeflow.sh script three times and it took roughly 15 minutes on a 14 core machine with 128 GB ram.

sberryman/00-pita.md

How to setup kubeflow locally

Setup the kind cluster

Make cluster GPU aware

Install kubeflow