Rancher makes it super easy to set up a k8s cluster: it gives you one-line commands to run on your worker nodes to connect them to your cluster.
The official docs are a little overwhelming, so here's a compressed guide:
- On the master node (let's call it master), run
sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
This will launch the Rancher web UI.
- To access the UI from your laptop, you may need to forward port 443 to your laptop. Do so with
ssh -L 8888:localhost:443 master
- Open https://localhost:8888 in your browser. Complete the initial setup and set a password for the web UI. It will also ask for a server URL reachable from all nodes. The field is prepopulated with https://localhost:8888; change it to the IP address of the master node (find it with ip a; it should be reachable from the other nodes) and don't specify a port, e.g., https://10.35.0.21/.
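Depending on your Rancher version, the first login may instead ask for a bootstrap password that is printed in the container logs. A hedged way to grab it (assumes the default rancher/rancher image; the exact log line may differ across versions):
# Get the Rancher container ID, then pull the bootstrap password from its logs
sudo docker ps --filter ancestor=rancher/rancher --format '{{.ID}}'
sudo docker logs <container-id> 2>&1 | grep "Bootstrap Password:"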
- You'll see the Rancher UI. Click Create cluster -> Custom.
- Use the default settings, give the cluster any name, and click Create.
- The next screen shows the commands to run on your nodes. Make sure the insecure checkbox in Step 2 is checked.
- First, generate the command for your master node: check all three boxes (etcd, Control Plane, Worker), then copy the command from Step 2. Open a new terminal, SSH into your master node, and run the command. It should complete successfully.
- Next, generate the command for your worker nodes: check only the Worker checkbox and copy the command from Step 2. Open a new terminal, SSH into each of your worker nodes, and run the copied command.
- Rancher will show the workers connecting, and after a few minutes your k8s cluster should be up and running. A quick node-side sanity check is sketched below.
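If a node doesn't show up, you can check it from the node itself (a sketch, assuming Rancher provisions an RKE2 custom cluster, which the containerd paths used later in this guide suggest):
# Rancher's provisioning agent should be active on every registered node
sudo systemctl status rancher-system-agent
# Worker nodes run rke2-agent; the master runs rke2-server
sudo systemctl status rke2-agent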
- On the cluster management page, click on your cluster name and click Download kubeconfig to get the access credentials for your cluster. Name this file config and move it to ~/.kube (see the sketch below). Run kubectl get nodes to verify everything is working.
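A minimal sketch of that step, assuming the kubeconfig was downloaded to ~/Downloads (the filename Rancher gives it varies, so adjust the path):
mkdir -p ~/.kube
# <downloaded-kubeconfig>.yaml is a placeholder for whatever Rancher named the file
mv ~/Downloads/<downloaded-kubeconfig>.yaml ~/.kube/config
# All nodes should eventually report Ready
kubectl get nodes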
- After your Kubernetes cluster is up and running, run sky check to make sure it prints Kubernetes: enabled.
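For reference, the verification is just the command below; the exact output formatting may differ across SkyPilot versions, but the Kubernetes line should read enabled:
sky check
# Expected in the output:
#   Kubernetes: enabled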
- Set up GPU support by installing the NVIDIA GPU operator:
# Install helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
&& chmod 700 get_helm.sh \
&& ./get_helm.sh
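# (Optional) quick sanity check that Helm 3 is now installed and on PATH
helm version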
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
# Setup namespace
kubectl create ns gpu-operator
kubectl label --overwrite ns gpu-operator pod-security.kubernetes.io/enforce=privileged
# Install GPU operator
helm install gpu-operator -n gpu-operator --create-namespace \
  nvidia/gpu-operator $HELM_OPTIONS \
  --set toolkit.env[0].name=CONTAINERD_CONFIG \
  --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl \
  --set toolkit.env[1].name=CONTAINERD_SOCKET \
  --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
  --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
  --set toolkit.env[2].value=nvidia \
  --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
  --set-string toolkit.env[3].value=true
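To sanity-check the GPU operator install, something like the following should work (a sketch; pod names vary by operator version, and the rollout can take a few minutes):
# All pods in the gpu-operator namespace should eventually be Running or Completed
kubectl -n gpu-operator get pods
# Once the operator is up, GPUs show up as allocatable node resources
kubectl describe nodes | grep nvidia.com/gpu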