K8sGPT is a tool for scanning your Kubernetes clusters, diagnosing and triaging issues in simple English. It has SRE experience codified into its analyzers and helps to pull out the most relevant information and enrich it with AI.
K8sGPT Is For…
- Workload health analysis - Find critical issues with your workloads.
- Fast triage, AI analysis - Look at your cluster at a glance or use AI to analyze your cluster in depth
- Humans - Complex signals turned into easy-to-understand suggestions
- Security CVE review - Connect to scanners like Trivy and triage issues
Orchestration is a key component of cloud native (CN) systems.
What is Kubernetes (k8s)? Per https://kubernetes.io
Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.
K8s is a complex system; the next sections cover the tools (e.g., Docker, k3d) we leverage to install a complete K8s system.
Another critical component of CN systems is the use of containers. Docker remains a viable container runtime.
What is Docker? Per https://docs.docker.com/get-started/overview/
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker's methodologies for shipping, testing, and deploying code, you can significantly reduce the delay between writing code and running it in production.
~$ curl -sL https://get.docker.com | sh -
...
~$ sudo usermod -aG docker $(whoami)
~$ exit # reload user permissions in terminal
$ ssh -i key.pem ubuntu@<VM_IP> # ssh back to VM
Prove Docker is up and running (this command connects to the daemon, so we know it's listening).
~$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 0 0 0B 0B
Containers 0 0 0B 0B
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B
With our container runtime installed, we move on to installing K8s via k3d.
What is k3d? Per https://k3d.io
k3d is a lightweight wrapper to run k3s (Rancher Lab’s minimal Kubernetes distribution) in docker.
k3d makes it very easy to create single- and multi-node k3s clusters in docker, e.g. for local development on Kubernetes.
Yes, k3d is a wrapper around k3s. So what is k3s?
Per https://github.com/k3s-io/k3s?tab=readme-ov-file#what-is-this
K3s is a fully conformant production-ready Kubernetes distribution with the following changes:
- It is packaged as a single binary.
- It adds support for sqlite3 as the default storage backend. Etcd3, MySQL, and Postgres are also supported.
- It wraps Kubernetes and other components in a single, simple launcher.
- It is secure by default with reasonable defaults for lightweight environments.
- It has minimal to no OS dependencies (just a sane kernel and cgroup mounts needed).
- It eliminates the need to expose a port on Kubernetes worker nodes for the kubelet API by exposing this API to the Kubernetes control plane nodes over a websocket tunnel.
Not to belabor the point, but k3s provides defaults for many useful, if not required, k8s components.
K3s bundles the following technologies together into a single cohesive distribution:
- Containerd & runc
- Flannel for CNI
- CoreDNS
- Metrics Server
- Traefik for ingress
- Klipper-lb as an embedded service load balancer provider
- Kube-router netpol controller for network policy
- Helm-controller to allow for CRD-driven deployment of helm manifests
- Kine as a datastore shim that allows etcd to be replaced with other databases
- Local-path-provisioner for provisioning volumes using local storage
- Host utilities such as iptables/nftables, ebtables, ethtool, & socat
We begin by installing k3d (and k3s).
~$ curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
...
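Before creating a cluster, it is worth confirming the install took; k3d provides a version subcommand.
~$ k3d version # prints the k3d version and the default bundled k3s version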
Finally, we can install K8s via k3d.
~$ k3d cluster create "k8sgpt-cluster" --image "rancher/k3s:v1.26.9-k3s1"
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-k8sgpt-cluster'
INFO[0000] Created image volume k3d-k8sgpt-cluster-images
INFO[0000] Starting new tools node...
INFO[0000] Pulling image 'ghcr.io/k3d-io/k3d-tools:5.6.0'
INFO[0001] Creating node 'k3d-k8sgpt-cluster-server-0'
INFO[0001] Starting Node 'k3d-k8sgpt-cluster-tools'
INFO[0001] Pulling image 'rancher/k3s:v1.26.9-k3s1'
INFO[0004] Creating LoadBalancer 'k3d-k8sgpt-cluster-serverlb'
INFO[0005] Pulling image 'ghcr.io/k3d-io/k3d-proxy:5.6.0'
INFO[0007] Using the k3d-tools node to gather environment information
INFO[0007] HostIP: using network gateway 172.20.0.1 address
INFO[0007] Starting cluster 'k8sgpt-cluster'
INFO[0007] Starting servers...
INFO[0007] Starting Node 'k3d-k8sgpt-cluster-server-0'
INFO[0012] All agents already running.
INFO[0012] Starting helpers...
INFO[0012] Starting Node 'k3d-k8sgpt-cluster-serverlb'
INFO[0018] Injecting records for hostAliases (incl. host.k3d.internal) and for 2 network members into CoreDNS configmap...
INFO[0020] Cluster 'k8sgpt-cluster' created successfully!
INFO[0020] You can now use it like this:
kubectl cluster-info
K8s nodes are normally VMs; k3d, however, allows us to run a node in a container.
~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0444a0d5c238 ghcr.io/k3d-io/k3d-proxy:5.6.0 "/bin/sh -c nginx-pr…" About a minute ago Up About a minute 80/tcp, 0.0.0.0:40755->6443/tcp k3d-k8sgpt-cluster-serverlb
b7781ddae510 rancher/k3s:v1.26.9-k3s1 "/bin/k3d-entrypoint…" About a minute ago Up About a minute k3d-k8sgpt-cluster-server-0
# ctr is a client for interacting with the containerd daemon directly
# docker, in turn, talks to the docker daemon, which then talks to containerd
~$ sudo ctr -n moby c ls
CONTAINER IMAGE RUNTIME
ab7a6c33402acef3d9a7c5254d2d3e3d057f8b38dd0135eb2ae64bc8ac9a7e25 - io.containerd.runc.v2
ca790d4975bfc82d49545fa4229e9033b1783d64e125e1a8d1df4dd4d244adc1 - io.containerd.runc.v2
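ctr can also show the running tasks (the live processes) behind those container records; "t" is ctr's alias for the tasks subcommand.
~$ sudo ctr -n moby t ls # one task per running container, with its host PID and status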
Using the k3d command line, we see similar results.
~$ k3d node list
NAME ROLE CLUSTER STATUS
k3d-k8sgpt-cluster-server-0 server k8sgpt-cluster running
k3d-k8sgpt-cluster-serverlb loadbalancer k8sgpt-cluster running
You can view what is running in the server container (aka the k8s control plane) as follows.
~$ docker top k3d-k8sgpt-cluster-server-0 -eo pid,comm
PID COMMAND
7453 docker-init
7488 k3d-entrypoint.
7609 containerd
8348 containerd-shim
8361 containerd-shim
8375 containerd-shim
9680 containerd-shim
9785 containerd-shim
7504 k3s
9704 pause
9979 entry
9918 entry
8420 pause
8761 local-path-prov
10101 traefik
9805 pause
8934 metrics-server
8416 pause
8419 pause
8722 coredns
Notice the PIDs; that is because "docker top" is using the ps on your host system. Compare that to ps in the container itself (which is busybox, so a less functional version of ps).
~$ docker exec k3d-k8sgpt-cluster-server-0 ps | awk '{print $1 " " $3}'
PID COMMAND
1 /sbin/docker-init
7 {k3d-entrypoint.}
23 /bin/k3s
128 containerd
681 /bin/containerd-shim-runc-v2
694 /bin/containerd-shim-runc-v2
708 /bin/containerd-shim-runc-v2
749 /pause
752 /pause
753 /pause
1055 /coredns
1094 local-path-provisioner
1267 /metrics-server
2010 /bin/containerd-shim-runc-v2
2034 /pause
2115 /bin/containerd-shim-runc-v2
2135 /pause
2247 {entry}
2308 {entry}
2429 traefik
3121 ps
On the host, k3s is PID 7504, but in the container it's PID 23 (a per-container view of the /proc filesystem is a key containerization feature).
Note the availability of the metrics-server. This means commands such as kubectl top pods and kubectl top nodes will work. However, we need to install kubectl first!
kubectl is the official CLI tool to operate K8s. We installed the k8s control plane at v1.26.9, so we will use the matching version of kubectl.
~$ curl -sLO "https://dl.k8s.io/release/v1.26.9/bin/linux/amd64/kubectl" # same version as k3d install of k8s control plane
~$ sudo mv kubectl /usr/bin/
~$ sudo chown $(whoami) /usr/bin/kubectl
~$ chmod u+x /usr/bin/kubectl
Confirm it works and can reach the K8s API server.
~$ kubectl version
...
~$ source <(kubectl completion bash)
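The source line above only lasts for the current shell session. To make completion persistent, a common approach (assuming bash and a standard ~/.bashrc) is:
~$ echo 'source <(kubectl completion bash)' >> ~/.bashrc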
When we created the cluster, k3d also provided the configuration (user, certs, etc.).
# cat ~/.kube/config to see the raw file
~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://0.0.0.0:40755
  name: k3d-k8sgpt-cluster
contexts:
- context:
    cluster: k3d-k8sgpt-cluster
    user: admin@k3d-k8sgpt-cluster
  name: k3d-k8sgpt-cluster
current-context: k3d-k8sgpt-cluster
kind: Config
preferences: {}
users:
- name: admin@k3d-k8sgpt-cluster
  user:
    client-certificate-data: DATA+OMITTED
    client-key-data: DATA+OMITTED
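Two quick sanity checks against that configuration (standard kubectl config subcommands):
~$ kubectl config current-context # should print k3d-k8sgpt-cluster
~$ kubectl config get-contexts # lists every context in ~/.kube/config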
In this section, we will run a simple test application and confirm all is well with our K8s installation.
We are going to launch a script that runs two commands, "echo" and "tail", confirming K8s API operations along the way.
~$ ps wuxa | grep tail # make sure tail is not running
# nothing
~$ kubectl run testing --image ubuntu:22.04 --command sh -- -c 'echo "hello world"; tail -f /dev/null' # launch script
~$ ps wuxa | grep tail # make sure tail is running
root 12118 0.2 0.0 2892 1664 ? Ss 23:30 0:00 sh -c echo "hello world"; tail -f /dev/null
root 12130 0.0 0.0 2824 1536 ? S 23:30 0:00 tail -f /dev/null
ubuntu 12154 0.0 0.0 7008 2304 pts/0 S+ 23:30 0:00 grep --color=auto tail
# another way to confirm tail is running
~$ docker top k3d-k8sgpt-cluster-server-0 | grep tail
root 12118 12064 0 23:30 ? 00:00:00 sh -c echo "hello world"; tail -f /dev/null
root 12130 12118 0 23:30 ? 00:00:00 tail -f /dev/null
~$ kubectl logs testing # confirm echo ran
hello world
# show debugging information while deleting the pod; this may take a minute (finalizers, closing connections, etc.)
~$ kubectl delete pod testing -v=9
...
I0315 23:32:23.571322 12791 round_trippers.go:466] curl -v -XDELETE -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: kubectl/v1.26.9 (linux/amd64) kubernetes/d1483fd" 'https://0.0.0.0:40755/api/v1/namespaces/default/pods/testing'
...
~$ kubectl get pod testing # confirm pod is gone
Error from server (NotFound): pods "testing" not found
~$ kubectl get namespaces # see available namespaces, these are logical groups for access and control in k8s
NAME STATUS AGE
kube-system Active 21m
default Active 21m
kube-public Active 21m
kube-node-lease Active 21m
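As a quick illustration of that isolation (the "demo" namespace name is arbitrary):
~$ kubectl create namespace demo
namespace/demo created
~$ kubectl -n demo get pods # pods in other namespaces are not listed here
No resources found in demo namespace.
~$ kubectl delete namespace demo
namespace "demo" deleted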
~$ kubectl -n kube-system get pods # show control plane and related pods
NAME READY STATUS RESTARTS AGE
local-path-provisioner-76d776f6f9-zb2nh 1/1 Running 0 21m
coredns-59b4f5bbd5-4hzzl 1/1 Running 0 21m
helm-install-traefik-crd-4lm64 0/1 Completed 0 21m
svclb-traefik-d88ed97a-cnf9n 2/2 Running 0 21m
helm-install-traefik-lwvmg 0/1 Completed 1 21m
traefik-57c84cf78d-zxqmg 1/1 Running 0 21m
metrics-server-68cf49699b-tdm8c 1/1 Running 0 21m
Again, we see "metrics-..." is running, so let's use it.
~$ kubectl top pods -A
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system coredns-59b4f5bbd5-4hzzl 2m 13Mi
kube-system local-path-provisioner-76d776f6f9-zb2nh 1m 7Mi
kube-system metrics-server-68cf49699b-tdm8c 6m 17Mi
kube-system svclb-traefik-d88ed97a-cnf9n 0m 0Mi
kube-system traefik-57c84cf78d-zxqmg 1m 26Mi
~$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k3d-k8sgpt-cluster-server-0 61m 0% 602Mi 1%
While kubectl top shows a single node, k3d lists two.
# a default k3d install creates two containers: one server node and one loadbalancer
~$ k3d node list
NAME ROLE CLUSTER STATUS
k3d-k8sgpt-cluster-server-0 server k8sgpt-cluster running
k3d-k8sgpt-cluster-serverlb loadbalancer k8sgpt-cluster running
The purpose of k3d-k8sgpt-cluster-serverlb is to route traffic rather than serve it; run docker top k3d-k8sgpt-cluster-serverlb to view its processes (nginx).
k8sgpt can be installed via helm.
Helm is the most popular templated installation engine for CN projects; k8sgpt and other tools leverage it to install components.
~$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
~$ chmod u+x get_helm.sh
~$ ./get_helm.sh
...
~$ helm version
...
Next we install the k8sgpt operator via helm. This operator manages the connection to the AI provider (e.g. OpenAI, LocalAI, etc.) and k8s itself.
~$ helm repo add k8sgpt https://charts.k8sgpt.ai/
...
~$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "k8sgpt" chart repository
Update Complete. ⎈Happy Helming!⎈
~$ helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
...
The previous command did the actual install of the k8sgpt operator.
We can list releases as follows:
~$ helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
release k8sgpt-operator-system 1 2024-03-15 23:40:32.186355959 +0000 UTC deployed k8sgpt-operator-0.1.1 0.0.26
traefik kube-system 1 2024-03-15 23:12:57.963367114 +0000 UTC deployed traefik-21.2.1+up21.2.0 v2.9.10
traefik-crd kube-system 1 2024-03-15 23:12:54.104198207 +0000 UTC deployed traefik-crd-21.2.1+up21.2.0 v2.9.10
Notice k3s leveraged helm (internally, via its bundled helm-controller) to install components such as traefik.
Finally, notice this operator works with custom resource definitions (CRDs, custom resource types in k8s).
~$ kubectl api-resources | grep k8sgpt
k8sgpts core.k8sgpt.ai/v1alpha1 true K8sGPT
results core.k8sgpt.ai/v1alpha1 true Result
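Those resource types are backed by CustomResourceDefinition objects that the helm chart installed; we can list the definitions themselves (timestamps elided):
~$ kubectl get crds | grep k8sgpt.ai
k8sgpts.core.k8sgpt.ai ...
results.core.k8sgpt.ai ...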
If we try to list instances of these resources, we will not find any.
~$ kubectl get k8sgpts.core.k8sgpt.ai -A
No resources found
~$ kubectl get results.core.k8sgpt.ai -A
No resources found
We can see what is running for k8sgpt as follows.
~$ kubectl get ns
NAME STATUS AGE
kube-system Active 32m
default Active 32m
kube-public Active 32m
kube-node-lease Active 32m
k8sgpt-operator-system Active 4m53s
~$ kubectl get all -n k8sgpt-operator-system
NAME READY STATUS RESTARTS AGE
pod/release-k8sgpt-operator-controller-manager-7597b58757-dl9gv 2/2 Running 0 5m5s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/release-k8sgpt-opera-controller-manager-metrics-service ClusterIP 10.43.73.183 <none> 8443/TCP 5m5s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/release-k8sgpt-operator-controller-manager 1/1 1 1 5m5s
NAME DESIRED CURRENT READY AGE
replicaset.apps/release-k8sgpt-operator-controller-manager-7597b58757 1 1 1 5m5s
Finally, the operator logs.
~$ kubectl -n k8sgpt-operator-system logs release-k8sgpt-operator-controller-manager-7597b58757-dl9gv
2024-03-15T23:40:37Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2024-03-15T23:40:37Z INFO setup starting manager
2024-03-15T23:40:37Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-03-15T23:40:37Z INFO starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0315 23:40:37.524559 1 leaderelection.go:250] attempting to acquire leader lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai...
I0315 23:40:37.533829 1 leaderelection.go:260] successfully acquired lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai
2024-03-15T23:40:37Z DEBUG events release-k8sgpt-operator-controller-manager-7597b58757-dl9gv_3f242be8-f527-4ced-a804-4f188c2ee15f became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"k8sgpt-operator-system","name":"ea9c19f7.k8sgpt.ai","uid":"9025d947-8c8a-4bb4-91b1-8daff736a2b1","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1176"}, "reason": "LeaderElection"}
2024-03-15T23:40:37Z INFO Starting EventSource {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "source": "kind source: *v1alpha1.K8sGPT"}
2024-03-15T23:40:37Z INFO Starting Controller {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT"}
2024-03-15T23:40:37Z INFO Starting workers {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "worker count": 1}
We can see it's interacting with the K8s API.
In order to use k8sgpt, we need to connect it to an AI backend.
While k8sgpt can support many AI providers, we choose to use a local and free alternative called LocalAI.
What is LocalAI? Per https://github.com/mudler/LocalAI
The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. It allows to generate Text, Audio, Video, Images. Also with voice cloning capabilities.
LocalAI is part of a larger project called go-skynet (https://github.com/go-skynet).
A helm chart to install LocalAI is provided there (https://github.com/go-skynet/helm-charts#readme).
We now install LocalAI.
~$ helm repo add go-skynet https://go-skynet.github.io/helm-charts/
"go-skynet" has been added to your repositories
~$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "go-skynet" chart repository
...Successfully got an update from the "k8sgpt" chart repository
Update Complete. ⎈Happy Helming!⎈
Due to our setup, we will need to modify our values.yaml to work with k3d.
We start with the default Helm values for LocalAI (found on the project's website).
~$ vi values.yaml
replicaCount: 1

resources:
  {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

# Prompt templates to include
# Note: the keys of this map will be the names of the prompt template files
promptTemplates:
  {}
  # ggml-gpt4all-j.tmpl: |
  #   The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
  #   ### Prompt:
  #   {{.Input}}
  #   ### Response:

# Models to download at runtime
models:
  # Whether to force download models even if they already exist
  forceDownload: false
  # The list of URLs to download models from
  # Note: the name of the file will be the name of the loaded model
  list:
    - url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
      # basicAuth: base64EncodedCredentials

# Persistent storage for models and prompt templates.
# PVC and HostPath are mutually exclusive. If both are enabled,
# PVC configuration takes precedence. If neither are enabled, ephemeral
# storage is used.
persistence:
  pvc:
    enabled: false
    size: 6Gi
    accessModes:
      - ReadWriteOnce
    annotations: {}
    # Optional
    storageClass: local-path
  hostPath:
    enabled: false
    path: "/models"

service:
  type: ClusterIP
  port: 80
  annotations: {}
  # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
  # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"

ingress:
  enabled: false
  className: ""
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  # - secretName: chart-example-tls
  #   hosts:
  #     - chart-example.local

nodeSelector: {}

tolerations: []

affinity: {}
Now we pull down the Helm chart for LocalAI and make the following modifications:
- set the template name
- configure for k3d's local-path provisioner
- k3d's local-path provisioner only allows the ReadWriteOnce disk access mode
~$ helm pull go-skynet/local-ai
~$ ls -l local-ai-3.2.0.tgz
-rw-r--r-- 1 ubuntu ubuntu 5106 Mar 15 23:53 local-ai-3.2.0.tgz
~$ tar xvf local-ai-3.2.0.tgz
local-ai/Chart.yaml
local-ai/values.yaml
local-ai/templates/_helpers.tpl
local-ai/templates/_pvc.yaml
local-ai/templates/configmap-prompt-templates.yaml
local-ai/templates/deployment.yaml
local-ai/templates/ingress.yaml
local-ai/templates/pvcs.yaml
local-ai/templates/service.yaml
# todo fix for proper helm use
~$ helm template ./local-ai -f values.yaml --name-template mlops --debug | sed -e 's/ReadWriteMany/ReadWriteOnce/g' -e 's/hostPath/local-path/g' - > local-ai.yaml
install.go:218: [debug] Original chart version: ""
install.go:235: [debug] CHART PATH: /home/ubuntu/local-ai
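As the todo comment hints, post-processing rendered manifests with sed is a workaround. If this chart version wires accessModes and storageClass all the way through its templates (an assumption; the need for sed suggests this release may not), the idiomatic equivalent would be value overrides on install:
~$ helm install mlops go-skynet/local-ai -f values.yaml \
    --set persistence.pvc.enabled=true \
    --set persistence.pvc.storageClass=local-path \
    --set "persistence.pvc.accessModes[0]=ReadWriteOnce"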
The following command is a prime example of why CNAI (MLOps) tooling continues to develop: behind the scenes, the Pod's image download is tens of GBs in size.
~$ kubectl apply -f local-ai.yaml
persistentvolumeclaim/mlops-local-ai-models created
persistentvolumeclaim/mlops-local-ai-output created
service/mlops-local-ai created
deployment.apps/mlops-local-ai created
Now would be a good time to get coffee. You can monitor the Pod status as follows (waiting for Running).
# this can take 30 plus minutes
~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mlops-local-ai-7fd46c4d56-fjwc2 0/1 PodInitializing 0 3m23s
Use describe to see what is going on.
~$ kubectl describe pod mlops-local-ai-7fd46c4d56-fjwc2 | tail
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m14s default-scheduler Successfully assigned default/mlops-local-ai-7fd46c4d56-fjwc2 to k3d-k8sgpt-cluster-server-0
Normal Pulling 5m14s kubelet Pulling image "busybox"
Normal Pulled 5m13s kubelet Successfully pulled image "busybox" in 767.213093ms (767.226223ms including waiting)
Normal Created 5m13s kubelet Created container download-model
Normal Started 5m13s kubelet Started container download-model
Normal Pulling 2m49s kubelet Pulling image "quay.io/go-skynet/local-ai:latest"
Another way to monitor is to combine time with kubectl's watch flag (-w).
~$ time kubectl get pods mlops-local-ai-7fd46c4d56-fjwc2 -w
NAME READY STATUS RESTARTS AGE
mlops-local-ai-7fd46c4d56-fjwc2 0/1 PodInitializing 0 8m18s
# Eventually, it will change from PodInitializing to Running.
mlops-local-ai-7fd46c4d56-fjwc2 1/1 Running 0 8m51s
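If you would rather block until the Pod is ready, kubectl wait does the same thing non-interactively (the long timeout allows for the image pull):
~$ kubectl wait --for=condition=Ready pod/mlops-local-ai-7fd46c4d56-fjwc2 --timeout=45m
pod/mlops-local-ai-7fd46c4d56-fjwc2 condition met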
We saw 'Pulling image "quay.io/go-skynet/local-ai:latest"' above. The image list below is initially empty, but once the download is far enough along we should see the image here.
~$ docker exec k3d-k8sgpt-cluster-server-0 ctr image ls | grep local-ai
quay.io/go-skynet/local-ai:latest application/vnd.oci.image.index.v1+json sha256:7e75efb68e2da5d619648a2e7b163b14a486b24752f5ac312fdc01ae9361401e 15.0 GiB linux/amd64,unknown/unknown io.cri-containerd.image=managed
quay.io/go-skynet/local-ai@sha256:7e75efb68e2da5d619648a2e7b163b14a486b24752f5ac312fdc01ae9361401e application/vnd.oci.image.index.v1+json sha256:7e75efb68e2da5d619648a2e7b163b14a486b24752f5ac312fdc01ae9361401e 15.0 GiB linux/amd64,unknown/unknown
We are now ready to use k8sgpt.
The k8sgpt operator looks for resources of type K8sGPT. The K8sGPT custom resource tells the operator to install a given AI backend and how to find it once it's there.
~$ vi k8sgpt-localai.yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local-ai
  namespace: default
spec:
  ai:
    enabled: true
    model: ggml-gpt4all-j
    backend: localai
    baseUrl: http://mlops-local-ai.default.svc.cluster.local:8080/v1
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8
Apply the K8sGPT custom resource.
~$ kubectl apply -f k8sgpt-localai.yaml
k8sgpt.core.k8sgpt.ai/k8sgpt-local-ai created
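The operator should react by deploying a k8sgpt workload next to our K8sGPT resource; a quick check (the pod name suffix will differ, and the k8sgpt-local-ai-* naming is an assumption based on the resource name):
~$ kubectl get pods # expect a k8sgpt-local-ai-* pod alongside mlops-local-ai-*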
Does k8sgpt have any advice for us?
~$ kubectl get results.core.k8sgpt.ai
No resources found in default namespace.
Not yet! Looks like our system is good to go! Let's break it.
~$ kubectl run broken-pod --image=nginx:1.a.b.c
pod/broken-pod created
After several minutes.
~$ kubectl get results.core.k8sgpt.ai
NAME KIND BACKEND
defaultbrokenpod Pod localai
It looks like we've been given some advice!
~$ kubectl get results defaultbrokenpod -o json
{
    "apiVersion": "core.k8sgpt.ai/v1alpha1",
    "kind": "Result",
    "metadata": {
        "creationTimestamp": "2024-03-16T00:29:28Z",
        "generation": 1,
        "labels": {
            "k8sgpts.k8sgpt.ai/backend": "localai",
            "k8sgpts.k8sgpt.ai/name": "k8sgpt-local-ai",
            "k8sgpts.k8sgpt.ai/namespace": "default"
        },
        "name": "defaultbrokenpod",
        "namespace": "default",
        "resourceVersion": "3847",
        "uid": "29aac81f-fc40-4ead-9226-7597bd7c2ce6"
    },
    "spec": {
        "backend": "localai",
        "details": "",
        "error": [
            {
                "text": "Back-off pulling image \"nginx:1.a.b.c\""
            }
        ],
        "kind": "Pod",
        "name": "default/broken-pod",
        "parentObject": ""
    },
    "status": {
        "lifecycle": "historical"
    }
}
Of note, the error text is pretty much to the point (and the same text as the regular K8s error).
While we can retrieve results via kubectl, there is also a CLI tool for k8sgpt.
~$ curl -sLO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.27/k8sgpt_amd64.deb
~$ sudo dpkg -i k8sgpt_amd64.deb
...
We can retrieve the results as follows.
~$ k8sgpt analyze -b localai
AI Provider: AI not used; --explain not set
0 default/broken-pod(broken-pod)
- Error: Back-off pulling image "nginx:1.a.b.c"
From here, we wrap up as follows (see the sketch below):
- Remove the pod
- Check that the result is gone
- Relaunch with a valid tag
- Check again, no new error!
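A minimal sketch of those steps (nginx:1.25 stands in for any valid tag; the operator may take a few minutes to clean up the old result):
~$ kubectl delete pod broken-pod
pod "broken-pod" deleted
~$ kubectl get results.core.k8sgpt.ai # eventually cleaned up by the operator
No resources found in default namespace.
~$ kubectl run broken-pod --image=nginx:1.25
pod/broken-pod created
~$ k8sgpt analyze -b localai # should report no new errors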
K8sgpt is an example of CNAI technology, one that helps the operator (and the cluster) perform better!