- Cluster Architecture, Installation & Configuration, 25%
- Workloads & Scheduling, 15%
- Services & Networking, 20%
- Storage, 10%
- Troubleshooting, 30%
- Authentication (certs, password, tokens)
- Authorization
- Admission Control: modules that act on objects being created, deleted, updated or connected (proxy), but not on reads. They can refuse requests and/or modify the contents of objects. (RBAC itself is evaluated earlier, during the authorization step.)
RBAC is implemented mostly by leveraging Roles (and ClusterRoles) together with RoleBindings. Docs: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
Roles are limited to their own namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbac-test-role
  namespace: rbac-test
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
A ClusterRole has the same powers as a Role, but at cluster level. As such it can also manage non-namespaced resources (kubectl api-resources --namespaced=false), e.g. storageClass, persistentVolume, node, users.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # "namespace" omitted since ClusterRoles are not namespaced
  name: secret-reader
rules:
- apiGroups: [""]
  # at the HTTP level, the name of the resource for accessing Secret
  # objects is "secrets"
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]
A RoleBinding is what links a Role or ClusterRole to subjects (users, groups, service accounts). A RoleBinding can reference a ClusterRole, but the granted permissions are still limited to the RoleBinding's own namespace.
apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# You can specify more than one "subject"
- kind: User
  name: jane # "name" is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role # this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
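The same setup can also be created imperatively, which is usually quicker in the exam. A sketch reusing the names from the examples above (the binding name rbac-test-binding is made up):
kubectl -n rbac-test create role rbac-test-role --verb=get,list,watch --resource=pods
kubectl -n rbac-test create rolebinding rbac-test-binding --role=rbac-test-role --user=jane
# verify what a subject is allowed to do
kubectl auth can-i list pods --as=jane -n rbac-test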
https://kubernetes.io/docs/setup/production-environment/container-runtimes/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Exercise: Check how long certificates are valid
- Check how long the kube-apiserver server certificate is valid on cluster2-master1. Do this with openssl or cfssl. Write the expiration date into /opt/course/22/expiration.
- Also run the correct kubeadm command to list the expiration dates and confirm both methods show the same date.
- Write the correct kubeadm command that would renew the apiserver server certificate into /opt/course/22/kubeadm-renew-certs.sh.
On the MASTER node:
root@cluster2-master1:/etc/kubernetes/pki# ls -l
total 60
-rw-r--r-- 1 root root 1298 Aug 6 08:48 apiserver.crt
-rw-r--r-- 1 root root 1155 Aug 6 08:48 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Aug 6 08:48 apiserver-etcd-client.key
-rw------- 1 root root 1679 Aug 6 08:48 apiserver.key
-rw-r--r-- 1 root root 1164 Aug 6 08:48 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Aug 6 08:48 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1066 May 4 10:48 ca.crt
-rw------- 1 root root 1675 May 4 10:48 ca.key
drwxr-xr-x 2 root root 4096 May 4 10:48 etcd
-rw-r--r-- 1 root root 1078 May 4 10:48 front-proxy-ca.crt
-rw------- 1 root root 1679 May 4 10:48 front-proxy-ca.key
-rw-r--r-- 1 root root 1119 Aug 6 08:48 front-proxy-client.crt
-rw------- 1 root root 1679 Aug 6 08:48 front-proxy-client.key
-rw------- 1 root root 1679 May 4 10:48 sa.key
-rw------- 1 root root 451 May 4 10:48 sa.pub
use openssl to find out the expiration date:
openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.crt | grep Validity -A2
And we use the kubeadm certs feature (formerly kubeadm alpha certs) to get the expiration too:
kubeadm certs check-expiration | grep apiserver
Write the renewal command into the script:
# /opt/course/22/kubeadm-renew-certs.sh
kubeadm certs renew apiserver
kubectl get componentstatus is deprecated as of 1.20. A suitable replacement is probing the API server directly. For example, on a master node, run curl -k https://localhost:6443/livez?verbose
which returns:
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
.....etc
The topology choices above will influence the underlying resources that need to be provisioned. How these are provisioned are specific to the underlying cloud provider. Some generic observations:
- Disable swap.
- Leverage cloud capabilities for HA, e.g. by using multiple AZs.
- Windows can be used for worker nodes, but not control plane.
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
Draining a node: kubectl drain <nodeName> (--ignore-daemonsets)
Uncordon: kubectl uncordon <nodeName>
Take snapshot
ETCDCTL_API=3 etcdctl snapshot save snapshot.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
Verify backup
sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db
Restore ETCDCTL_API=3 etcdctl snapshot restore snapshot.db
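On a kubeadm cluster the restore is usually done into a fresh data directory, after which the etcd static pod is pointed at it. A minimal sketch, assuming default kubeadm paths (the target directory /var/lib/etcd-restore is illustrative):
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-restore
# then edit /etc/kubernetes/manifests/etcd.yaml so the etcd-data hostPath
# points at /var/lib/etcd-restore and wait for the etcd pod to come back up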
Deployments are intended to replace Replication Controllers. They provide the same replication functions (through Replica Sets) and also the ability to rollout changes and roll them back if necessary
kubectl create deployment nginx-deploy --replicas=3 --image=nginx:1.19
To update an existing deployment we can perform either:
- Rolling Update: Pods will be gradually replaced. No downtime, old and new version coexist at the same time.
- Recreate: pods will be deleted and recreated (it will involve downtime)
Check the rollout status
kubectl -n ngx rollout status deployment/nginx-deploy
deployment "nginx-deploy" successfully rolled out
kubectl -n ngx get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deploy 3/3 3 3 44s
Scale number of pods to 2
kubectl scale --replicas 2 deployment/nginx-deploy
Change the image tag to 1.20
kubectl edit deployment nginx-deploy
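An alternative, assuming the container inside the deployment is named nginx (the default when created from the nginx image):
kubectl set image deployment/nginx-deploy nginx=nginx:1.20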
Verify that the replica set was created
╰─ k get rs
NAME DESIRED CURRENT READY AGE
nginx-deploy-57767fb8cf 0 0 0 4m47s
nginx-deploy-7bbd8545f9 2 2 2 82s
Check the history of deployment and rollback to previous version
k rollout history deployment nginx-deploy
deployment.apps/nginx-deploy
REVISION CHANGE-CAUSE
1 <none>
2 <none>
k rollout undo deployment nginx-deploy
k rollout undo deployment nginx-deploy --to-revision 5
Create a pod with the latest busybox image running a sleep for 1 hour, and give it an environment variable named PLANET
with the value blue
kubectl run hazelcast --image=busybox:latest --env="PLANET=blue" -- sleep 3600
k exec -it hazelcast -- env | grep PLANET
Create a configmap named space with two values planet=blue and moon=white.
cat << EOF > system.conf
planet=blue
moon=white
EOF
kubectl create configmap space --from-file=system.conf
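Alternatively the two values could be created directly as literals; note this yields the keys planet and moon rather than a single system.conf key, so the subPath mount below would need adjusting:
kubectl create configmap space --from-literal=planet=blue --from-literal=moon=white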
- Mount the configmap to a pod and display it from the container through the path /etc/system.conf
kubectl run nginx --image=nginx -o yaml --dry-run=client > pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  volumes:
  - name: config-volume
    configMap:
      name: space
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - name: config-volume
      mountPath: /etc/system.conf
      subPath: system.conf
You can also expose individual ConfigMap keys as environment variables:
apiVersion: v1
kind: Pod
metadata:
  name: config-test-pod
spec:
  containers:
  - name: test-container
    image: busybox
    command: [ "/bin/sh", "-c", "env" ]
    env:
    - name: BLOG_NAME
      valueFrom:
        configMapKeyRef:
          name: vt-cm
          key: blog
  restartPolicy: Never
- Create a secret from 2 files username and a password.
echo -n 'admin' > username
echo -n 'admin-pass' > password
kubectl create secret generic admin-cred --from-file=username --from-file=password
Create a pod with 2 env vars (USERNAME and PASSWORD) and populate them from the secret's values
apiVersion: v1
kind: Pod
metadata:
  name: secret1
spec:
  containers:
  - env:
    - name: USERNAME
      valueFrom:
        secretKeyRef:
          name: admin-cred
          key: username
    - name: PASSWORD
      valueFrom:
        secretKeyRef:
          name: admin-cred
          key: password
    image: nginx
    name: secret1
**- Mount the secrets to a pod in the /admin-cred/ folder and display it.**
apiVersion: v1
kind: Pod
metadata:
  name: secret2
spec:
  containers:
  - image: nginx
    name: secret2
    volumeMounts:
    - name: admin-cred
      mountPath: /admin-cred/
  volumes:
  - name: admin-cred
    secret:
      secretName: admin-cred
  restartPolicy: Always
Display the mounted files:
k exec secret2 -- ls -l /admin-cred/
total 0
lrwxrwxrwx 1 root root 15 Aug 4 14:34 password -> ..data/password
lrwxrwxrwx 1 root root 15 Aug 4 14:34 username -> ..data/username
Deployments facilitate this by employing a reconciliation loop to check that the number of deployed pods matches what's defined in the manifest. Under the hood, deployments leverage ReplicaSets, which are primarily responsible for this feature.
If a deployment uses a volume, all of its pods have access to the same volume (no isolation); as a consequence, concurrent writes can cause data races.
StatefulSets are similar to deployments; for example, they manage the deployment and scaling of a series of pods. However, unlike deployments, they also provide guarantees about the ordering and uniqueness of Pods. A StatefulSet maintains a sticky identity for each of its Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
StatefulSets are valuable for applications that require one or more of the following.
- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, automated rolling updates.
Each pod created by the StatefulSet has an ordinal value (0 through # replicas - 1) and a stable network ID (which is statefulsetname-ordinal) assigned to it
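A minimal StatefulSet sketch with its governing headless Service (the names web and nginx-headless, and the replica count, are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: web
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx-headless
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
The pods come up in order as web-0, web-1, web-2 and are addressable as web-0.nginx-headless.<namespace>.svc.cluster.local.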
At a namespace level, we can define resource limits.
This enables a restriction in resources, especially helpful in multi-tenancy environments and provides a mechanism to prevent pods from consuming more resources than permitted, which may have a detrimental effect on the environment as a whole.
We can define the following:
- Default memory / CPU requests & limits for a namespace. If a container is created in a namespace with a default request/limit value and doesn't explicitly define these in the manifest, it inherits these values from the namespace
- Minimum and Maximum memory / CPU constraints for a namespace. If a pod's requests or limits fall outside this range, the pod is rejected.
- Memory/CPU Quotas for a namespace. Control the total amount of CPU/memory that can be consumed in the namespace as a whole.
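For quotas specifically, a minimal ResourceQuota sketch (the name, namespace and values here are illustrative, not part of the exercise below):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-quota
  namespace: tenant-b
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi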
Exercise:
- Create a new namespace called "tenant-b-100mi"
- Create a memory limit of 100Mi for this namespace
- Create a pod with a memory request of 150Mi; ensure the limit has been set by verifying you get an error message.
kubectl create ns tenant-b-100mi
Create limit
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-b-memlimit
  namespace: tenant-b-100mi
spec:
  limits:
  - max:
      memory: 100Mi
    type: Container
Pod manifest
apiVersion: v1
kind: Pod
metadata:
  name: default-mem-demo
  namespace: tenant-b-100mi
spec:
  containers:
  - name: default-mem-demo
    image: nginx
    resources:
      requests:
        memory: 150Mi
It should give
The Pod "default-mem-demo" is invalid: spec.containers[0].resources.requests: Invalid value: "150Mi": must be less than or equal to memory limit
Kustomize Helm ???
Each host is responsible for one subnet of the CNI range. In this example, the left host is responsible for 10.1.1.0/24, and the right host 10.1.2.0/24. The overall pod CIDR block may be something like 10.1.0.0/16.
Virtual ethernet adapters are paired with a corresponding Pod network adapter. Kernel routing is used to enable Pods to communicate outside the host it resides in.
Every Pod gets its own IP address, which is shared between its containers.
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
- Pods on a node can communicate with all pods on all nodes without NAT
- Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
Note: when running workloads that leverage hostNetwork: Pods in the host network of a node can communicate with all pods on all nodes without NAT.
Exercise:
- Deploy the following manifest
- Using kubectl, identify the Pod IP addresses
- Determine the DNS name of the service
kubectl apply -f https://raw.githubusercontent.com/David-VTUK/CKAExampleYaml/master/nginx-svc-and-deployment.yaml
kubectl get po -l app=nginx -o wide
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-service ClusterIP 172.20.252.175 <none> 80/TCP 2m54s
$ k run --restart=Never --image=busybox --rm -it busybox -- nslookup 172.20.252.175
Server: 172.20.0.10
Address: 172.20.0.10:53
175.252.20.172.in-addr.arpa name = nginx-service.exam-study.svc.cluster.local
pod "busybox" deleted
Since pods are ephemeral, we need an entrypoint in front of them (a service). Services can take the following forms:
- ClusterIP : not exposed, internal only
- LoadBalancer: External, requires cloud provider or custom software implementation (I.e. metalLb), unlikely to come on CKA
- NodePort: External, requires access to the nodes directly; node ports are allocated from the range 30000-32767 by default
- Ingress Resource: L7. An Ingress can be configured to give services externally-reachable URLs, load balance traffic, terminate SSL, and offer name-based virtual hosting. An Ingress controller is responsible for fulfilling the Ingress, usually with a loadbalancer, though it may also configure your edge router or additional frontends to help handle the traffic. Note: an ingress usually routes to one of the services, which in turn routes to the pods.
Exercise
- Create three deployments of your choosing
- Expose one of these deployments with a service of type ClusterIP
- Expose one of these deployments with a service of type NodePort
- Expose one of these deployments with a service of type LoadBalancer
  - Note: this remains in pending status unless your cluster has integration with a cloud provider that provisions one for you (i.e. AWS ELB), or you have a software implementation such as metallb
kubectl create deployment nginx-clusterip --image=nginx --replicas 1
kubectl create deployment nginx-nodeport --image=nginx --replicas 1
kubectl create deployment nginx-loadbalancer --image=nginx --replicas 1
kubectl expose deployment nginx-clusterip --type="ClusterIP" --port="80"
kubectl expose deployment nginx-nodeport --type="NodePort" --port="80"
kubectl expose deployment nginx-loadbalancer --type="LoadBalancer" --port="80"
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within a cluster. Ingress consists of two components. Ingress Resource is a collection of rules for the inbound traffic to reach Services. These are Layer 7 (L7) rules that allow hostnames (and optionally paths) to be directed to specific Services in Kubernetes. The second component is the Ingress Controller which acts upon the rules set by the Ingress Resource, typically via an HTTP or L7 load balancer. It is vital that both pieces are properly configured to route traffic from an outside client to a Kubernetes Service
kubectl create ingress ingressName --class=default --rule="foo.com/bar=svcName:80" -o yaml --dry-run=client
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingressName
spec:
  ingressClassName: default
  rules:
  - host: foo.com
    http:
      paths:
      - backend:
          service:
            name: svcName
            port:
              number: 80
        path: /bar
        pathType: Exact
Exercise
Create an ingress object named myingress with the following specification:
- Manages the host myingress.mydomain
- Traffic to the base path / will be forwarded to a service called main on port 80
- Traffic to the path /api will be forwarded to a service called api on port 8080
kubectl create ingress myingress --rule="myingress.mydomain/=main:80" --rule="myingress.mydomain/api=api:8080"
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myingress
spec:
  rules:
  - host: myingress.mydomain
    http:
      paths:
      - backend:
          service:
            name: main
            port:
              number: 80
        path: /
        pathType: Exact
      - backend:
          service:
            name: api
            port:
              number: 8080
        path: /api
        pathType: Exact
As of 1.13, coredns has replaced kube-dns as the facilitator of cluster DNS and runs as pods.
Check configuration of a pod:
kubectl run busybox --image=busybox -- sleep 9000
kubectl exec -it busybox -- sh
/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local virtualthoughts.co.uk
options ndots:5
- Pods: [pod-IP-with-dashes].[namespace].pod.cluster.local
- Services: [serviceName].[namespace].svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: test-headless
spec:
  clusterIP: None
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: web-headless
Headless services are those without a cluster IP; DNS instead responds with the list of IPs of the pods that match the selector at that particular moment in time. This is useful if your app needs to obtain (through DNS) the list of all pod IPs backing a particular service.
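A quick way to see the difference, assuming the test-headless service above has matching pods running:
kubectl run tmp --rm -it --restart=Never --image=busybox -- nslookup test-headless
# returns one A record per backing pod instead of a single cluster IP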
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
  - name: test
    image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 8.8.8.8
    searches:
    - ns1.svc.cluster.local
    - my.dns.search.suffix
    options:
    - name: ndots
      value: "2"
    - name: edns0
Coredns config can be found in a config map
kubectl get cm coredns -n kube-system -o yaml
Exercise:
- Identify the configuration location of coredns
- Modify the coredns config file so DNS queries not resolved by itself are forwarded to the DNS server 8.8.8.8
- Validate the changes you have made
- Add additional configuration so that all DNS queries for custom.local are forwarded to the resolver 10.5.4.223
kubectl get cm coredns -n kube-system
NAME DATA AGE
coredns 2 94d
kubectl edit cm coredns -n kube-system
replace:
forward . /etc/resolv.conf
with
forward . 8.8.8.8
Add custom block
custom.local:53 {
errors
cache 30
forward . 10.5.4.223
reload
}
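CoreDNS picks up Corefile changes via the reload plugin. One hedged way to validate, assuming an external name such as kubernetes.io should now resolve via 8.8.8.8:
kubectl -n kube-system rollout restart deployment coredns   # optional, forces an immediate reload
kubectl run tmp --rm -it --restart=Never --image=busybox -- nslookup kubernetes.io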
You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed.
https://kubernetes.io/docs/concepts/cluster-administration/addons/#networking-and-network-policy
https://kubernetes.io/docs/concepts/storage/storage-classes/
A StorageClass
provides a way for administrators to describe the "classes" of storage they offer
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: localdisk
reclaimPolicy: Delete
allowVolumeExpansion: true
provisioner: kubernetes.io/no-provisioner
allowVolumeExpansion: if not set to true, it will be impossible to resize the PV.
reclaimPolicy (persistentVolumeReclaimPolicy on a PV):
- Retain: when the claim is released, the data is not deleted; manual intervention is required to release the storage
- Recycle (deprecated)
- Delete: deletes both the k8s object and the underlying cloud volume (works only on clouds)
A persistentVolume object represents storage that has been provisioned, either manually by an administrator or dynamically through a storageClass. It specifies the capacity of the storage, the access mode, and the type of volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: "localdisk"
  persistentVolumeReclaimPolicy: Delete
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /var/output
Only two volume modes exist:
- Block: mounted to a pod as a raw block device without a filesystem. The Pod / application needs to understand how to deal with raw block devices. Presenting it this way can yield better performance, at the expense of complexity.
- Filesystem: mounted inside a directory in the pod's filesystem. If the volume is backed by a block device with no filesystem, Kubernetes will create one. Compared to block devices, this method offers the highest compatibility, at the expense of performance.
Three options exist:
- ReadWriteOnce – the volume can be mounted as read-write by a single node
- ReadOnlyMany – the volume can be mounted read-only by many nodes
- ReadWriteMany – the volume can be mounted as read-write by many nodes
A PersistentVolume
can be thought of as storage provisioned by an administrator. Think of this as pre-allocation
A PersistentVolumeClaim
can be thought of as storage requested by a user/workload.
To use a pv (which is abstract), we leverage a persistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: localdisk
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
When a pvc is created, it looks for an available pv that satisfies its criteria; if one is found, the claim is bound to it.
Once a PV is bound to a PVC, that PV is essentially tied to the PVC's namespace and cannot be bound by another PVC. There is a one-to-one mapping of PVs and PVCs. However, multiple pods in the same namespace can use the same PVC.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: task-pv-claim
  containers:
  - name: task-pv-container
    image: nginx
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/usr/share/nginx/html"
      name: task-pv-storage
If leveraging a storageClass, a persistentVolume object is not required; we just need a persistentVolumeClaim.
Exercise:
In this exercise, we will not be using storageClass objects.
- Create a persistentVolume object of type hostPath with the following parameters:
  - 1GB capacity
  - Path on the host is /tmp
  - storageClassName is manual
  - accessModes is ReadWriteOnce
- Create a persistentVolumeClaim to the aforementioned persistentVolume
- Create a pod workload to leverage this persistentVolumeClaim
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/tmp"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: task-pv-claim
  containers:
  - name: task-pv-container
    image: nginx
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/output"
      name: task-pv-storage
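To verify the claim bound and the mount works (exact output will vary):
kubectl get pv,pvc
# both PV and PVC should show STATUS Bound
kubectl exec task-pv-pod -- ls /output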
- Node Status
Check the status of nodes. All of them must be in READY state.
kubectl get nodes
kubectl describe node nodeName
- Services Verify that services (kubelet, docker) are up and running
systemctl status kubelet
systemctl start kubelet
systemctl enable kubelet
ALWAYS check that services are up, running, and ENABLED
- System pods
If the cluster has been built with kubeadm, there are several pods which must be running in the kube-system namespace.
kubectl get pods -n kube-system
kubectl describe pod podName -n kube-system
You can check logs for k8s components with journalctl:
sudo journalctl -u kubelet (or -u docker)
(shift+G to jump to the end of the file)
You can also check /var/log/kube-*.log, but with a kubeadm cluster they won't be stored on the filesystem; they live in the system pods instead (available with kubectl logs xxx).
Container logs: kubectl logs [-f] podName [-c containerName]
Through componentstatuses (deprecated but still working)
kubectl get componentstatuses
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
or
./etcdctl cluster-health (etcdctl v2; with ETCDCTL_API=3 use etcdctl endpoint health)
member 17f206fd866fdab2 is healthy: got healthy result from https://master-0.etcd.cfcr.internal:2379
/etc/kubernetes/manifests/ — examples of misconfigurations:
- kube-apiserver isn't pointing to the correct mountPath or hostPath for certificates
- Pods aren't getting scheduled because kube-scheduler.yaml is pointing to the wrong image
- etcd isn't working because of inconsistent TCP/IP ports
- Container command sections have typos or aren't pointing to the correct TCP/IP port or configuration/certificate file path
At a cluster level, kubectl get events
provides a good overview.
Kubernetes handles and redirects any output generated from a container's stdout and stderr streams. These get directed through a logging driver, which determines where the logs are stored. Different implementations of Docker differ in the exact implementation (such as RHEL's flavour of Docker), but commonly these drivers write to a file in JSON format.
This is a somewhat ambitious topic to cover, as how we approach troubleshooting application failures varies by the architecture of that application, which resources/API objects we're leveraging, and whether the application produces logs. However, good starting points include running:
kubectl describe <object>
kubectl logs <podname>
kubectl get events
Pods and Services will automatically have a DNS record registered against coredns in the cluster, i.e. "A" records for IPv4 and "AAAA" records for IPv6, in the format:
pod-ip-address.my-namespace.pod.cluster-domain.example
my-svc-name.my-namespace.svc.cluster-domain.example
To test resolution, we can run a pod with nslookup.
Mainly covered earlier in acquiring logs for the CNI. However, one issue that might occur is when a CNI is incorrectly initialised, or not initialised at all. This may cause workloads to remain in a Pending status:
kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 0/1 Pending 0 57s <none> <none> <none> <none>
kubectl describe <pod> can help identify issues with assigning IP addresses to pods from the CNI
kubectl completion bash > /etc/bash_completion.d/kubectl
alias k=kubectl
export do="--dry-run=client -o yaml"
- Etcd backup/restore
https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/
kubeadm simplifies installation of kubernetes cluster
sudo tee -a /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
apt-get update && apt-get install -y containerd
mkdir -p /etc/containerd/
containerd config default | sudo tee /etc/containerd/config.toml
swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
apt install -y apt-transport-https curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet=1.20.1-00 kubeadm=1.20.1-00 kubectl=1.20.1-00
sudo apt-mark hold kubelet kubeadm kubectl
Master:
kubeadm init --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
kubeadm token create --print-join-command
The control plane manages, plans, schedules and monitors the nodes. Core components:
- Etcd cluster
- kube-apiserver
- kube controller manager
- kube-scheduler
- kubelet
- kube-proxy
- Container runtime engine
ETCD is a distributed, reliable key-value store that is simple, secure and fast. It stores information regarding the cluster such as Nodes, Pods, Configs, Secrets, Accounts, Roles, Bindings and others.
Stacked etcd = etcd running on the same node as the control plane.
If you are using kubeadm, you will find the etcd pods in the kube-system namespace.
You can run etcdctl commands directly inside the etcd pod:
kubectl exec etcd-master -n kube-system -- etcdctl get / --prefix --keys-only
The kube-apiserver is responsible for authenticating and validating requests, and for retrieving and updating data in the etcd key-value store. In fact, the kube-apiserver is the only component that interacts directly with the etcd datastore. The other components, such as kube-scheduler, kube-controller-manager and kubelet, use the API server to perform updates in the cluster in their respective areas.
If using a cluster deployed by kubeadm, the configuration for the kube-apiserver is the static manifest /etc/kubernetes/manifests/kube-apiserver.yaml.
Manages several controllers in the cluster.
In the case of a kubeadm cluster, the config is located in /etc/kubernetes/manifests/kube-controller-manager.yaml
The replication controller monitors the status of ReplicaSets, ensuring the desired number of pods.
The kube-scheduler is only responsible for deciding which pod goes on which node. It doesn't actually place the pod on the nodes, that's the job of the kubelet
The kubelet works in terms of a PodSpec. A PodSpec is a YAML or JSON object that describes a pod. The kubelet takes a set of PodSpecs that are provided through various mechanisms (primarily through the apiserver) and ensures that the containers described in those PodSpecs are running and healthy. The kubelet doesn't manage containers which were not created by Kubernetes.
Kubeadm does not deploy the kubelet by default; we must manually download and install it.
The Kubernetes network proxy runs on each node. This reflects services as defined in the Kubernetes API on each node and can do simple TCP, UDP, and SCTP stream forwarding or round robin TCP, UDP, and SCTP forwarding across a set of backends
Upgrade cluster:
apt-get install -y --allow-change-held-packages kubelet=1.20.2-00 kubectl=1.20.2-00
kubeadm upgrade plan v1.20.2
kubeadm upgrade node   # on worker nodes (on the control plane: kubeadm upgrade apply v1.20.2)
Backing up etcd with etcdctl.
An etcd snapshot restore creates a new logical cluster.
Verify the connectivity:
ETCDCTL_API=3 etcdctl get cluster.name --endpoints=https://10.0.1.101:2379 --cacert=/home/cloud_user/etcd-certs/etcd-ca.pem --cert=/home/cloud_user/etcd-certs/etcd-server.crt --key=/home/cloud_user/etcd-certs/etcd-server.key
ETCDCTL_API=3 etcdctl snapshot restore backup.db --initial-cluster="etcd-restore=https://10.0.1.101:2380" --initial-advertise-peer-urls https://10.0.1.101:2380 --name etcd-restore --data-dir /var/lib/etcd
chown -R etcd:etcd /var/lib/etcd/
Quick creation of yaml: kubectl create deployment my-dep --image=nginx --dry-run=client -o yaml
--record flag stores the kubectl command used as an annotation on the object
--- RBAC
Role / ClusterRole = objects defining a set of permissions; RoleBinding / ClusterRoleBinding = objects binding those permissions to subjects
Service account = account used by container processes within pods to authenticate with the k8s API; we can bind service accounts to (cluster) roles via (cluster) role bindings
-- The kubernetes metrics server is an optional addon:
kubectl top pod --sort-by xxx --selector
kubectl top pod --sort-by cpu
kubectl top node
Raw access: kubectl get --raw /apis/metrics.k8s.io
ConfigMaps and secrets can be passed to containers as env vars or as a configuration volume; in the volume case each top-level key appears as a file containing all keys below that top-level key
apiVersion: v1
kind: Pod
metadata:
  name: env-pod
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'echo "configmap: $CONFIGMAPVAR secret: $SECRETVAR"']
    env:
    - name: CONFIGMAPVAR
      valueFrom:
        configMapKeyRef:
          name: my-configmap
          key: key1
    - name: SECRETVAR
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: secretkey1
Resource requests allow you to define an amount of resources (cpu/memory) you expect a container to use. The scheduler will use that information to avoid scheduling on nodes which do not have enough available resources. Requests ONLY affect scheduling. CPU is expressed in 1/1000 of a CPU: 250m = 1/4 CPU.
containers:
- name: nginx
  resources:
    requests:
      xxx
    limits:
      cpu: 250m
      memory: "128Mi"
Probes: livenessProbe, readinessProbe, startupProbe
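A minimal livenessProbe sketch (the path and port are illustrative):
containers:
- name: nginx
  image: nginx
  livenessProbe:
    httpGet:
      path: /
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10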
-- nodeSelector (match a node label):
spec:
  nodeSelector:
    keylabel: "value"
To pin a pod to a specific node:
spec:
  nodeName: "nodename"
Static pod = automatically created from YAML manifest files located in the manifest path of the node. Mirror pod = the kubelet creates a mirror pod for each static pod so that the status of the static pod is visible via the API, but you cannot manage them through the API; they have to be managed directly through the kubelet.
kubeadm default location: /etc/kubernetes/manifests/
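A static pod is just an ordinary Pod manifest dropped into that directory, for example (file and pod name are illustrative):
# /etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
The kubelet picks it up automatically, and the mirror pod appears in the API as static-web-<nodeName>.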
Deployment scaling:
- change the replicas attribute in the yaml
- kubectl scale deployment.v1.apps/my-deployment --replicas=5
To check the status of a deployment: kubectl rollout status deployment/my-deployment
You can change the image, e.g. kubectl set image deployment/my-deployment <containerName>=nginx:1.20
network policy = an object that allows you to control the flow of network communication to and from pods. It can be applied to ingress and/or egress
By default, pods are wide open. But if any policy selects a pod, it becomes isolated and only whitelisted traffic is allowed.
Available selectors:
- podSelector
- namespaceSelector
- ipBlock
Each rule can additionally restrict ports.
kubectl label namespace np-test team=tmp-test
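A minimal NetworkPolicy sketch tying these selectors together, allowing ingress to pods labelled app=db (illustrative) only from namespaces labelled team=tmp-test on port 80:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-tmp-test
  namespace: np-test
spec:
  podSelector:
    matchLabels:
      app: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: tmp-test
    ports:
    - protocol: TCP
      port: 80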
Pod domain names are of the form pod-ip-address.namespace-name.pod.cluster.local.
A Service is an abstraction layer permitting clients to interact with the service without needing to know anything about the underlying pods. A Service routes traffic in a load-balanced manner. Endpoints are the backend entities to which services route traffic; there is one endpoint for each pod.
Service types:
- ClusterIP - exposes applications inside the cluster network
- NodePort - exposes the application outside the cluster network
- LoadBalancer - exposes the application externally through the use of a cloud load balancer
- ExternalName (*not in CKA)
Service FQDN: service-name.namespace.svc.cluster-domain.example. This FQDN can be used from any namespace; in the same namespace you can simply use the short svc name.
Volume types
- hostPath
- emptyDir
A pvc can be mounted in a pod like a normal volume:
apiVersion: v1
kind: Pod
metadata:
name: pv-pod
spec:
containers:
- name: busybox
image: busybox
command: ["sh","-c", "while true; do echo Success! >> /output/success.txt;sleep 5; done"]
volumeMounts:
- mountPath: "/output"
name: mypd
volumes:
- name: mypd
persistentVolumeClaim:
claimName: my-pvc
A pvc can be expanded, as long as the storageClass it references has allowVolumeExpansion set to true.
- Connection refused (kubectl): if you cannot connect to the kube-apiserver, it might be down. Verify that the kubelet and docker services are up and running on the control plane nodes.
You can run any command inside the pod by using kubectl exec podName [-c containerName] -- command
You can open a new session with
kubectl exec -it podName [-c containerName] -- bash
Force Killing pods
kubectl delete pod podName --force --grace-period=0
Check the kube-proxy and dns pods in the kube-system namespace.
Useful container image for debugging: nicolaka/netshoot
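For example, a throwaway debug pod can be started like this (the pod name tmp is arbitrary):
kubectl run tmp --rm -it --restart=Never --image=nicolaka/netshoot -- bash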
Cpu?
#########################
kubectl run nginx --image=nginx --restart=Never
kubectl delete po nginx --grace-period=0 --force
k get po redis -w
kubectl get po nginx -o jsonpath='{.spec.containers[].image}{"\n"}'
kubectl run busybox --image=busybox --restart=Never -- ls
kubectl logs busybox -p # previous logs
k run --image busybox busybox --restart=Never -- sleep 3600
kubectl get pods --sort-by=.metadata.name
kubectl exec busybox -c busybox3 -- ls
kubectl get pods --show-labels
kubectl get pods -l env=dev
kubectl get pods -l 'env in (dev,prod)'
k create deploy deploy1 --image=nginx -oyaml --dry-run=client
k run tmp --rm --image=busybox -it -- wget -O- google.com
#####
kubectl run --image nginx --restart=Never mypod
kubectl create deployment my-dep --image=nginx --replicas=2 --port=80
kubectl expose pod mypod --port 80 --target-port 80
kubectl expose deployment my-dep