- Automatically deploying and managing containers is called container orchestration
- K8s is a container orchestration tool/technology
- Alternatives to K8s are Docker Swarm and Mesos
- A K8s cluster is a set of machines (or nodes) running in sync
- One of the nodes is the master node, responsible for the actual orchestration
- `kube-scheduler`: schedules pods on nodes based on node capacity, load on the node and other policies. This runs in the `kube-system` namespace
- `kubelet`: runs on worker nodes, listens for instructions from the kube-apiserver and manages containers
- `kube-proxy`: enables communication between services within the cluster
- `kubectl`: tool used to deploy and manage applications on k8s clusters
- As k8s is a container orchestration tool, we also need a container runtime engine like docker
- ETCD is a distributed reliable key-value store that is simple, secure and fast
- K8s uses an etcd cluster on the master node to store information like nodes, pods, configs, secrets, accounts, roles, bindings and other cluster state (including which node is the master)
- `etcdctl` is a command-line tool which comes with the ETCD server
./etcdctl set key1 value1
./etcdctl get key1
- All `kubectl get` command output comes from the etcd server
- `kube-apiserver` is used for all management-related communication in a cluster. It runs on the master node
- When we run a command from `kubectl`, it reaches the kube-apiserver which authenticates and validates the request, then interacts with the etcd server and returns the response
- We don't necessarily need to use kubectl, we can directly make requests to the API server, e.g. a POST request using curl to create a pod (see the sketch below)
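- A minimal sketch of creating a pod directly via curl, assuming client certificates and a `pod.json` manifest are already prepared (file names and the server address are placeholders)

# POST the pod manifest directly to the API server
curl -X POST https://kube-apiserver:6443/api/v1/namespaces/default/pods \
  --key admin.key --cert admin.crt --cacert ca.crt \
  -H "Content-Type: application/json" \
  -d @pod.json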
- A controller continuously monitors the state of the system and takes necessary action to bring the system back to the desired state in case of problems
- The kube-controller-manager interacts with `kube-apiserver` to get cluster info. It also runs on the master node and is the brain of the cluster
- There are many controllers in k8s, listed below are 2 of those
  - Node controller: Monitors node health and takes necessary action (e.g. rescheduling its pods) if a node goes down or becomes unhealthy
  - Replication controller: Ensures the desired number of pods are running at all times within a set
- The scheduler decides which pod goes to which node so that the right container ends up on the right node. It decides based on the CPU/memory available on nodes vs. required by the container, and finds the best fit
- This also runs on the master node
- This runs on worker nodes and is responsible for receiving instructions from the kube-apiserver and sending back status reports for that worker node
- `kubelet` needs to be installed manually on worker nodes, it is not installed automatically by `kubeadm` like the other components
- This runs as a daemonset on each node
- Manages networking in the k8s cluster so that each pod in the cluster is able to communicate with every other pod
- A POD is a single instance of an application
- Adding a single container per pod is the recommendation, but a pod can have multiple containers in some cases, e.g. a main container with helper containers may go in the same pod. Containers running in the same pod can communicate using `localhost` itself
- Install the `kubectl` utility first to interact with the k8s cluster
- `minikube` is the easiest way to install a k8s cluster, it installs all components (etcd, container runtime, ...) on a single machine/node
minikube start # Start minikube
minikube stop # Stop minikube
minikube service appname-service --url # Get external URL of appname
- `kubeadm` is a more advanced tool to create a multi-node k8s cluster
- We can use a tool like `vagrant` to create VMs on a machine to have multiple nodes in k8s - master and worker(s)
- K8s works on yaml files, it expects 4 top level attributes
- apiVersion
- Kind
- metadata
- spec
- Below is a sample yaml file to deploy a pod with an `nginx` container; a few kubectl commands to deploy and inspect it follow the YAML
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
    tier: frontend
spec:
  containers:
  - name: nginx
    image: nginx
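- A minimal sketch of deploying and inspecting this pod, assuming the manifest above is saved as pod.yaml (the filename is an assumption)

k apply -f pod.yaml   # Create the pod (declarative)
k get pods            # List pods, check STATUS is Running
k describe pod nginx  # See events, assigned node and container details
k delete pod nginx    # Clean up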
- Controllers are the brain behind the k8s cluster, they are the processes which monitor k8s objects and take the desired action
- The replication controller ensures the specified number of pods are running at all times. It also helps with load balancing across pods - scaling
- With `kind: ReplicationController`, pod and replicas info is present in the `spec` section of the yaml file
apiVersion: v1
kind: ReplicationController
metadata:
  name: ...
  labels:
    ...
spec:
  template:
    <pod-definition>
  replicas: ...
- A ReplicaSet serves the same purpose as the replication controller, which is the older technology; ReplicaSets are the recommended way
- `apiVersion: apps/v1` and `kind: ReplicaSet`; the `spec` section remains the same as above plus one more param called `selector`, used to select pods for replication. The selector decides which pods to monitor, and it is possible that pods with the given labels already exist (or some exist) - then the ReplicaSet won't create those pods but will just monitor them to maintain the desired number of pods
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: ...
  labels:
    ...
spec:
  template:
    <pod-definition>
  replicas: ...
  selector:
    matchLabels:
      ...
- Provides capability for rolling updates, rollback, pausing and resume changes
- Deployments sit higher in the hierarchy than replica sets (deployment > replica set > pod)
- The yaml file is almost the same as for replicasets but with `kind: Deployment` for the deployment object
- On deploying, it creates a new replica set which in turn creates the pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ...
  labels:
    ...
spec:
  template:
    <pod-definition>
  replicas: ...
  selector:
    matchLabels:
      ...
- The `default` ns is automatically created when a cluster is set up
- `kube-system` (for system components like DNS and networking) and `kube-public` (for keeping public resources) are other namespaces created at cluster startup
- Each ns has
  - Isolation: Each ns is isolated from the others; we can have a cluster with 2 namespaces `dev` and `prod`, isolated from each other. We can access resources/services deployed in another ns by adding the ns to the service name, like web-app.dev.svc...
  - Policies: Each ns can have different policies
  - Resource limits: We can define a different quota of resources in each ns (see the ResourceQuota sketch below)
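- A minimal ResourceQuota sketch limiting resources in a namespace, assuming a `dev` namespace (name and values are placeholders)

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev
spec:
  hard:
    pods: "10"            # Max number of pods in this ns
    requests.cpu: "4"     # Total CPU that pods may request
    requests.memory: 5Gi
    limits.cpu: "10"      # Total CPU limit across pods
    limits.memory: 10Gi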
- A Service is a k8s virtual object which enables communication between various internal and external components, like access from a browser or between frontend and backend services
- Enables loose coupling between the microservices in our application
- We have 3 types of services in k8s
- This service is used to make an internal service (like a webserver) accessible to users (the outside world) on a port. It exposes the application on a port on all hosts
- NodePorts have a range from 30000 to 32767
apiVersion: v1
kind: Service
metadata:
  ...
spec:
  type: NodePort
  ports:
  - targetPort: 80
    port: 80
    nodePort: 30008
  selector:
    ...
- This file has 3 ports - ports are w.r.t. the service
  - targetPort: Pod port, the actual port the application listens on inside the pod
  - port: Service port
  - nodePort: Port exposed to the external world
- When multiple pods are running with the given `selector`, the service acts as a load balancer and distributes traffic to the pods randomly
- With multiple nodes in the cluster, we can access the application using any node's IP and the nodePort; a NodePort service spans across all nodes in the cluster
- Creates an IP (and name) to communicate between services, like from a set of frontend services to backend services. This is for internal access only (within the cluster, not bound to a specific node); different microservices communicate using ClusterIP services
- This is the default service type
- K8s creates one ClusterIP service by default, named `kubernetes`
apiVersion: v1
kind: Service
metadata:
  name: backend
  ...
spec:
  type: ClusterIP
  ports:
  - targetPort: 80
    port: 80
  selector:
    ...
- Imperative way to create service
# Expose pod `messaging` running on 6379 as a service on 6379
# We can use a deployment, rc or rs instead of a pod
# We can also pass `--target-port` if the pod listens on a different port
k expose pod messaging --name messaging-service --port=6379
- Used to create a single endpoint like http://some-domain.com to access the application. The application may have multiple nodes running; this helps us create a common name to access it. Without this we would have to access the apps using a specific `nodeIP:port`, which is hard to remember and will change when a node restarts (it may get a new IP)
apiVersion: v1
kind: Service
...
spec:
  type: LoadBalancer
  ports:
  - targetPort: 80
    port: 80
    nodePort: 30008
  selector:
    ...
- When type `LoadBalancer` is used on cloud providers like `AWS` or `GCP`, k8s sends a request to the cloud provider to provision a load balancer which can be used to access the application
- Imperative: Providing explicit step-by-step instructions on what to do and how to do it
- In k8s, anything done using a `kubectl` command except `apply` is the imperative approach, like `kubectl run, edit, expose, create, scale, replace, delete, ...`
- This is faster, we just have to run the right command - a `yaml` file is not always required. Use this in the certification exam to save time
- Declarative: Declaring the desired end state, using tools like terraform, chef, puppet, ansible. These do a lot of error handling and maintain the state of the steps done so far
- In k8s, this is done using the `kubectl apply` command, which checks the current state of the system and performs only the relevant actions
- It is recommended not to mix imperative and declarative approaches
- Each pod is assigned an IP
- K8s does not handle networking for communication between pods, so in a multi-node cluster we have to set up the networking on our own using networking software like VMware NSX, etc.
- The scheduler assigns a node to a pod: when we deploy a pod, a property called `nodeName` (in the spec section) is set on the pod with the name of the node where the pod has to run
- If a pod doesn't get a `nodeName` assigned to it, the pod remains in `Pending` state
- We can also assign `nodeName` manually - by setting the `nodeName` property in our deployment yaml file
- Note that we can't change the `nodeName` of a running pod; to mimic this behaviour we use a `Binding` object and send a POST request for that pod (see the sketch below)
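- A minimal sketch of such a Binding object, assuming the pod is `nginx` and the target node is `node02` (both names are placeholders); this object, converted to JSON, is POSTed to the pod's binding endpoint (/api/v1/namespaces/default/pods/nginx/binding)

apiVersion: v1
kind: Binding
metadata:
  name: nginx       # Pod to bind
target:
  apiVersion: v1
  kind: Node
  name: node02      # Node to run the pod on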
- We can taint certain nodes so that only specific pods can be scheduled on those nodes. This is useful when we want to use some nodes for specific use case
- For the specific pods which should be scheduled on tainted nodes, we add tolerations to those pods, which makes them tolerant to the taint so they can be scheduled on the tainted nodes
- Below command can be used to taint a node
k taint nodes node-name key=value:taint-effect # Sample
k taint nodes node1 app=blue:NoSchedule # Example
- `<taint-effect>` specifies what happens to pods which do not tolerate this taint, it can have 3 values
  - `NoSchedule`: Don't schedule those pods on this node
  - `PreferNoSchedule`: The system will try to avoid scheduling on this node but that's not guaranteed
  - `NoExecute`: Don't schedule new pods, and existing pods which don't tolerate the taint will be evicted. This can happen if some pods got scheduled on the node before it was tainted
-
We can add tolerations to pods in yaml definition file in spec section
...
spec:
  ...
  tolerations:
  - key: app
    operator: Equal
    value: blue
    effect: NoSchedule
- When we create a cluster, a taint is applied to the master node so that no workload pods are scheduled on master nodes. This can be checked using the command below
k describe node <master-node-name> | grep Taint
- Untaint node
kubectl taint nodes <nodeName> node-role.kubernetes.io/master:NoSchedule-
- Tainting nodes only restricts which pods a node accepts; it doesn't guarantee that a specific pod gets scheduled on a specific node. A tolerant pod can still be scheduled on any other node in the system. If we require some pods to be scheduled on specific nodes, this can be achieved using `node affinity`
- To schedule a pod on a specific node we can use `nodeSelector` in the spec section of the pod definition yaml file
...
spec:
  ...
  nodeSelector:
    size: Large
- `size` above is a label that we have to add to nodes using the command below
k label nodes node-1 size=Large
- Node selectors have limitations, e.g. they don't support complex selection filters like "schedule the pod on medium or large nodes" or "don't schedule on small nodes"; for these use cases we can use node affinity
- We can add `affinity` in the spec section of the pod yaml definition file to select specific nodes for scheduling pods
...
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - Large
            - Medium
- Other operators we can use are `Exists` (doesn't need a value), `NotIn`, etc.
- Another node affinity type is `preferredDuringSchedulingIgnoredDuringExecution` (and a planned `requiredDuringSchedulingRequiredDuringExecution`)
  - Scheduling: Starting a pod - assigning a node to the pod
  - Execution: Pod is already scheduled and in running state. This matters when a pod is already running on a node and someone changes the node labels
- When the scheduler tries to schedule a pod, k8s checks the pod's resource requirements and places it on a node which has sufficient resources
- By default a container may be assumed to request 0.5 CPU and 256 Mi of RAM for getting scheduled (defaults like these come from a LimitRange in the namespace, if configured); this can be modified by adding a `resources` section to the container in the pod yaml definition
...
spec:
  containers:
  - ...
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
- 1 CPU = 1000m = 1 vCPU = 1 AWS vCPU = 1 GCP core = 1 Azure core = 1 Hyperthread. m is millicore
- It can be as low as 0.1 CPU, which is 100m
- For memory
- 1 K (kilobyte) = 1,000 bytes
- 1 M = 1,000,000 bytes
- 1 G = 1,000,000,000 bytes
- 1 Ki (kibibyte) = 1,024 bytes
- 1 Mi = 1,048,576 bytes
- ...
- While a container is running its resource usage can grow, so by default k8s sets a limit of 1 vCPU and 512 Mi on containers (again via a LimitRange, if configured); this can also be changed by adding a `limits` section under the `resources` section
...
spec:
  containers:
  - ...
    resources:
      ...
      limits:
        memory: "2Gi"
        cpu: 2
- If a container tries to use more CPU than its limit, it is throttled; if it exceeds its memory limit, the container is terminated (OOMKilled)
- A DaemonSet ensures that one copy of a pod is always running on all nodes in the cluster; when a new node is added to the cluster a daemon set pod starts running on that node
- Some applications of daemon sets
  - Monitoring solutions
  - Log collectors/viewers
  - `kube-proxy` runs as a daemon set
- The YAML definition of a daemon set is similar to replica sets, the change is in the kind only, other params are the same
apiVersion: apps/v1
kind: DaemonSet
...
spec:
  ...
  template:
    ...
- K8s (v1.12 onwards) uses `nodeAffinity` and the default scheduler to deploy daemon set pods on each node
- Suppose we don't have a master node in the cluster (which has the kube-apiserver, etcd server and other components); we only have worker nodes running `kubelet`. Now we can't create resources, as there is no kube-apiserver to give instructions to kubelet
- In this scenario we place pod definition yaml files at the pod manifests path, which is by default `/etc/kubernetes/manifests`; `kubelet` checks this path and creates any pod whose definition it finds there. If the pod definition file is later deleted, the pod also gets deleted. This way of creating pods is called static pods
- `kubelet` only understands pods, so we can only create pods this way, not deployments or replica sets
- The pod manifests path (`staticPodPath`) can be changed for the running kubelet service. To know the current path, check the `--config` option passed to the running `kubelet` binary (`ps -eaf | grep kubelet`); that `--config` file is the kubelet config file containing this and other details (see the sketch after this section)
- Once a static pod is created we can't use `kubectl get pods` to check it, because `kubectl` talks to the kube-apiserver which is not running; in this case we can use docker commands like `docker ps`
- Static pods are used to deploy the control plane components when creating a k8s cluster - etcd, api-server, controller-manager, scheduler (these pods have the node name, e.g. `-master` or `-controlplane`, appended to their name)
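- A minimal sketch for locating the static pod path on a node; the kubelet config path is an assumption (commonly /var/lib/kubelet/config.yaml when set up with kubeadm)

# Find the --config file passed to the kubelet process
ps -eaf | grep kubelet
# Look up the static pod path in that config file
grep staticPodPath /var/lib/kubelet/config.yaml
# Pod manifests placed here are created/removed by kubelet automatically
ls /etc/kubernetes/manifests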
- Apart from the default k8s scheduler running on the master node, we can also deploy our own scheduler
- A custom scheduler can be deployed just like any other pod with some name - the image would be something like `k8s.gcr.io/kube-scheduler:v1.20.0`, with a command section containing various options like `leader-elect`, `port`, ...
- While deploying other pods we can specify our custom scheduler using the `schedulerName` option in the spec section (see the sketch below)
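- A minimal sketch of a pod that asks to be scheduled by a custom scheduler, assuming the scheduler was deployed with the name my-custom-scheduler (the name is a placeholder)

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  schedulerName: my-custom-scheduler  # This pod will only be scheduled by the named scheduler
  containers:
  - name: nginx
    image: nginx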
- K8s has a monitoring component called the Metrics Server which keeps cluster metrics; this is an in-memory solution so we don't get historical data
- The `kubelet` running on each node has a component called `cAdvisor` (container advisor) which retrieves performance metrics from pods and exposes them to the metrics server through the kubelet APIs
- We can install the metrics server using
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- To view metrics, we can use the `top` commands below
k top node # See CPU and memory usage of all nodes
k top pod # See CPU and memory usage of pods in current namespace
- To view pod logs
k logs <podName> # Get all logs from beginning to now
k logs -f <podName> # Stream logs
# If a pod has multiple containers in it, specify the container name also
k logs -f <podName> <containerName>
# Check logs from all containers in a pod
k logs -f <podName> --all-containers
# Previous pod logs
k logs -f <podName> --previous
# If core components are down then kubectl commands won't work, we can use journalctl to see logs of components like kubelet
journalctl -u kubelet # Get kubelet logs
# We can also use docker commands to get logs if kubectl is not working
docker logs <docker-id>
- When a deployment is created, a rollout is triggered which creates a revision of the deployment. On every subsequent deployment this revision is updated, which lets us roll back to a previous version if necessary
- To see rollouts, use below commands
k rollout status deployment/myapp-deployment # Get status of rollout
k rollout history deployment/myapp-deployment # Get rollout history
- K8s support 2 deployment strategies
  - Recreate: Delete all existing pods and then create new ones. This causes some downtime. This is NOT the default deployment strategy
  - Rolling update: This replaces pods one by one with the newer version. This way the application never goes down and the upgrade is seamless. This is the default deployment strategy in k8s
- These 2 strategies can also be seen by describing a deployment; we get the strategy name and how pods were updated - rolling or recreate
- Under the hood, a deployment creates a replica set with the required number of pods; when the deployment is updated, a new replica set is created, new pods are started in it and pods in the old replica set are scaled down. To see this, use `k get replicasets`
- If we notice a problem after upgrading our application/deployment, we can undo the deployment and roll back to the previous revision; it will destroy the pods in the new replica set and bring the older ones back up in the old replica set (to trigger a new rollout imperatively, see the sketch below)
k rollout undo deployment/myapp-deployment # Rollback last deployment
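- A hedged sketch of triggering a new rollout imperatively by changing the image; the deployment and container names are assumptions based on the examples above

# Update the image of container `nginx` in the deployment - triggers a rolling update
k set image deployment/myapp-deployment nginx=nginx:1.19
# Watch the rollout progress
k rollout status deployment/myapp-deployment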
- In a `Dockerfile` we have 2 fields
  - `ENTRYPOINT`: Specifies which command to run
  - `CMD`: Takes arguments which go with the command given in the entrypoint
- In the pod definition file, we can override both of the above using the `command` and `args` options respectively
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper
spec:
  containers:
  - name: ubuntu-sleeper
    image: ubuntu-sleeper
    command: ["my-sleep"] # Run this command (overrides `ENTRYPOINT`)
    args: ["10"]          # Pass argument 10 to the above command (overrides `CMD`)
We can specify environment variables for a pod in its definition file using the `env` or `envFrom` parameter. There are 3 ways to provide values for env vars: a plain key/value (below), a ConfigMap, or a Secret
env:
- name: APP_COLOR
  value: pink
ConfigMaps are used to keep all configuration required by an application in a central place; a ConfigMap can be referenced in the pod definition file and all its key/value pairs become available
- ConfigMaps can be created using imperative way or declarative way
# Imperative approach
k create configmap <cm-name> --from-literal=<key1>=<value1> --from-literal=<key2>=<value2>
k create configmap <cm-name> --from-file=<path-to-file> # Can use file with all key/val also
k get configmaps
k describe configmaps
- Declarative approach
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_COLOR: blue
  APP_MODE: prod
- To use the above configMap in the pod definition file, use `envFrom`
envFrom:
- configMapRef:
    name: app-config # ConfigMap name
- The above config injects all env vars from the configMap into the pod; we can also take selective keys
env:
- name: APP_COLOR
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: APP_COLOR
- ConfigMaps can also be mounted to volumes
volumes:
- name: app-config-volume
  configMap:
    name: app-config
Secrets can be used to store any sensitive information. They are the same as configMaps but the data is stored base64-encoded (encoded, not encrypted)
- Imperative way to create secret
k create secret generic <secret-name> --from-literal=<key1>=<value1> --from-literal=<key2>=<value2>
k create secret generic <secret-name> --from-file=<path-to-file> # Can use file with all key/val also
k get secrets
k describe secrets
- Declarative way: To use the declarative way, values should be `base64` encoded; if we don't want to encode to `base64` ourselves we can use the `stringData` field instead of `data`
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
data:
  DB_Host: bXlzcWwK
  DB_User: cm9vdAo=
  DB_Password: YWJjMTIzCg==
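- A quick sketch of producing and decoding base64 values like the ones above on the command line

echo -n 'mysql' | base64           # Encode a value before putting it in `data`
echo 'bXlzcWwK' | base64 --decode  # Decode a value read from a secret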
- To use the above secret in the pod definition file, use `envFrom`
envFrom:
- secretRef:
    name: app-secret # Secret name
- The above config injects all env vars from the secret into the pod; we can also take selective keys
env:
- name: APP_COLOR
  valueFrom:
    secretKeyRef:
      name: app-secret
      key: DB_Password
- Secrets can also be mounted as volumes. When mounted, one file is created in the container for each key in the secret
volumes:
- name: app-secret-volume
  secret:
    secretName: app-secret
# 3 files are created corresponding to the 3 vars in the secret
ls /opt/app-secret-volumes
DB_Host  DB_Password  DB_User
- There can be cases when we need 2 services to work together - scale up/down together, share the same network (accessible via localhost) and share the same volumes. An example would be a web server and a logging service
- Use 2 containers defined in the `containers` section of the spec
...
spec:
  containers:
  - name: sample-app
    image: sample-app:1.1
  - name: logger
    image: log-agent:1.5
- 3 multi container pods design patterns [discussed in CKAD course]
- sidecar: For example using logging service with app container
- adapter
- ambassador
- Init containers are used for doing some task before the actual container starts, like waiting for another task to finish or checking out source code from a repository. They execute only once, at the beginning
- Similar to containers but defined under an `initContainers` section in the spec - it is a list so a pod can have multiple init containers, and they execute in the sequence defined
- If an init container fails, the whole pod is restarted
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
- If a node goes down, k8s waits for the pod eviction timeout (default 5 mins) before rescheduling that node's pods onto other nodes. If the node comes back up before the timeout, the pods are restarted on the same node
- For maintenance purposes we can drain a node so its pods get scheduled on other nodes. Pods which are not managed by a `deployment` or `replicaSet` are lost and not rescheduled (drain warns about this, and such pods can be deleted using the `--force` option)
k drain <node-name>
k drain <node-name> --ignore-daemonsets
- When the node is back again, we can `uncordon` it to make it available for scheduling new pods
k uncordon <node-name>
- There is another command, `cordon`, which makes a node unschedulable for new pods; existing pods keep running on that node
k cordon <node-name>
# This gives client and server versions
# client = kubectl version
# server = kubernetes version
k version
k version --short
k get nodes # Also gives kubelet version running
- Version = x.y.z, where x = major version, y = minor version, z = patch version
- `kube-apiserver` is the main component in a k8s cluster; if it is at minor version X then `controller-manager` and `kube-scheduler` can be at most one minor version lower than X
- `kubelet` and `kube-proxy` can be at most 2 minor versions lower than X
- None of them can have a higher version than X
- However `kubectl` can have a version between X - 1 and X + 1
- At a given point in time, k8s community supports latest 3 versions (minor)
- 2 steps in upgrading a cluster
  - First upgrade the `master nodes`: While the master node upgrade is in progress, workloads on worker nodes continue to work, but management functions won't - e.g. we can't create or delete a pod, and if a pod crashes it won't be rescheduled
  - Then the `worker nodes`: We have 3 strategies for this
    - Upgrade all worker nodes at the same time - requires downtime
    - Upgrade one node at a time - a kind of rolling upgrade
    - Add new nodes with the upgraded version, then remove the existing nodes
- Recommended approach is to upgrade one minor version at a time - not to skip the versions
kubeadm upgrade plan
# This command does not upgrade kubelet; we have to upgrade kubelet by going (ssh) onto each node and upgrading it (see the sketch below)
kubeadm upgrade apply <version>
- Follow this for complete steps: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
- Note: While upgrading, all `kubectl` commands should be run on control plane nodes and NOT on worker nodes, even when we are upgrading worker nodes
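- A hedged sketch of upgrading one worker node with kubeadm; the node name, package manager commands and version are assumptions, the official doc linked above is authoritative

# From the control plane: move workloads off the node
k drain node01 --ignore-daemonsets
# On the worker node: upgrade kubeadm, apply the node upgrade, then upgrade kubelet
apt-get update && apt-get install -y kubeadm=1.28.x-00
kubeadm upgrade node
apt-get install -y kubelet=1.28.x-00
systemctl daemon-reload && systemctl restart kubelet
# From the control plane: make the node schedulable again
k uncordon node01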
We need to back up the below components in a cluster
- Resource configs: Take backup of all resources deployed (either using imperative or declarative way).
- We can use below command to get all resources deployed
k get all --all-namespaces -o yaml > all-resources.yaml
  - There are other solutions already built for taking a backup of all resources, like `Velero` by Heptio
- ETCD cluster: Stores state of the cluster, this also runs as a static pod on master nodes
- Taking backup of ETCD also gives all resource information
  - We can take a backup of ETCD using the `etcdctl` command
# trusted-ca-file, cert-file and key-file can be obtained from the description of the etcd Pod
ETCDCTL_API=3 etcdctl --endpoints=https://<IP>:2379 \
  --cacert=<trusted-ca-file> --cert=<cert-file> --key=<key-file> \
  snapshot save <backup-file-location>

# To restore this snapshot
# 1 - first stop the kube-apiserver
service kube-apiserver stop
# 2 - then restore from the backup
ETCDCTL_API=3 etcdctl snapshot restore <snapshot-name>.db --data-dir=/var/lib/etcd-from-backup
# 3 - update etcd with the new data dir; etcd is a static pod so update its manifest file (by default /etc/kubernetes/manifests/etcd.yaml)
# 4 - reload the service daemon and restart etcd
systemctl daemon-reload
service etcd restart
# 5 - start the kube-apiserver again
service kube-apiserver start
- Volumes
All communication between the various k8s components is TLS based
- `kube-apiserver` serves all requests to the cluster, so it is responsible for authenticating the requests. A user can send requests using the `kubectl` command or `curl`
- Authentication can be done using the methods below; it is configured while starting the kube-apiserver
  - Basic authentication
    - Static password file: Username and password from a csv file. When requesting using `curl` we can use the `-u` option to specify the username/password
    - Static token file: Instead of a password, a token is kept in the file. This token is sent in the header of the HTTP request
    - Note: Both methods above are not recommended as they are not secure, so we use certificate-based authentication
- Symmetric encryption: Same key is used for encryption and decryption. Problem is sharing that key b/w client and server securely
- Asymmetric encryption: Uses 2 keys
- Public key: For encryption
- Private key: For decryption
- `ssh` also uses asymmetric encryption - `ssh-keygen` generates public and private keys. The private key is used to log in to the server and the public key (added to the server) is used to lock access to the server
- Key exchange - PKI (Public key infrastructure)
- Server shares public key (certificate) to client
- Client generates encryption key and sends back to server - this key is encrypted using public key which can only be decrypted using server using private key
- Now both client and server has exchanged encryption key securely which can be used to encrypt further messages
- Domain authorization
  - With the public key (from server to client), a digital certificate is also sent which is signed/approved/authorized by a certificate authority (CA) to confirm that the domain actually is what it says - like xyz.com is actually xyz.com and not someone else with a fraudulent identity. Some popular CAs are Symantec, DigiCert, GlobalSign, ...
  - The domain owner has to generate a certificate signing request (CSR) and send it to the CA, then the CA verifies all details and sends back a signed certificate
  - How are CAs themselves validated? Each CA also has a pair of public and private keys (these are the root certificates); they sign certificates using their private key, and their public keys are shipped with each client (like browsers), so the client can verify that a certificate was signed by an authorized CA
  - For internal usage, we can host our own CA and sign certificates
- Naming conventions
- Public key(certificate): *.crt, *.pem
- Private key: *.key, *-key.pem
- Note: Private key can also be used to encrypt data and can be decrypted using public key, this is never done because anyone having public key will be able to decrypt it
- Everything mentioned above verifies that we are communicating with the right server using its certificate; there can also be cases when the server needs to verify that it is communicating with the correct client and asks the client for a client certificate
Based on the interaction, we have server and client components in k8s. Each component has its own certificate
- Server
- kube-apiserver: apiserver.crt, apiserver.key
- etcd: etcdserver.crt, etcdserver.key
- kubelet: kubelet.crt, kubelet.key
- Client: All the clients below talk to `kube-apiserver`
- User(admin): admin.crt, admin.key
- kube-scheduler: scheduler.crt, scheduler.key
- kube-controller-manager: controller-manager.crt, controller-manager.key
- kube-proxy: kube-proxy.crt, kube-proxy.key
- We also need at least one CA to generate certificates for all above components which also has certificates - ca.crt, ca.key
- Generate CA self signed certificates - root certificates
# 1. Generate keys
openssl genrsa -out ca.key 2048
# 2. Certificate signing request
openssl req -new -key ca.key -subj "/CN=KUBERNETES-CA" -out ca.csr
# 3. Sign certificates
openssl x509 -req -in ca.csr -signkey ca.key -out ca.crt
# Now for all other certificates, we will use this key pair to sign them
- Generate certificates for other components and sign using above CA - like admin user certificate
# 1. Generate keys
openssl genrsa -out admin.key 2048
# 2. Certificate signing request
openssl req -new -key admin.key -subj "/CN=kube-admin" -out admin.csr
openssl req -new -key admin.key -subj "/CN=kube-admin/O=system:masters" -out admin.csr # Admin user
# 3. Sign certificates - using CA key pair
openssl x509 -req -in admin.csr -CA ca.crt -CAkey ca.key -out admin.crt
- Now we have admin user certificate, we can use this in 3 ways
- curl command
curl https://kube-apiserver:6443/api/v1/pods \
  --key admin.key --cert admin.crt --cacert ca.crt
- kubectl command
kubectl get pods \
  --server kube-apiserver:6443 \
  --client-key admin.key --client-certificate admin.crt --certificate-authority ca.crt
- Specifying certificates with each command is not very handy, so we add this information to the `kubeconfig` file and then specify that file with the command
kubectl get pods --kubeconfig ~/.kube/<config-name> # By default kubeconfig file used is ~/.kube/config
- Note: Each component should have root certificate file (ca.crt) present with them
- We should know how the cluster was set up; e.g. if the cluster was set up using `kubeadm`, all certificates are placed at `/etc/kubernetes/pki/`
- If we want to know the details of a component's certificate, we can use the command below - it prints details like `expiry`, `issuer`, `alternate names`, ...
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
# Decode CSR file
openssl req -in filename.csr -noout -text
- The `kubeadm` tool creates a pair of CA keys (public and private) and places them on the master node, so the master becomes our CA server. All new CSRs go to the master to get signed
- When a new user wants access to the cluster, they create a CSR and send it to the admin; the admin then creates a CSR object using a yaml manifest file with `kind: CertificateSigningRequest`
...
kind: CertificateSigningRequest
...
spec:
  ...
  request:
    <base64 encoded CSR>
- Now admin can use kubectl commands to view/approve CSRs
k get csr # Get list of all CSRs
k certificate approve <name> # Approve CSR
k get csr <name> -o yaml # Gives user certificate in base64 format
- On the master node all certificate-related operations are taken care of by the `controller-manager` - it has `csr-approving` and `csr-signing` controllers
- To sign CSRs, the controller manager needs the root certificates (the CA key pair) - it accepts them at startup via the `--cluster-signing-cert-file` and `--cluster-signing-key-file` options
- A kubeconfig file has 3 sections (see the sketch below)
  - Clusters: List of clusters (dev, prod) with their CA root certificates - `ca.crt`
  - Users: List of users (admin, readonly) with their certificate key pairs (crt and key)
  - Contexts: Combination of the above 2 - which cluster to use with which user, like readonly@prod, admin@dev, ... At the top level of the config file we also have a default context to use if we don't explicitly choose one
k config view # See current kubeconfig file
k config use-context prod@readonly # Change current context. This command updates the `current-context` field in kubeconfig file
# Use some other kubeconfig (default is ~/.kube/config)
export KUBECONFIG=/path/my-kube-config
# Set default context of given kubeconfig to context-1
k config --kubeconfig=/path/my-custom-config use-context context-1
- We can also set a `namespace` in the `context` section of the kubeconfig file to point to a specific namespace; by default it points to the `default` ns
# Set the current context's namespace to dev; subsequent commands don't need to specify the ns
k config set-context --current --namespace=dev
- To debug problems with the `kubeconfig` file, we can use the `cluster-info` command
# Use current kubeconfig
k cluster-info
# Use custom kubeconfig
k cluster-info --kubeconfig=/path/to/kubeconfig
- Objects in k8s are categorised in different API groups
- /metrics: Getting metrics
- /healthz: Get health information
- /version: Get cluster version
- /api: Interact with various core resources like pods, configMaps, namespace, etc.
- /apis: Named APIs, further categorized into below API groups
- /apps: /v1/deployments, /v1/replicasets, /v1/statefulsets
- /extensions
- /networking.k8s.io: /v1/networkpolicies
- /storage.k8s.io
- /authentication.k8s.io
- /certificates.k8s.io: /v1/certificatesigningrequests
- /logs: For fetching logs
- Verbs are the operations on API groups, like `get`, `list`, `update`, ...
- To list all API groups we can curl the cluster endpoint
curl http://<api-server>:6443
# The above command will fail as we haven't specified certificates, so we can use `kubectl` to start a proxy client which takes the certs from `kubeconfig` and listens on localhost
kubectl proxy
Starting to serve on 127.0.0.1:8001
# Now we can access cluster using curl command via this proxy - will use credentials from kubeconfig and forward request to api server
curl http://localhost:8001 # List all API groups
curl http://localhost:8001/version
curl http://localhost:8001/api/v1/pods
- Once a user/machine gains access to the cluster, what it can do is defined by authorization
- Authorization mechanisms
  - Node: Used by agents inside the cluster like `kubelet`; these requests are authorized by the Node authorizer. If the name in the certificate has a `system` prefix, like `system:node`, it is a system component and is authorized using the node authorizer
  - ABAC: Attribute-based access control, for external access
    - This associates user(s) with a set of permissions
    - We can create these policies using `kind: Policy`
    - Management is harder because we have to update the policy for each user whenever permissions need to change
  - RBAC: Role-based access control
    - Instead of a user(s) <> permissions mapping, we create a role like `developer` or `security-team` which holds a set of permissions, then associate users with the role
  - Webhook: Outsource authorization to an external tool like `Open Policy Agent`
- We can provide `--authorization-mode` to the kube-apiserver (by default it is `AlwaysAllow`); it can have multiple values like Node,RBAC,Webhook
- For each access request, the configured modes are checked in order until access is granted or the chain ends
- To create a role, we create a `Role` object. In the `rules` section we add the various access permissions. A Role is scoped to a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: testing
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "get", "create", "update", "delete"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
- Link user(s) to the role using a `RoleBinding` object
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: devuser-developer-binding
subjects:
- kind: User
  name: dev-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
- We can also check whether a user (ourselves or another) has access to perform some operation
k auth can-i create deployments
k auth can-i delete nodes
k auth can-i create pods --as dev-user
# Does dev-user have permission to create pods in the test namespace?
k auth can-i create pods --as dev-user --namespace test
- We can also restrict access to specific resources, using the `resourceNames` field in the rules
...
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "create", "delete"]
  resourceNames: ["blue", "green"]
- Imperative ways
k create role pod-reader --verb=get --verb=list --verb=watch --resource=pods
k create rolebinding pod-reader-binding --role=pod-reader --user=bob --namespace=acme
- Resources in k8s can be namespaced(pods, rs, cm, roles) or cluster scoped(nodes, clusterroles) - can get whole list using
k api-resources --namespaced=true # Get all namespaced resources
k api-resources --namespaced=false # Get all cluster-scoped resources
- `clusterrole` and `clusterrolebinding` have cluster scope (remember a Role has ns scope) - a role created this way has cluster-level access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-administrator
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "get", "create", "delete"]
- Link user(s) to the cluster role using a `ClusterRoleBinding` object
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-role-binding
subjects:
- kind: User
  name: cluster-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-administrator
  apiGroup: rbac.authorization.k8s.io
- Imperative ways
k create clusterrole pod-reader --verb=get,list,watch --resource=pods
k create clusterrolebinding pod-reader-binding --clusterrole=pod-reader --user=root
- Although a clusterRole is cluster scoped, we can create one for namespaced resources too - it grants access to that resource across all namespaces. For example, if we create a `clusterRole` for `pods`, that role has access to pods across all namespaces
- 2 types of accounts
- User: used by humans like Admin, developer
- Service: used by machined like build tools, prometheus
k create serviceaccount dashboard-sa
k get serviceaccounts
k describe serviceaccounts dashboard-sa
- Imperative way
# Grant read-only permission within "my-namespace" to the "my-sa" service account
k create rolebinding my-sa-view \
--clusterrole=view \
--serviceaccount=my-namespace:my-sa \
--namespace=my-namespace
- A service account has a token which is used by a third-party service to access the cluster (the kube-apiserver, e.g. with `curl`); this token is kept as a secret. The sa can then be associated with a role using RBAC for specific access
- If the third-party service is running in the cluster itself (as a pod), we can mount this secret as a volume so the pod can access it directly - use the `serviceAccountName` field in the `spec` section (see the sketch below)
- A default service account is also created in each ns and mounted into each pod if we don't specify any other
- `automountServiceAccountToken: false` - don't mount the service account token into the pod
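- A minimal pod spec sketch using a custom service account; the sa name dashboard-sa comes from the example above, the image is a placeholder

apiVersion: v1
kind: Pod
metadata:
  name: dashboard
spec:
  serviceAccountName: dashboard-sa       # Mounts this sa's token instead of the default
  # automountServiceAccountToken: false  # Uncomment to not mount any sa token
  containers:
  - name: dashboard
    image: my-dashboard:1.0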
- When we specify an image in a pod definition file, it follows the docker naming convention - `image: nginx` actually becomes `image: docker.io/library/nginx`, where
  - `docker.io` is the default registry to look for the image
  - `library` is the default user/account
  - `nginx` is the repository name of the image
- `gcr.io` is another public registry where many k8s-related images are stored, e.g. for end-to-end testing `gcr.io/kubernetes-e2e-test-image/dnsutils`
- Public cloud providers also have container registry services, like `ECR` from AWS
- Private repository: Stores images which are not public and requires credentials to access - using `docker login`
docker login private-registry.io
docker run private-registry.io/apps/internal-app
- To use a private registry in a pod definition file, we have to create a secret of type `docker-registry` and reference its name in the pod definition
k create secret docker-registry regcred \
--docker-server=private-registry.io \
--docker-username=registry-user \
--docker-password=registry-password \
[email protected]
...
kind: Pod
spec:
  containers:
  - name: internal-app
    image: private-registry.io/apps/internal-app
  imagePullSecrets:
  - name: regcred
...
- Security context can be set at the pod and/or container level
- Pod level: Applies to all containers defined in this pod definition
...
spec:
  securityContext:
    runAsUser: 1000 # Default is root, skip this if you want to run as root
  containers:
    ...
- Container level: Applies to a specific container. Note: if set at both the pod and container level, the container-level setting applies
...
spec:
  containers:
  - ...
    securityContext:
      runAsUser: 1000
      capabilities:
        add: ["MAC_ADMIN"]
- We can also set container `capabilities`, which can only be set at the container level (as in the above example)
- 2 types of traffic - Ingress and Egress
- Ingress: Traffic coming into the server/network
- Egress: Going out of server/network
- Replying back to the client does not need a separate rule - it doesn't require egress configuration, responses are allowed by default
- Ingress or egress is always looked at from that specific server's perspective - e.g. for a DB we usually only require ingress traffic
- K8s is by default configured with an "all allow" policy, meaning any pod can communicate with any other pod/service within the cluster - using pod IP, name, etc.
- To restrict traffic we apply network policies to pods; this is done with label selectors in a `NetworkPolicy` object. The example below applies a network policy on `db` so that only `api-pod` pods can connect to `db` on port 3306 - this restricts others, like web server pods, from accessing the db
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          name: api-pod
    ports:
    - protocol: TCP
      port: 3306
- Network policies are enforced by the networking solution implemented on the k8s cluster, and not all networking solutions support network policies. Solutions that support them include `kube-router`, `calico`, `romana`, `weave-net`
- We can further filter down on whom to allow with namespace filter
...
ingress:
- from:
  - podSelector:
      matchLabels:
        name: api-pod
    namespaceSelector:
      matchLabels:
        name: prod
...
- For situations like allowing a backup server which is not deployed as a pod in the cluster, we can also allow a specific IP address
...
ingress:
- from:
  - podSelector:
      matchLabels:
        name: api-pod
  - ipBlock:
      cidr: 192.168.5.10/32
...
- For configuring `egress`, the `from` in ingress becomes `to` and the rest remains the same
- Initially k8s only worked with the docker runtime and its code was embedded into k8s, but with other container runtimes coming in (like rkt, CRI-O), docker support was moved out of k8s and the container runtime interface (CRI) was developed
- CRI governs the interface, so when a new runtime is developed it is defined how it communicates with k8s, and k8s doesn't have to change to support it
- Similar to CRI, the container networking interface (CNI) and container storage interface (CSI) were developed. CSI is a standard followed by storage drivers to work with any orchestration tool; some storage drivers are `portworx`, `Amazon EBS`, etc.
- CSI defines a set of RPCs (like `createVolume`, `deleteVolume`) which are called by the orchestrator and must be implemented by these storage drivers
- As with containers, pod data also gets deleted when a pod is deleted, so to persist data we use volumes and mounts
- We can attach a volume to a pod using `volumeMounts`, which refers to one of the `volumes` created
apiVersion: v1
kind: Pod
metadata:
  name: random-num
spec:
  containers:
  - image: alpine
    name: alpine
    command: ["/bin/sh", "-c"]
    args: ["shuf -i 0-100 -n 1 >> /opt/number.out"]
    volumeMounts:
    - mountPath: /opt
      name: data-volume
  volumes:
  - name: data-volume
    hostPath:
      path: /data
      type: Directory
- Now the pod's `/opt` maps to the host's `/data` directory; whatever the pod writes to `/opt` will be present in the host's `/data` directory even if the pod dies
- This approach is not recommended for a multi-node cluster because the directory is specific to a node, so we use external storage solutions like `NFS`, `AWS EBS`, etc. and the corresponding volume type instead of `hostPath`; for example for `AWS EBS` we use `awsElasticBlockStore`
...
volumes:
- name: data-volume
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4
- In the section above, volumes are created with each pod definition. With lots of pods it is hard to add/manage `volumes` in every pod, so we create a `PersistentVolume` (PV) and use it in pods via a `PersistentVolumeClaim` (PVC) that claims storage from the PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4
- The admin creates PVs and the user creates PVCs to use the storage
- When a PVC is created, it gets bound to one of the PVs which matches the claim's criteria. If the user wants to bind to a specific PV, additional filters (labels/selectors) can be provided
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
- We can now use the `PVC` in a pod (or replicaset, deployment) definition
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: myfrontend
    image: nginx
    volumeMounts:
    - mountPath: "/var/www/html"
      name: mypd
  volumes:
  - name: mypd
    persistentVolumeClaim:
      claimName: myclaim
- We cannot delete a PVC while it is used by a pod - if we try, it stays in `Terminating` state until the pod is deleted
- Before creating PV, we must create volume in provider we are using like with AWS, we must provision EBS first before PV - this is called static provisioning
- To remove this dependency we use `storageClasses`, which take a provisioner name and create the `PV` automatically for us, e.g. on AWS or GCP - this is called dynamic provisioning
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: google-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd
- Then in the PVC, we refer to this storage class using `storageClassName: google-storage` in the `spec` section; the rest of the PVC definition remains the same (see the sketch below)
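- A minimal PVC sketch referencing the storage class above; the claim name and size follow the earlier examples

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  storageClassName: google-storage  # A PV is dynamically provisioned via this class
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi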
- To connect 2 hosts, we connect both hosts to a switch via each host's network interface. Using the `ip link` command we can check the interface(s) on a host
- A router connects 2 switches (networks) and routes traffic between them. The router's IP is usually the first one in the network
- We can have several routers, so hosts need to know which router to use to send a packet to a host in another network - for this we use gateways (if the host is a room, the gateway is the door). To configure a gateway (route) on a host, we can use the commands below
# To reach any IP in network 192.168.2.0/24 use gateway(router) address 192.168.1.1
# Route should be added on all hosts to send packets to hosts on other n/w
ip route add 192.168.2.0/24 via 192.168.1.1
# We can add default route for all other IPs/NW which we don't know
# Any IP for which explicit route is not added, use 192.168.1.1
ip route add default via 192.168.1.1
# See routes added on host
ip route show
route # Legacy command to view the routing table
- Using a host as a router: Linux by default doesn't forward packets received on one interface to another; this is disabled for security reasons. We can enable it using
echo 1 > /proc/sys/net/ipv4/ip_forward
- The above setting is not retained across reboots; to persist it, set `net.ipv4.ip_forward=1` in the `/etc/sysctl.conf` file
- We can add custom IP-to-hostname mappings in the `/etc/hosts` file. This translation of hostname to IP is known as name resolution
- Managing host/IP mappings like this becomes hard as the number of hosts increases (and host IPs can change), so we use a `DNS` server and configure each host to point to this DNS server for hostname-to-IP lookups
- The IP of the DNS server is added in the `/etc/resolv.conf` file with the `nameserver` field, so when a host doesn't know the IP of a hostname it asks this DNS server
- If an entry for the same hostname is present in both `/etc/hosts` and the nameserver (DNS server), the host first checks the local `/etc/hosts` file and only then the configured DNS server. This ordering can be changed in the `/etc/nsswitch.conf` file
- For public internet hosts (google.com, fb.com, ...) we can use a global DNS server like `8.8.8.8` (by Google), either directly as a nameserver in `/etc/resolv.conf` or by configuring our local DNS server to forward to `8.8.8.8` when it can't resolve a name
- We can add another entry called `search` in the `/etc/resolv.conf` file, which appends a domain name to the host we want to look up
...
search mycompany.com
...
# If we ping `gitlab`, it will change the domain name to `gitlab.mycompany.com` automatically if it exists
# `search` can take a list of domains to try
- Record types
  - A: Maps a hostname to an IPv4 address
  - AAAA: Maps a hostname to an IPv6 address
  - CNAME: Maps one name to another name (e.g. fb.com is the same as facebook.com)
- Tools
- ping: Simple, gives IP in ping traces
- nslookup: Resolves using DNS server, it doesn't take into account local /etc/hosts mappings
- dig: More detailed
- Hosting our own DNS server: We have various tools for this, `coreDNS` is one of them. It runs on port `53`, the default port for a DNS server
Refer this section: https://gist.github.com/hansrajdas/d950ffd99c3ae817b08fd11592dc82eb#docker-networking
- In a k8s cluster we can have multiple nodes - master and workers with unique IPs and MAC addresses. Below are some ports required to be open for each component in a cluster
  - ETCD (on master node): Port 2379, all control plane components connect to it
  - ETCD (on master node): Port 2380, only for etcd peer-to-peer connectivity
  - kube-apiserver (on master node): Port 6443
  - kubelet (on master and worker nodes): Port 10250
  - kube-scheduler (on master node): Port 10251
  - kube-controller-manager (on master node): Port 10252
  - NodePort services (on worker nodes): Ports 30000-32767
- NOTE: If things are not working, ports are one of the first things to verify
- K8s doesn't ship a networking solution, but it requires that each pod gets a unique IP address and that every pod can reach every other pod in the cluster (across nodes too) without configuring any NAT rules. On small clusters with a couple of nodes we could configure the networking/routing with scripts, but for large clusters this becomes hard to manage, so we use the available networking solutions (plugins) that do this, like weaveworks, flannel, cilium, VMware NSX
- We can specify the CNI/network-plugin options to the `kubelet` component using the args below
...
--network-plugin=cni \
--cni-bin-dir=/opt/cni/bin \
--cni-conf-dir=/etc/cni/net.d \
...
- A weave agent runs on each node and the agents communicate with each other about nodes, networks and pods. Each agent stores the topology of the entire setup and knows the pods and their IPs on other nodes
- Weave creates its own bridge on each node, names it `weave`, and assigns an IP address range to each node's network
- It is deployed as a daemon set so it runs on each node
- The CNI plugin (like weave) assigns IPs to pods. In the CNI config file `/etc/cni/net.d/net-script.conf` we specify the `IPAM` configuration, subnets, routes, etc.
- Weave creates an interface on each host named `weave`; use the `ifconfig` command to check
- Weave's default subnet is `10.32.0.0/12`, which is 10.32.0.1 to 10.47.255.254 - around 1,048,574 IPs for pods
- For services refer to this section. This section discusses service networking
- `kube-proxy` runs on each node and listens for changes from the kube-apiserver; every time a new service is created, kube-proxy gets into action and sets up forwarding for the service IP. Unlike a pod, a service spans the whole cluster
- kube-proxy creates forwarding rules corresponding to each service created; the rule includes the port as well, e.g. if a packet comes to IP:PORT, forward it to POD-IP. This forwarding can be set up in 3 ways - `userspace`, `iptables` (default), `ipvs` - configured by setting `--proxy-mode` in the kube-proxy config
- The service IP range is configured in the `kube-apiserver`
kube-api-server --service-cluster-ip-range ipNet # Default 10.0.0.0/24
# We can see the rules from NAT tables using iptables
iptables -L -t nat | grep <service-name>
# Check kube-proxy logs for routing created and mode/proxier used
cat /var/log/kube-proxy.log
- K8s deploys a built-in DNS server by default when we set up a cluster
- All pods and services are reachable using IP addresses within the cluster
- For each service, k8s creates a DNS record by default which maps the service name to the service IP. Within the same namespace we can access a service using just the service name; from other namespaces we have to specify the namespace as well
  - All service names are subdomains under their namespace name
  - All namespaces are subdomains under `svc`
  - `svc` is a subdomain under the root domain, which is `cluster.local` by default
service name: web-service
namespace: apps
# Within same namespace
curl http://web-service
# From other namespaces, we can use any
curl http://web-service.apps
curl http://web-service.apps.svc
curl http://web-service.apps.svc.cluster.local # FQDN
- DNS records for pods are not created by default, but we can enable that; once enabled, an entry is made with the dots in the IP replaced by `-` (not the pod name). If the pod IP is `1.2.3.4` then the entry `1-2-3-4` maps to `1.2.3.4`
- The initial k8s DNS component was `kube-dns`, but from `v1.12` k8s recommends `coreDNS`
- `coreDNS` is deployed as a replicaSet in the cluster and takes its config from a configMap; the coreDNS config on the host is placed at `/etc/coredns/Corefile`. coreDNS watches for any new service or pod (if enabled in the config file) and adds an entry to its database
- To access coreDNS, a service is also created with the name `kube-dns`. Pods are configured (by kubelet) with the kube-dns IP in the `nameserver` field of `/etc/resolv.conf`. This file also has `search` entries so that a FQDN can be formed from just service-name or service.namespace
- Ingress is a k8s object which acts as an application load balancer (Layer 7) - it directs requests to different services based on the URL path
- It becomes the single place where SSL can be implemented - independent of all services
- Ingress deployment - we need two things
  - Ingress controller: This is one of the third-party solutions like `nginx`, `HAProxy`, etc. K8s doesn't come with a default ingress controller so we have to install one. Taking `nginx` as an example, these objects are required to deploy the `nginx` ingress controller
    - Deployment: The image used is a modified version of `nginx`: `quay.io/kubernetes-ingress-controller/nginx_ingress_controller`
    - Service: Of type `NodePort` with a selector matching the above ingress controller
    - ConfigMap: To store the nginx config data
    - ServiceAccount: With the required role, clusterRoleBinding and roleBinding to access the needed objects
  - Ingress resources: Configuration rules on the ingress controller to route traffic to a specific service based on the URL, like p1.domain.com should go to the p1 service, p2... to p2, or domain.com/p1 to p1, and so on. This resource is created using the definition file below
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-wear
spec:
  backend:
    serviceName: wear-service # Route all traffic to wear service
    servicePort: 80
  - We can define `rules` (with paths) in the ingress resource to map traffic from different URL paths to specific services
...
spec:
  rules:
  - http:
      paths:
      - path: /wear
        backend:
          serviceName: wear-service
          servicePort: 80
      - path: /watch
        backend:
          serviceName: watch-service
          servicePort: 80
    - We can define `rules` (with host) in ingress resources to map traffic from different subdomains to a specific service
...
spec:
  rules:
  - host: wear.my-online-store.com
    http:
      paths:
      - backend:
          serviceName: wear-service
          servicePort: 80
  - host: watch.my-online-store.com
    http:
      paths:
      - backend:
          serviceName: watch-service
          servicePort: 80
- Imperative way of creating ingress resources
kubectl create ingress <ingress-name> --rule="host/path=service:port"
# Example
kubectl create ingress ingress-test --rule="wear.my-online-store.com/wear*=wear-service:80"
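Note: the definition files above use the older `extensions/v1beta1` API from the course material. On newer clusters Ingress lives under `networking.k8s.io/v1`, where the same path-based rule would look roughly like the sketch below (service and path names reused from the example above):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-wear-watch
spec:
  rules:
  - http:
      paths:
      - path: /wear
        pathType: Prefix
        backend:
          service:
            name: wear-service
            port:
              number: 80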
- Designing a cluster depends on its purpose; based on the purpose we can design it in different ways
- minikube: Used to deploy a single-node cluster very easily. This provisions a VM and then runs k8s on it
- kubeadm: Used to deploy a multi-node cluster. This expects the VMs to be already provisioned
- There is no solution available to run k8s natively on Windows; we have to provision a Linux-based VM on Windows to use k8s
- For an HA cluster, we use multiple master nodes behind a load balancer, which directs requests to one of the master nodes. Master nodes have the below components running
- API server: Active on all master nodes
- Controller manager (replication & node): Only one is active, others are on standby; the active one is chosen using leader election
- Scheduler: Only one is active, others are on standby; the active one is chosen using leader election
- ETCD: It is a distributed system, so the API server can reach any of the running ETCD instances for reads or writes
- ETCD generally runs on the master node itself, but for complex (and HA) clusters we can run ETCD on separate dedicated nodes and connect the master nodes to them, as sketched below
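A rough sketch of the external-ETCD wiring: each kube-apiserver is pointed at the etcd endpoints through its `--etcd-servers` flag (hostnames and certificate paths below are illustrative):

kube-apiserver \
  --etcd-servers=https://etcd1.example.com:2379,https://etcd2.example.com:2379,https://etcd3.example.com:2379 \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key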
- We can run a cluster on-prem or in the cloud. In the cloud, we have the option to self-manage the cluster or use managed solutions like EKS(AWS), GKE(GCP), ...
- ETCD is a distributed, reliable key value store that is simple, secure and fast
- A client can connect to any instance of ETCD in the cluster and perform read/write operations. If 2 writes come at the same time on 2 different ETCD instances then one is selected on the basis of the leader's consent; a write is complete when the leader gets consent from the other instances in the cluster
- The leader is elected using the `raft` algorithm - a voting/election kind of mechanism
- A write is considered successful once a quorum (`N/2 + 1`) of instances has that write propagated; if the cluster has fewer live instances than the quorum (majority of nodes) then the cluster will be down
- It is recommended to have an `odd` number of instances for better fault tolerance (see the table below)
- For installation, we can download the latest binary from GitHub. The `etcdctl` utility can be used to access the ETCD cluster
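To make the quorum math concrete (and why odd counts are preferred), a quick reference:

Instances (N)   Quorum (N/2 + 1)   Fault tolerance (N - quorum)
1               1                  0
2               2                  0
3               2                  1
5               3                  2
7               4                  3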
Steps to set up a cluster using the `kubeadm` tool
- Have multiple hosts, designating one or more as master nodes - we can also use `vagrant` to provision virtual machines; this vagrantfile provisions one master and 2 worker nodes
- Install a container runtime like `docker` on each host (master & worker)
- Install `kubeadm`, `kubelet` and `kubectl` on all hosts (master & worker)
- Initialize the master node - this sets up all master node components
- Set up a POD networking solution like `calico`, `weave net`, etc. on all nodes so that all pods can communicate with each other
- Join worker nodes to the master node - the join command is printed on running `kubeadm init`; run this command on each worker node
- Launch applications - create pods
- Master nodes
  - Check `kube-system` pods are up and running if we are unable to perform management operations like scaling pods up/down
- Worker nodes
- Check node status
- Describe nodes and check if they are in `Ready` state
- Check kubelet certificates - confirm they are not expired
- Check kubelet status - whether it is running
service kubelet status
- kubelet logs
sudo journalctl -u kubelet
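A few commands for the worker node checks above (the certificate path is the usual kubeadm default and may differ on other setups):

kubectl get nodes                          # Node status
kubectl describe node <node-name>          # Check conditions like Ready, MemoryPressure, etc.
# Check kubelet client certificate expiry (typical kubeadm path)
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
# On the master, check expiry of all control plane certificates
kubeadm certs check-expiration             # kubeadm alpha certs check-expiration on older versions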
- Network troubleshooting: https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/learn/lecture/24452872#content
- When dealing with a cluster with a large number of nodes and objects, it becomes hard to query each node/object and check for relevant information. So we can print only the relevant result, and filter and sort on a specific field, using the `jsonpath` option in `kubectl` commands
k get nodes -ojsonpath='{.items[*].metadata.name}' # Prints only node name
k get nodes -ojsonpath='{.items[*].status.capacity.cpu}' # Prints cpu
...
# Print node name and cpu info
k get nodes -ojsonpath='{.items[*].metadata.name}{"\n"}{.items[*].status.capacity.cpu}'
# We can format output using loops
k get nodes -ojsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.cpu}{"\n"}{end}'
# Using custom columns is another way of printing the required information - same as above
k get nodes -ocustom-columns=<COLUMN NAME>:<JSON PATH>
# Print node name and CPU
k get nodes -ocustom-columns=NODE:.metadata.name,CPU:.status.capacity.cpu
# We can also use sort-by option to sort according to some value (using json path)
k get nodes --sort-by=.metadata.name
# Filter based on a specific condition - get context name for user `aws-user`
kubectl config view --kubeconfig=/root/my-kube-config -ojsonpath='{.contexts[?(@.context.user=="aws-user")].name}'
# Example to delete a namespace
kubectl get namespace "ns1" -o json | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" | kubectl replace --raw /api/v1/namespaces/ns1/finalize -f -
- Last applied configuration is also kept along with the live yaml configuration. This helps k8s figure out whether something was removed so it can delete it from the deployed version. For example, if a label is removed in the newly applied file, k8s checks whether it was present in the last applied config and, if so, deletes it from the deployed version.
- Last applied configuration is only stored when we use the `kubectl apply` command; with the `kubectl create/replace` commands this info is not stored (see below for how to view the stored config)
- So 3 things are compared when using the `kubectl apply` command
  - New yaml file
  - Deployed yaml version
  - Last applied configuration
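The last applied configuration is stored in the `kubectl.kubernetes.io/last-applied-configuration` annotation on the live object, so it can be inspected like below (the object name is illustrative):

# View the last applied configuration of an object
kubectl apply view-last-applied deployment nginx
# It is also visible as an annotation on the live object
kubectl get deployment nginx -o yaml | grep -A1 last-applied-configuration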
- Labels can be applied to k8s objects and can be used as selectors for filtering the required objects
- Like labels, we can also have annotations which hold metadata info like buildversion, etc. - see the snippet below
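A minimal sketch of a pod's metadata section with both (names and values are illustrative):

metadata:
  name: myapp-pod
  labels:                    # Used by selectors, e.g. kubectl get pods -l tier=frontend
    app: App1
    tier: frontend
  annotations:               # Free-form metadata, not used for selection
    buildversion: "1.34"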
- Deployment - You specify a PersistentVolumeClaim that is shared by all pod replicas. In other words, shared volume. The backing storage obviously must have ReadWriteMany or ReadOnlyMany accessMode if you have more than one replica pod.
- StatefulSet - You specify a volumeClaimTemplates so that each replica pod gets a unique PersistentVolumeClaim associated with it. In other words, no shared volume. Here, the backing storage can have ReadWriteOnce accessMode. StatefulSet is useful for running clustered things, e.g. a Hadoop cluster or a MySQL cluster, where each node has its own storage - as sketched below
- Read more here: https://stackoverflow.com/questions/41583672/kubernetes-deployments-vs-statefulsets
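A minimal sketch of the `volumeClaimTemplates` idea described above (the names, image and sizes are illustrative):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # Headless service that gives each pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "example"    # Illustrative only - use a secret in practice
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:       # Each replica gets its own PVC: data-mysql-0, data-mysql-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi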
Note: We have used the pod name `nginx` in all commands; this should be replaced with the specific pod name. We have aliased the `kubectl` command to `k`
alias k=kubectl
# Create a pod
k run nginx --image=nginx
# Create pod with label
k run nginx --image=nginx -l tier=msg
# Create pod and expose port
kubectl run httpd --image=httpd:alpine --port=80 --expose
k create deployment httpd-frontend --image=httpd:2.4-alpine
k create namespace dev # Create 'dev' namespace
# Doesn't create the object, only gives the yaml file
k run nginx --image=nginx --dry-run=client -o yaml > pod-definition.yaml
k create deployment nginx --image=nginx --dry-run=client -o yaml > nginx-deployment.yaml
k create service clusterip redis --tcp=6379:6379 --dry-run=client -o yaml > service-definition.yaml
# Run a pod to debug or run some command like checking nslook from a pod for a service - we can use busybox image
# --rm will delete pod once command is completed or we exit from shell prompt
kubectl run --rm -it debug1 --image=<image> --restart=Never -- <command>
kubectl run --rm -it debug1 --image=busybox:1.28 --restart=Never -- sh # Attach with shell
k apply -f filename.yaml
k create -f filename.yaml
# Deploy this in given namespace. This ns info can also be added in yaml definition itself
# to avoid giving in command always, like when creating a pod, it can be added in metadata section
k create -f filename.yaml -n my-namespace
k get all # Get all k8s objects deployed
k get pods # Get list of all pods in current namespace like default
k get pods -n kube-system # Get list of all pods in 'kube-system' namespace
k get pods --all-namespaces # Get pods in all ns
k get pods -o wide # Gives more info like IP, node, etc.
k get pods nginx # Get specific pod info
k get pods --show-labels # Get labels column also
k get pods --no-headers # Don't print header
k get pods --selector app=App1 # Get pods having "app=App1" label
k get pods -l app=App1 # -l is same as --selector
# Pods running on a node
k get pods -A --field-selector spec.nodeName=<nodeName>
# Using jq - this general command can be used to filter any other parameter
k get pods -A --field-selector spec.nodeName=<nodeName> -o json | jq -r '.items[] | [.metadata.namespace, .metadata.name] | @tsv'
k get replicationcontrollers # Get list of replica controllers
k get replicaset
k get deployments
k get services
k get daemonsets
k get events
k describe pod
k describe pod nginx
k describe replicaset myapp-replicaset
k describe deployments
k describe services
k describe daemonsets <name>
k edit pod nginx # Opens this pods yaml file in editor and we can make the changes
k edit replicaset myapp-replicaset
k delete pod nginx
k delete replicaset myapp-replicaset
k replace -f replicaset-definition.yml # Update num of replicas and deploy yaml file
k scale --replicas=6 -f replicaset-definition.yml
k scale --replicas=6 replicaset myapp-replicaset
k scale deployment --replicas=3 httpd-frontend
- Update image in a deployment (but take care: the deployment's yaml file will still have the originally specified image version, different from what is now deployed)
k set image deployment/myapp-deployment nginx=nginx:1.9.1
- See all options available for a resource
k explain <kind> # Format
k explain pod # See top level options
k explain pod --recursive # See all options
# See all tolerations options
k explain pod --recursive | grep -A5 tolerations
# Get node summary like free persistent volume(pv) space, which we can't find with other commands
kubectl get --raw /api/v1/nodes/ip-10-3-9-207.us-west-2.compute.internal/proxy/stats/summary
- Use dry-run option: https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/learn/lecture/14937836#content
- Imperative Commands with Kubectl: https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/learn/lecture/15018998#content
- CKA practice tests: https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/learn/lecture/16103293#overview
- K8s for absolute beginners: https://www.udemy.com/course/learn-kubernetes/
- kubectl cheat sheet: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
- HTTPS: https://robertheaton.com/2014/03/27/how-does-https-actually-work/
- Installing k8s, the hard way: https://www.youtube.com/watch?v=uUupRagM7m0&list=PL2We04F3Y_41jYdadX55fdJplDvgNGENo
- Kodecloudhub CKA course: https://github.com/kodekloudhub/certified-kubernetes-administrator-course
- End to End tests(removed from CKA exam): https://www.youtube.com/watch?v=-ovJrIIED88&list=PL2We04F3Y_41jYdadX55fdJplDvgNGENo&index=19
- CKA with Practice Tests:
- CKA FAQs: https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/learn/lecture/15717196#overview
- Use the code `DEVOPS15` while registering for the CKA or CKAD exams at Linux Foundation to get a 15% discount