Cluster | Members | CNI | Description |
---|---|---|---|
k8s | 1 etcd, 1 master, 2 worker | flannel | |
hk8s | 1 etcd, 1 master, 2 worker | calico | |
bk8s | 1 etcd, 1 master, 1 worker | flannel | |
wk8s | 1 etcd, 1 master, 2 worker | flannel | |
ek8s | 1 etcd, 1 master, 2 worker | flannel | |
ik8s | 1 etcd, 1 master, 1 base node | loopback | Missing worker node |
Master node(s):
Protocol | Direction | Port Range | Purpose |
---|---|---|---|
TCP | Inbound | 6443* | Kubernetes API server |
TCP | Inbound | 2379-2380 | etcd server client API |
TCP | Inbound | 10250 | Kubelet API |
TCP | Inbound | 10251 | kube-scheduler |
TCP | Inbound | 10252 | kube-controller-manager |
TCP | Inbound | 10255 | Read-only Kubelet API |
Worker node(s):
Protocol | Direction | Port Range | Purpose |
---|---|---|---|
TCP | Inbound | 10250 | Kubelet API |
TCP | Inbound | 10255 | Read-only Kubelet API |
TCP | Inbound | 30000-32767 | NodePort Services |
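These ports need to be reachable between the instances. On GCE you can open them with firewall rules; the sketch below is just a minimal example assuming a network named `k8s-net` and an internal source range of `10.0.0.0/24`, so adjust both to your own setup:
gcloud compute firewall-rules create k8s-control-plane \
  --network k8s-net \
  --source-ranges 10.0.0.0/24 \
  --allow tcp:6443,tcp:2379-2380,tcp:10250-10252,tcp:10255
gcloud compute firewall-rules create k8s-workers \
  --network k8s-net \
  --source-ranges 10.0.0.0/24 \
  --allow tcp:10250,tcp:10255,tcp:30000-32767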
Master node(s):
Location | Component | Comment |
---|---|---|
/var/log/kube-apiserver.log | API Server | Responsible for serving the API |
/var/log/kube-scheduler.log | Scheduler | Responsible for making scheduling decisions |
/var/log/kube-controller-manager.log | Controller | Manages replication controllers |
Worker node(s):
Location | Component | Comment |
---|---|---|
/var/log/kubelet.log | Kubelet | Responsible for running containers on the node |
/var/log/kube-proxy.log | Kube Proxy | Responsible for service load balancing |
In this part I'll try to set up a Kubernetes cluster using kubeadm
on a couple of instances in GCE (with an external etcd cluster). I'm using Ubuntu-based instances in my setups.
The cluster I will be creating will look like this:
- 1 etcd
- 1 master
- 2 worker
- flannel
Every node besides the etcd one will need the following components to be able to run and join a Kubernetes cluster:
- kubelet
- kubeadm
- kubectl
- Docker
To install the needed Kubernetes components, run the following (as root):
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
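Optionally, you can put these packages on hold so an unattended upgrade doesn't bump the cluster components under your feet (just a suggestion on my part, not something kubeadm requires):
apt-mark hold kubelet kubeadm kubectl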
And for Docker you can run the following (as root):
curl -fsSL get.docker.com -o get-docker.sh
bash get-docker.sh
You'll get the latest CE version of Docker installed (18.x+)
TODO!
- Log in to `etcd-0`
- Install cfssl with:
apt-get install golang-cfssl
- Create the CA certificate:
  - CA cert config
  - CSR
  - Run cfssl to create everything
{
mkdir -p /etc/kubernetes/pki/etcd
cd /etc/kubernetes/pki/etcd
cat > ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "43800h"
    },
    "profiles": {
      "server": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      },
      "client": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "peer": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
cat > ca-csr.json <<EOF
{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  }
}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
}
- Create client certificates
{
cat > client.json <<EOF
{
  "CN": "client",
  "key": {
    "algo": "ecdsa",
    "size": 256
  }
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client
}
- Generate the server certificate. Remember, we're building a one-node etcd cluster, so we don't need the peer certificates:
export PEER_NAME=$(hostname -s)
export PRIVATE_IP=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip)
cfssl print-defaults csr > config.json
sed -i '0,/CN/{s/example\.net/'"$PEER_NAME"'/}' config.json
sed -i 's/www\.example\.net/'"$PRIVATE_IP"'/' config.json
sed -i 's/example\.net/'"$PEER_NAME"'/' config.json
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
Note that we're using another profile (`server`) to create the server certificate; that profile was defined in `ca-config.json` along with the CA certificate.
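To sanity-check the result you can inspect the generated certificate, for example with openssl (a quick verification step of mine, not something the flow requires). The grep just confirms that the instance's private IP ended up as a Subject Alternative Name:
openssl x509 -in server.pem -noout -text | grep -A1 'Subject Alternative Name'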
- Now create the systemd unit file needed. Remember to pass the instance's private IP address to the `--listen-client-urls` flag:
export PRIVATE_IP=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip)
cat > /etc/systemd/system/etcd.service <<EOF
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd
[Service]
ExecStart=/usr/local/bin/etcd \
--name=etcd0 \
--data-dir=/var/lib/etcd \
--listen-client-urls=https://$PRIVATE_IP:2379,https://localhost:2379 \
--advertise-client-urls=https://localhost:2379 \
--cert-file=/etc/kubernetes/pki/etcd/server.pem \
--key-file=/etc/kubernetes/pki/etcd/server-key.pem \
--client-cert-auth=true \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
--initial-cluster-token=my-etcd-token \
--initial-cluster-state=new
Restart=on-failure
RestartSec=5
Type=notify
[Install]
WantedBy=multi-user.target
EOF
- Run the following to start the etcd service
{
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
}
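Assuming `etcdctl` is available on the instance (it ships in the etcd release tarball next to the `etcd` binary), you can verify that etcd answers over TLS; the endpoint and certificate paths follow the unit file above:
ETCDCTL_API=3 etcdctl \
  --endpoints=https://localhost:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.pem \
  --cert=/etc/kubernetes/pki/etcd/client.pem \
  --key=/etc/kubernetes/pki/etcd/client-key.pem \
  endpoint health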
- Now copy the following certificate files from `etcd-0` to the `master-0` instance:
  - `ca.pem`, the Certificate Authority certificate, used for signing all the other certificates; everyone will trust this certificate
  - `client.pem`
  - `client-key.pem`

Place them in the following `master-0` directory: `/etc/kubernetes/pki/etcd`. The client certificate and key will be used by the API server when connecting to etcd; this information will be passed to `kubeadm`
through a Master configuration manifest, see the next step.
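Any scp-capable method works for the copy; on GCE the convenient option is `gcloud compute scp` (instance names as used in this guide, run from `/etc/kubernetes/pki/etcd` on `etcd-0`):
gcloud compute scp ca.pem client.pem client-key.pem master-0:~/
# then on master-0, move them into place (adjust the path to wherever they landed):
mkdir -p /etc/kubernetes/pki/etcd
mv ~/ca.pem ~/client.pem ~/client-key.pem /etc/kubernetes/pki/etcd/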
- After `kubeadm` is done, the master node will run the needed Kubernetes components (all but etcd) in Docker containers (or rather Pods) within Kubernetes.
- Create the Master configuration manifest file. My configuration file looks like this; yours will look somewhat different in regards to the IP addresses:
cat > master_config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 10.0.0.11
  controlPlaneEndpoint: 10.0.0.11
etcd:
  endpoints:
  - https://10.0.0.10:2379
  caFile: /etc/kubernetes/pki/etcd/ca.pem
  certFile: /etc/kubernetes/pki/etcd/client.pem
  keyFile: /etc/kubernetes/pki/etcd/client-key.pem
networking:
  podSubnet: 10.100.0.0/16
apiServerCertSANs:
- 10.0.0.11
EOF
Oh, and remember, you can add extra args and configuration to all of the components through this file.
- Apply the manifest, but this first time with the `--dry-run` flag to see what the h3ll is going on (there's a lot happening in the background):
kubeadm init --config=master_config.yaml --dry-run
See the last section of this guide for more info on what `kubeadm` does behind the scenes.
- Now apply the manifest without the `--dry-run` flag. If everything went fine you'll get an output which you'll use to join your nodes to the cluster. It looks similar to this:
kubeadm join 10.0.0.11:6443 --token <string> --discovery-token-ca-cert-hash sha256:<string> <string>
- Run the following to make sure you can run `kubectl` against the API server on the master:
sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf
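A quick way to confirm the control plane answers (standard kubectl commands, nothing specific to this setup):
kubectl get componentstatuses
kubectl get nodes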
- Proceed with joining the nodes to the cluster.
- Make sure you have Docker and the following components installed on all worker nodes:
  - kubelet
  - kubeadm
  - kubectl
- Use the output from the `kubeadm init` command you ran on the master to join each node to the cluster
- On the master, run `kubectl get nodes`; none of the nodes will be ready yet (they're in a `NotReady` state). This is due to the fact that you don't have an overlay network installed. For this initial cluster I'll be using `flannel`. On the master run:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
- Now when you run `kubectl get nodes` you'll see (after a while) that the state has changed:
root@master-0:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-0 Ready master 40m v1.10.4
worker-0 Ready <none> 30m v1.10.4
worker-1 Ready <none> 3m v1.10.4
Notes:
- I got a warning about the version of Docker I had installed, which was the latest CE (18+); the recommended version is 17.03. I think I will have to revisit this for one reason or another later on. I'll keep this note here for reference
- All manifest files will be written to the following directory on the Master node: `/etc/kubernetes/manifests/`
- The namespace `kube-system` will be the home for all components
- `kubeadm` creates all the certificates you'll need to secure your cluster and cluster components
- `kubeconfig` files are created and used by e.g. the components
OK, so the following happened:
- The `kube-apiserver` Pod will be created:
  - The API Server services REST operations and provides the frontend to the cluster's shared state through which all other components interact.
  - A lot of configuration flags will be sent as `command` to the Pod. A couple of them are the ones we added to the Master configuration manifest regarding etcd
  - A `livenessProbe` is configured, probing the `/healthz` path on port 6443
  - Two `labels` are configured: `component=kube-apiserver` and `tier=control-plane`
  - A CPU request limit is configured
  - Two read-only `volumeMounts` are configured; the source directories are `/etc/ssl/certs` and `/etc/kubernetes/pki`
- The `kube-controller-manager` Pod will be created:
  - The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes.
  - The `--controllers` flag is set to `*,bootstrapsigner,tokencleaner`, which basically means that all available controllers are enabled
  - The `--leader-elect` flag is set to `true`, which means that a leader election will be started; this matters when running replicated components for high availability
  - A `livenessProbe` is configured, probing the `/healthz` path on port 10252 (localhost)
  - Two `labels` are configured: `component=kube-controller-manager` and `tier=control-plane`
  - A CPU request limit is configured
  - Besides mounting the same source directories as the `kube-apiserver` Pod, this one also mounts `/etc/kubernetes/controller-manager.conf` and `/usr/libexec/kubernetes/kubelet-plugins/volume/exec`. The latter is for adding plugins on-the-fly to `kubelet`.
  - `hostNetwork` is set to `true`, which means that the controller manager Pod shares the `master-0` instance's network stack
- The `kube-scheduler` Pod will be created:
  - The Kubernetes scheduler is a policy-rich, topology-aware, workload-specific function that significantly impacts availability, performance, and capacity. The scheduler needs to take into account individual and collective resource requirements, quality of service requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, deadlines, and so on.
  - The only `volumeMount` is the one for mounting the `scheduler.conf` file
  - Two `labels` are configured: `component=kube-scheduler` and `tier=control-plane`
  - A `livenessProbe` is configured, probing the `/healthz` path on port 10251 (localhost)
  - A CPU request limit is configured
  - `hostNetwork` is set to `true`
- After these three components, `kubeadm` will wait for the `kubelet` to boot up the control plane as `Static Pods`.

  Quick note on Static Pods: Static Pods are managed directly by the `kubelet` daemon on a specific node, without the API server observing them. They don't have an associated replication controller; the kubelet daemon itself watches them and restarts them when they crash. There is no health check. Static Pods are always bound to one kubelet daemon and always run on the same node with it. If you're running Kubernetes clustered and want Static Pods on each node, you probably want to create a `DaemonSet` instead.
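  To make the Static Pod concept concrete: any manifest dropped into the kubelet's static pod directory (`/etc/kubernetes/manifests/` in a kubeadm setup) is started by the kubelet itself, no API server involved. The nginx Pod below is just my own illustration, not something kubeadm creates:

  # /etc/kubernetes/manifests/static-nginx.yaml (example path)
  apiVersion: v1
  kind: Pod
  metadata:
    name: static-nginx
  spec:
    containers:
    - name: nginx
      image: nginx
      ports:
      - containerPort: 80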
- Wait for the API server `/healthz` endpoint to return `ok`
- Store and create the configuration used in the `ConfigMap` called `kubeadm-config` (within the `kube-system` namespace). This configuration is what we created earlier in the Master configuration manifest and passed to `kubeadm`; note that the version added as a `ConfigMap` holds all of the other default configuration (that we didn't touch).
- `master-0` will be marked as master by adding a `label` and a `taint`. The taint basically means that the master node should not have Pods scheduled on it.
- A `secret` will be created with the bootstrap token.
- A RBAC `ClusterRoleBinding` will be created referencing the `ClusterRole` `system:node-bootstrapper`; the `subjects` will be a `Group` with the `name` `system:bootstrappers:kubeadm:default-node-token`. This will allow Node Bootstrap tokens to post `CSR`s in order for nodes to get long-term certificate credentials.
- Another RBAC `ClusterRoleBinding` will be created to allow the `csrapprover` controller to automatically approve `CSR`s from a Node Bootstrap Token. The `ClusterRole` referenced is `system:certificates.k8s.io:certificatesigningrequests:nodeclient`. Subjects are a `Group` with the `name` `system:nodes`
- A `ConfigMap` called `cluster-info` will be created in the `kube-public` namespace. This ConfigMap will hold the cluster info, which is the API server URL and also the CA data (public certificate?)
- A RBAC `Role` will be created that allows `get` on the `cluster-info` ConfigMap.
- A RBAC `RoleBinding` that references the `Role` created earlier; `subjects` are a `User` with the name `system:anonymous`
- A `ServiceAccount` is created with the `name` `kube-dns`
- A `Deployment` is created for `kube-dns`:
  - There's a total of three containers running in this Pod: `kube-dns`, `dnsmasq` and `sidecar`.
  - `selector` is set to `matchLabels` with `k8s-app=kube-dns`
  - A `rollingUpdate` strategy is configured with `maxSurge=10%` and `maxUnavailable=0`, which means that during a rolling update the deployment allows for 10% overcommitment of new Pods (or 110%) but with 0% unavailable.
  - The `spec` defines `affinity` (which will replace `nodeSelector` in the end). In this case it's a `nodeAffinity` with a 'hard' affinity of the type `requiredDuringSchedulingIgnoredDuringExecution` and with the requirement that the architecture should be `amd64`
  - There's a configured `livenessProbe` and a `readinessProbe`. Liveness is for restarting on failure, readiness is for determining when the container is ready to accept traffic
  - Container ports configured are: 10053 TCP/UDP and 10055 for metrics
  - Resource limits and requests are set for memory and CPU
- A `Service` is created for handling DNS traffic:
  - A clusterIP of 10.99.0.10 is added. By default you'll have a service network of 10.96.0.0/12; I changed this to 10.99.0.0/24 in my master_config.yaml manifest.
- A `ServiceAccount` for `kube-proxy` is created
- A `ConfigMap` containing the kube-proxy configuration is created
- A `DaemonSet` for the kube-proxies is created
- Last but not least, a `ClusterRoleBinding` is created for kube-proxy referencing one of the system-provided ClusterRoles, `node-proxier`. The subject for this binding is the service account created earlier.
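If you want to poke at these objects yourself, they're all visible through kubectl once the cluster is up; a few starting points:
kubectl -n kube-system get configmap kubeadm-config -o yaml
kubectl -n kube-public get configmap cluster-info -o yaml
kubectl -n kube-system get serviceaccount kube-dns kube-proxy
kubectl get clusterrolebindings | grep kubeadm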
`Role` vs. `ClusterRole`:
- `Role`: grants access to resources within a single namespace
- `ClusterRole`: grants access like a `Role`, but also to cluster-scoped resources (such as nodes), non-resource endpoints, and namespaced resources across all namespaces.
`RoleBinding` and `ClusterRoleBinding`:
- A role binding grants the permissions defined in a role to a user or set of users. It contains a list of subjects (users, groups or service accounts) and a reference to the role being granted.
- Users are represented by strings; no particular format is required.
- The prefix `system:` is reserved for Kubernetes system use.
- Service Accounts use the prefix `system:serviceaccount:`
- Don't mess with the `system:node` `ClusterRole`
Default `ClusterRoles`:
- `cluster-admin`: allows super-user access; any action on any resource.
- `admin`: allows admin access; read/write access to most resources within a namespace.
- `edit`: allows read/write access to most objects in a namespace.
- `view`: allows read-only access to see most objects in a namespace.

Core Component Roles:
- `system:kube-scheduler`: allows access to the resources required by the scheduler component.
- `system:kube-controller-manager`: allows access to the resources required by the controller-manager.
- `system:node`: allows access to resources required by the kubelet component.
- `system:node-proxier`: allows access to the resources required by the kube-proxy component.
You can reference subresources in an RBAC role, e.g. `pods/log`.
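A minimal sketch of a Role using that subresource (the name `pod-log-reader` and the `default` namespace are just made up for the example):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-log-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]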
As of version 1.9, `ClusterRoles` can be created by combining other `ClusterRoles` using an `aggregationRule`. By adding a `ClusterRole` with the following configuration:
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac.example.com/aggregate-to-monitoring: "true"
and adding the label `rbac.example.com/aggregate-to-monitoring: "true"` to another `ClusterRole`, the one above will automatically be filled with that ClusterRole's rules in its `rules: []` section.
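For example, a contributing `ClusterRole` could look something like this (the name and rules are illustrative only); the aggregated ClusterRole above will then pick up these rules automatically:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-endpoints
  labels:
    rbac.example.com/aggregate-to-monitoring: "true"
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]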
Example of `rules` in Roles (the rest is omitted):
rules:
- apiGroups: [""] # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

rules:
- apiGroups: [""] # allow reading Pods
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch", "extensions"] # allow reading/writing Jobs
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Default RBAC grants no permissions to Service Accounts outside the `kube-system` namespace. To grant Service Accounts permissions, you can try these out, ordered from most to least secure.
- Grant a role to an application-specific service account (best practice):
  - Specify a `serviceAccountName` in the Pod spec
  - Create the service account with `kubectl create serviceaccount`

Example of granting read-only permission within `my-namespace` to the `my-sa` service account:
kubectl create rolebinding my-sa-view \
--clusterrole=view \
--serviceaccount=my-namespace:my-sa \
--namespace=my-namespace
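And referencing that service account from a Pod spec would look roughly like this (Pod name and image picked just for the example):
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-namespace
spec:
  serviceAccountName: my-sa
  containers:
  - name: app
    image: nginx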
- Start the API Server with RBAC mode:
minikube start --extra-config=apiserver.Authorization.Mode=RBAC
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
- Create new namespace:
kubectl create namespace prod
- Generate a client certificate and key for the new user that we will use (run this in a location where you can find the generated files afterwards):
openssl genrsa -out mike-admin.key 2048
openssl req -new -key mike-admin.key -out mike-admin.csr -subj "/CN=mikeadmin/O=robotnik.io"
openssl x509 -req -in mike-admin.csr -CA ~/.minikube/ca.crt -CAkey ~/.minikube/ca.key -CAcreateserial -out mike-admin.crt -days 30
- Create the credentials and context:
kubectl config set-credentials mikeadmin --client-certificate $PWD/mike-admin.crt --client-key $PWD/mike-admin.key
kubectl config set-context mike-admin --cluster=minikube --namespace=prod --user=mikeadmin
- Test to list pods in the newly created namespace (should fail):
kubectl --context=mike-admin get pods -n prod
Error from server (Forbidden): pods is forbidden: User "mikeadmin" cannot list pods in the namespace "prod"
- Create the Role:
cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  namespace: prod
  name: mike-admin-role
rules:
- apiGroups: ["", "extensions"]
  resources: ["deployments", "pods"]
  verbs: ["get", "list", "watch", "create"]
EOF
- Create the RoleBinding:
cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: mike-admin-rolebinding
  namespace: prod
subjects:
- kind: User
  name: mikeadmin
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: mike-admin-role
  apiGroup: rbac.authorization.k8s.io
EOF
- Run a Pod and list Pods
kubectl --context=mike-admin run nginx-prod --image nginx
kubectl --context=mike-admin get pods
- Test to list Pods in the default namespace:
kubectl --context=mike-admin get pods -n default
Error from server (Forbidden): pods is forbidden: User "mikeadmin" cannot list pods in the namespace "default"
Almost all Kubernetes objects and their manifests look the same, at least in the first few lines:
apiVersion: VERSION
kind: OBJECT_TYPE
metadata:
  annotations:
  labels:
  name:
spec:
Pod manifest with a Liveness Probe (from Kubernetes Up & Running):
apiVersion: v1
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    livenessProbe:
      httpGet:
        path: /healthy
        port: 8080
      initialDelaySeconds: 5  # probe is not used until 5 seconds after all the containers in the Pod are created
      timeoutSeconds: 1       # probe must respond within 1 second
      periodSeconds: 10       # run every 10 seconds
      failureThreshold: 3     # give up (and restart the container) after 3 failed attempts
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
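Readiness probes (mentioned in the kube-dns notes earlier) use the same structure, but a failing readiness probe only removes the Pod from Service endpoints instead of restarting the container. A minimal sketch with the same kuard image (the `/ready` path follows the Kubernetes Up & Running examples; use whatever endpoint your app exposes):
apiVersion: v1
kind: Pod
metadata:
  name: kuard-ready
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5   # wait 5 seconds before the first probe
      periodSeconds: 10        # probe every 10 seconds
      failureThreshold: 3      # mark the Pod NotReady after 3 failed probes
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP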
Create a Pod from a manifest through `stdin`:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "10"
EOF
- Provides a tool to help configure time/date called `timedatectl`
- Part of `systemd`
- A centralized management solution for logging all kernel and userland processes
Command | Description |
---|---|
journalctl -u kubelet | Look at logs for a specified systemd unit |
journalctl -u kubelet -f | Look at logs for a specified systemd unit and follow the output |
journalctl -u kubelet -r | Look at logs for a specified systemd unit in reverse order, latest first |
journalctl -u kubelet --since "10 min ago" | Look at the logs from the last 10 minutes |
timedatectl list-timezones | List time zones |
timedatectl set-timezone Europe/Stockholm | Set the time zone |