A brief guide to
- Rancher Kubernetes Engine (RKE)
- Installing Rancher With Kubernetes
- Deploying Kubernetes With Rancher
- Managing Kubernetes With Rancher
- Running Kubernetes Workloads
brew install rke
rke --version
If you are using a test server in a virtual machine, consider setting up a DNS masquerading tool such as dnsmasq to act as a local DNS forwarder:
brew install dnsmasq
brew info dnsmasq
# Create config directory
mkdir -pv $(brew --prefix)/etc/
# Setup *.test
echo 'address=/.test/<IP_ADDRESS>' >> $(brew --prefix)/etc/dnsmasq.conf
# Change port for High Sierra
echo 'port=53' >> $(brew --prefix)/etc/dnsmasq.conf
# Create resolver directory
sudo mkdir -v /etc/resolver
# Add your nameserver to resolvers
sudo bash -c 'echo "nameserver 127.0.0.1" > /etc/resolver/test'
# Verify that all .test requests are using 127.0.0.1
scutil --dns
# Modify macOS network configuration
sudo vi /etc/resolv.conf
nameserver 127.0.0.1 # Add localhost ip
# Verify your changes using the dig command by querying your local Dnsmasq instance.
dig rancher.mydomain.test
nslookup rancher.mydomain.test
dig -x <IP-ADDRESS> # Reverse IP lookup
dig rancher.mydomain.test @127.0.0.1
# Restart dnsmasq
sudo brew services stop dnsmasq
sudo brew services start dnsmasq
# Install sudo
su -
apt-get update
apt-get install sudo
# Add user to the sudo group
sudo usermod -a -G sudo user
# Exit from remote server
exit
# Login to remote server and check if user is in sudo group
ssh user@remote_server
groups
References: https://rancher.com/docs/rke/latest/en/os/#ports
# Open TCP/6443 for all
sudo iptables -A INPUT -p tcp --dport 6443 -j ACCEPT
# Open TCP/6443 for ONE SPECIFIC IP
sudo iptables -A INPUT -p tcp -s your_ip_here --dport 6443 -j ACCEPT
# Open TCP/6443 for all
firewall-cmd --zone=public --add-port=6443/tcp --permanent
firewall-cmd --reload
# Open TCP/6443 for one specific IP
firewall-cmd --permanent --zone=public --add-rich-rule='
rule family="ipv4"
source address="your_ip_here/32"
port protocol="tcp" port="6443" accept'
firewall-cmd --reload
# Check for open ports
sudo apt-get install nmap
nmap -sT -O localhost
....
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
....
# Use netcat to check if port is open
sudo apt-get install netcat
sudo nc -vz localhost <PORT>
# Check for a specific port such as 6443 with lsof
sudo apt-get install lsof
sudo lsof -i:6443
# List of all LISTEN port
sudo lsof -i -P -n | grep LISTEN
# List all iptables rules
sudo iptables --list
See the "Opening Ports with firewalld" documentation for advanced configuration.
References: https://rancher.com/docs/rancher/v2.x/en/installation/resources/installing-docker/
Check installer script version
ssh username@remote_server
# Check if certificates are installed on your server
ls /etc/ssl/certs
# If the /etc/ssl/certs folder is empty, remove and re-install ca-certificates
sudo apt-get remove ca-certificates
sudo apt-get install ca-certificates
ls /etc/ssl/certs
# Installing Docker (Check installer script version before)
curl https://releases.rancher.com/install-docker/19.03.sh | sh
docker --version
sudo usermod -aG docker yourusername
References:
- https://rancher.com/blog/2018/2018-09-26-setup-basic-kubernetes-cluster-with-ease-using-rke#creating-the-linux-user-account
- https://rancher.com/docs/rke/latest/en/os/
Set AllowAgentForwarding yes on the remote server
sudo vi /etc/ssh/sshd_config
...
# Uncomment line
AllowAgentForwarding yes
....
sudo systemctl restart sshd
sudo systemctl status sshd
# Login to remote server / node
ssh username@remote_server
# Create .ssh folder in remote server user home folder
mkdir $HOME/.ssh
chmod 700 $HOME/.ssh
# Login to your client
# Check if SSH key already exists
ls $HOME/.ssh/
less $HOME/.ssh/id_rsa
# If necessary create new SSH key pair
ssh-keygen
# Copy the SSH public key to the remote server / node
cat $HOME/.ssh/id_rsa.pub | ssh remote_server "tee -a /home/<yourusername>/.ssh/authorized_keys"
# Test SSH connectivity
ssh -i $HOME/.ssh/id_rsa username@remote_server docker version
mkdir /opt/rke
cd /opt/rke
rke config
...
Answer all the prompted questions
Replace the default SSH user 'root' for each host with your remote server username
Reference: https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/production/#cluster-architecture
- Nodes should have one of the following role configurations:
  - etcd
  - controlplane
  - etcd and controlplane
  - worker (the worker role should not be used or added on nodes with the etcd or controlplane role)
- Have at least three nodes with the etcd role to survive losing one node. Increase this count for higher node fault toleration, and spread them across (availability) zones to provide even better fault tolerance.
- Assign two or more nodes the controlplane role for master component high availability.
- Assign two or more nodes the worker role for workload rescheduling upon node failure.
A nodes section following these recommendations is sketched below.
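As a sketch only (addresses and SSH user are placeholders, not from this guide), a nodes section in cluster.yml that follows these recommendations could look like:
nodes:
- address: 10.0.0.1   # etcd + controlplane node 1
  user: ubuntu
  role:
  - controlplane
  - etcd
- address: 10.0.0.2   # etcd + controlplane node 2
  user: ubuntu
  role:
  - controlplane
  - etcd
- address: 10.0.0.3   # etcd + controlplane node 3
  user: ubuntu
  role:
  - controlplane
  - etcd
- address: 10.0.0.4   # dedicated worker
  user: ubuntu
  role:
  - worker
- address: 10.0.0.5   # dedicated worker
  user: ubuntu
  role:
  - worker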
rke up
Save a copy of the following files in a secure location:
- cluster.yml: the RKE cluster configuration file.
- kube_config_cluster.yml: the Kubeconfig file for the cluster; this file contains credentials for full access to the cluster.
- cluster.rkestate: the Kubernetes Cluster State file; this file contains credentials for full access to the cluster.
References: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/
Workstation global configuration
cp kube_config_cluster.yml ~/.kube/config
chmod go-r ~/.kube/config
You can use any directory and specify it using the --kubeconfig flag, as in this command:
kubectl --kubeconfig /custom/path/kube.config get pods
From your workstation, launch kubectl. Use it to interact with your Kubernetes cluster.
cp kube_config_cluster.yml kube.config
chmod go-r kube.config
kubectl --kubeconfig kube.config get nodes
kubectl get nodes
kubectl get nodes -o wide
kubectl describe nodes
Set kubernetes_version:""
with last available Kubernetes version (e.g. kubernetes_version: "v1.19.4-rancher1-1"
)
rke config --list-version --all
vi cluster.yml
...
rke up --config cluster.yml
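For reference, kubernetes_version is a top-level key in cluster.yml; a minimal excerpt (the version string is just an example) looks like:
# cluster.yml (excerpt)
kubernetes_version: "v1.19.4-rancher1-1"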
You can update the Kubernetes version one node at a time
- Navigate to the cluster you want to edit (e.g. Global > Cluster_name)
- Navigate to Node > Take a snapshot (option on the right > Snapshot now)
- Edit the Cluster > Scroll down to Kubernetes options
- From the Kubernetes Version drop-down, choose the version of Kubernetes that you want to use for the cluster.
- Save the new configuration and wait a few minutes while the cluster is rebuilt
- Check whether the version has changed
kubectl get nodes
- If the upgrade fails, revert the cluster to the pre-upgrade Kubernetes version. This is achieved by selecting the Restore etcd and Kubernetes version option. This will return your cluster to the pre-upgrade kubernetes version before restoring the etcd snapshot.
References: https://rancher.com/docs/rke/latest/en/cert-mgmt/#certificate-rotation
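RKE can rotate the cluster certificates in place; a hedged sketch of the commands described on that page (run from the directory containing cluster.yml):
# Rotate the certificates for all Kubernetes services
rke cert rotate --config cluster.yml
# Rotate the certificate of a single service (e.g. kube-apiserver)
rke cert rotate --config cluster.yml --service kube-apiserver
# Rotate the CA and all service certificates
rke cert rotate --config cluster.yml --rotate-ca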
References: https://rancher.com/docs/rke/latest/en/managing-clusters/#adding-removing-nodes
You can edit the cluster.yml file to add / remove nodes
vi cluster.yml
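For example, the node you add or remove is just an entry under nodes: in cluster.yml (address and user below are placeholders):
nodes:
# ...existing nodes...
- address: 10.0.0.6   # new worker node to add
  user: ubuntu
  role:
  - worker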
rke up
References: https://helm.sh/
brew install helm
Rancher provides three repositories for Helm charts:
- Latest: recommended for trying out the newest features. Not recommended for production.
- Stable: recommended for production environments.
- Alpha: experimental previews of upcoming releases. Definitely not recommended for production.
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update
If the cattle-system namespace doesn't exist, create it
kubectl get namespaces
kubectl create namespace cattle-system
List all deployments
kubectl get deployments --all-namespaces
Rancher will always be protected by TLS, and you have three options for how to provision this component:
- Rancher-generated self-signed certificates
- Real certificates from Let's Encrypt
- Certificates that you provide (real or self-signed)
The first two options require a Kubernetes package called cert-manager
https://cert-manager.io/docs/
Check jetstack/cert-manager latest stable release
# Kubernetes 1.16+
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.yaml
# Kubernetes <1.16
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager-legacy.yaml
kubectl get deployments --namespace cert-manager
kubectl get pods -n cert-manager
kubectl rollout status deployment -n cert-manager cert-manager
kubectl rollout status deployment -n cert-manager cert-manager-cainjector
kubectl rollout status deployment -n cert-manager cert-manager-webhook
References: https://rancher.com/docs/rancher/v2.x/en/installation/install-rancher-on-k8s/
helm search repo --versions
Three available options
- Option 1: Rancher-Generated Self-Signed Certificates
- Option 2: Real Certificates From Let's Encrypt
- Option 3: Certificates That You Provide
This is the default option when installing Rancher and requires no additional configuration. You only need to specify the namespace and the hostname when installing.
This installation option requires two parameters:
--set hostname=rancher.mydomain.com
--namespace cattle-system
helm install rancher rancher-stable/rancher \
--version v2.5.5 \
--namespace cattle-system \
--set hostname=rancher.mydomain.com
Check if Rancher is Running
kubectl -n cattle-system get deploy rancher
kubectl -n cattle-system get pods
kubectl get all -n cattle-system
Wait for Rancher to be rolled out
kubectl rollout status deployment -n cattle-system rancher
If the state is not Running, run a describe on the pod and check the Events
kubectl -n cattle-system describe pod
Open https://rancher.mydomain.com on your browser
To request a certificate from Let's Encrypt you must have the load balancer and hostname properly configured. Let's Encrypt will issue an http-01 challenge that cert-manager will process. Certificates issued by Let's Encrypt will be automatically renewed before their expiration date.
In addition to the parameters listed in Option 1, this installation option requires two additional parameters:
--set ingress.tls.source=letsEncrypt
--set [email protected]
Let's Encrypt uses the email to communicate with you about any issues with the certificate, such as its upcoming expiration. Please use a real email address for this parameter.
helm search repo --versions
helm install rancher rancher-stable/rancher \
--version v2.5.5 \
--namespace cattle-system \
--set hostname=rancher.mydomain.com \
--set ingress.tls.source=letsEncrypt \
--set [email protected]
Check if Rancher is Running
kubectl -n cattle-system get deploy rancher
kubectl -n cattle-system get pods
kubectl get all -n cattle-system
Wait for Rancher to be rolled out
kubectl rollout status deployment -n cattle-system rancher
Open https://rancher.mydomain.com on your browser
If you have your own certificates, either from a public or private CA, you will load these into a Kubernetes Secret and tell Rancher and the Ingress Controller to use that secret to provision TLS.
In addition to the parameters listed in Option 1, this option requires the following additional parameter:
--set ingress.tls.source=secret
If you want a little more realism in your development self-signed certificates, you can use minica to generate your own local root certificate, and issue end-entity (aka leaf) certificates signed by it.
helm search repo --versions
helm install rancher rancher-stable/rancher \
--version v2.5.5 \
--namespace cattle-system \
--set hostname=rancher.mydomain.com \
--set ingress.tls.source=secret
If your certificates are signed by a private CA (or self-signed), you will also need to provide:
--set privateCA=true
helm install rancher rancher-stable/rancher \
--version v2.5.5 \
--namespace cattle-system \
--set hostname=rancher.mydomain.com \
--set ingress.tls.source=secret \
--set privateCA=true
After initiating the install, you’ll need to create the secrets for the TLS certificates before the install will complete.
- Create a file called tls.crt with the certificate
- Create a file called tls.key with the private key
- Create a secret called tls-rancher-ingress from those files. This secret is of type tls.
mkdir certs
cd certs
# Create certificates
minica -ca-cert cacerts.pem -ca-key cacerts-key.pem -domains rancher.mydomain.com
# For Mac users - Trust certificate
sudo security add-trusted-cert -d -r trustAsRoot -k /Library/Keychains/System.keychain rancher.mydomain.com/cert.pem
# Create a secret called tls-rancher-ingress from those files
kubectl -n cattle-system create secret tls tls-rancher-ingress \
--cert=rancher.mydomain.com/cert.pem --key=rancher.mydomain.com/key.pem
# Create a secret called tls-ca
kubectl -n cattle-system create secret generic tls-ca \
--from-file=cacerts.pem=./cacerts.pem
Wait for Rancher to be rolled out
kubectl rollout status deployment -n cattle-system rancher
Check if Rancher is Running
kubectl -n cattle-system get deploy rancher
kubectl -n cattle-system get pods
kubectl get all -n cattle-system
Open https://rancher.mydomain.com on your browser
If you see the following error: error: deployment "rancher" exceeded its progress deadline, you can check the status of the deployment by running the following command:
kubectl -n cattle-system get deploy rancher
And get logs from the pod
kubectl -n cattle-system get pods
kubectl -n cattle-system logs rancher-6868c8454-vrtvn -f
Check the Rancher documentation for a more exhaustive list of possible troubleshooting steps.
List all deployments
kubectl get deployments --all-namespaces
Then delete the deployment, where NAMESPACE is the namespace it's in and DEPLOYMENT is the name of the deployment:
kubectl delete -n NAMESPACE deployment DEPLOYMENT
helm ls --all-namespaces
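If Rancher was installed via Helm, the release itself can also be removed; an example assuming the release name and namespace used earlier in this guide:
helm uninstall rancher -n cattle-system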
kubectl delete namespace cattle-system
kubectl create namespace cattle-system
There are two methods for backing up an RKE cluster:
rke etcd snapshot-save --config cluster.yml --name snapshot-name
ssh username@remote_server 'sudo ls /opt/rke/etcd-snapshots'
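RKE can also push the one-time snapshot to S3 instead of the local directory; a hedged sketch (bucket, endpoint and credentials are placeholders):
rke etcd snapshot-save --config cluster.yml --name snapshot-name \
  --s3 --access-key <S3_ACCESS_KEY> --secret-key <S3_SECRET_KEY> \
  --bucket-name <S3_BUCKET> --s3-endpoint s3.amazonaws.com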
- Review the options you desire for the setup of the snapshot service
- Configure the interval hours and retention settings in the etcd service settings of your RKE config file. Here's a very simple single-node cluster example:
nodes:
- address: 1.2.3.4
port: "22"
role:
- controlplane
- worker
- etcd
user: root
docker_socket: /var/run/docker.sock
ssh_key_path: ~/.ssh/id_rsa
services:
etcd:
backup_config:
interval_hours: 3
retention: 3
- Save the file and run rke up. You'll notice the etcd-rolling-snapshots service gets modified
rke up
...
INFO[0011] [etcd] Successfully started [etcd-rolling-snapshots] container on host [104.43.162.0]
- A snapshot will be saved in /opt/rke/etcd-snapshots on a node running etcd
ssh username@remote_server sudo ls /opt/rke/etcd-snapshots
[TODO]
Like backup, restoring an RKE cluster has a method for locally saved snapshots and a method for S3-based snapshots
- The restoration routine will assume that the backup is in /opt/rke/etcd-snapshots
- Modify the cluster.yml file for the new node
nodes:
- address: 10.0.0.1
hostname_override: node1
user: ubuntu
role:
- controlplane
- worker
# - address: 10.0.0.2
# hostname_override: node2
# user: ubuntu
# role:
# - etcd
- Run rke etcd snapshot-restore to restore etcd to its previous state
rke etcd snapshot-restore --config cluster.yml --name mysnapshot
rke up
kubectl get pod
- Verify the cluster is operating as expected
kubectl get pods
[TODO]
To restore an RKE cluster with more than one node:
- Prepare the same number of new nodes as the existing Rancher server nodes for the new cluster. These can be the same size or larger than the existing Rancher server nodes.
- Choose one of these nodes to be the initial "target node" for the restore. We will bring this node up first and then add the other two once the cluster is online.
- Make a backup copy of the RKE files you used to build the original cluster. Store these in a safe place until the new cluster is online.
- Edit cluster.yml and make the following changes:
  - Remove or comment out the entire addons section. The information about the Rancher deployment is already in etcd.
  - Change the nodes section to point to the new nodes.
  - Comment out all but the chosen target node.
nodes:
- address: 10.0.0.1
hostname_override: node1
user: ubuntu
role:
- controlplane
- worker
- etcd
# - address: 10.0.0.2
# hostname_override: node2
# user: ubuntu
# role:
# - controlplane
# - worker
# - etcd
# - address: 10.0.0.3
# hostname_override: node3
# user: ubuntu
# role:
# - controlplane
# - worker
# - etcd
# addons:
# apiVersion: v1
# Kind: Pod
# metadata:
# ....
How you perform the initial restore of the database depends on if the data is stored locally or available on S3.
- Place the snapshot into /opt/rke/etcd-snapshots on the target node
- Restore the snapshot with rke etcd snapshot-restore, passing it the name of the snapshot and pointing to the cluster.yml file
rke etcd snapshot-restore --config cluster.yml --name mysnapshot
- Restore the snapshot with rke etcd snapshot-restore, passing it the parameters needed to access S3 (see the sketch below)
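A hedged sketch of that S3-based restore (same placeholder credentials as in the backup example):
rke etcd snapshot-restore --config cluster.yml --name mysnapshot \
  --s3 --access-key <S3_ACCESS_KEY> --secret-key <S3_SECRET_KEY> \
  --bucket-name <S3_BUCKET> --s3-endpoint s3.amazonaws.com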
Bring up the cluster on the target node by running rke up, pointing to the cluster config file. When the cluster is ready, RKE will write a credentials file to the local directory. Configure kubectl to use this file and then check on the status of the cluster. Wait for the target node to change to Ready. The three old nodes will be in a NotReady state.
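A minimal sketch of those steps, assuming the default file names RKE writes to the working directory:
rke up --config cluster.yml
export KUBECONFIG=$(pwd)/kube_config_cluster.yml
kubectl get nodes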
- When the target node is Ready, remove the old nodes with kubectl delete node.
- Reboot the target node to ensure cluster networking and services are in a clean state
ssh username@server_1 sudo reboot
- Wait until all pods in kube-system, ingress-nginx, and the rancher pod in cattle-system return to a Running state.
  - The cattle-cluster-agent and cattle-node-agent pods will be in an Error or CrashLoopBackOff state until the Rancher server is up and DNS has been pointed to the new cluster.
kubectl get deployments -A
kubectl get daemonsets -A
- Edit cluster.yml and uncomment the additional nodes
nodes:
- address: 10.0.0.1
hostname_override: node1
user: ubuntu
role:
- controlplane
- worker
- etcd
- address: 10.0.0.2
hostname_override: node2
user: ubuntu
role:
- controlplane
- worker
- etcd
- address: 10.0.0.3
hostname_override: node3
user: ubuntu
role:
- controlplane
- worker
- etcd
- Run rke up to add the new nodes to the cluster
kubectl get nodes
rke up
- Wait for all nodes to show Ready in the output of kubectl get nodes
kubectl get nodes
- Reboot all target nodes to ensure cluster networking and services are in a clean state
ssh username@server_1 sudo reboot
ssh username@server_2 sudo reboot
ssh username@server_n sudo reboot
Once the cluster is up and all three nodes are Ready, complete any final DNS or load balancer change necessary to point the external URL to the new cluster. This might be a DNS change to point to a new load balancer, or it might mean that you need to configure the existing load balancer to point to the new nodes.
After making this change the agents on the downstream clusters will automatically reconnect. Because of backoff timers on the clusters, they may take up to 15 minutes to reconnect.
- Securely store the new cluster.yml, kube_config_cluster.yml and cluster.rkestate files for future use.
- Delete the archived configuration files from the old cluster.
- Delete the nodes from the old cluster or clean them of all Kubernetes and Rancher configuration
https://rancher.com/docs/rancher/v2.x/en/installation/install-rancher-on-k8s/upgrades/
Make a one-time snapshot of Rancher before continuing
rke etcd snapshot-save --config cluster.yml --name rke-etcd-snapshot-cert
ssh username@remote_server 'sudo ls /opt/rke/etcd-snapshots'
# Copy snapshot somewhere outside the cluster in case of failure
ssh -t remote_server 'sudo cat /opt/rke/etcd-snapshots/rke-etcd-snapshot-cert.zip' > ./rke-etcd-snapshot-cert.zip
Update the Helm repo
helm repo update
Get the values, which were passed with --set, from the current Rancher Helm chart that is installed
helm list --all-namespaces -f rancher -o yaml
helm get values -n cattle-system rancher -o yaml > values.yaml
Upgrading Rancher
helm upgrade rancher rancher-latest/rancher \
--namespace cattle-system \
-f values.yaml \
--version=2.5.5
kubectl rollout status deployment -n cattle-system rancher
When the upgrade is healthy, go to rancher.mydomain.com; the new version is shown in the bottom left
If the upgrade fails, the process for rolling back is to restore from the snapshot
taken just prior to the upgrade.
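A hedged sketch of that rollback, reusing the snapshot taken above:
rke etcd snapshot-restore --config cluster.yml --name rke-etcd-snapshot-cert
rke up --config cluster.yml
kubectl rollout status deployment -n cattle-system rancher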
- Hosted / Infrastructure Provider: Rancher can deploy clusters into a hosted provider's solution, such as Amazon’s Elastic Kubernetes Service (EKS), Microsoft’s Azure Kubernetes Service (AKS), Google’s Kubernetes Engine (GKE)
- Custom Provider: If you do your provisioning with Terraform, Ansible, Puppet, Chef, Cloud Init, Shell scripts, autoscaling groups, or anything else, you can use the Custom driver to build those nodes into a Kubernetes cluster.
- Imported Clusters: if you already have Kubernetes clusters running out there, or if you're using something like K3s, you can import those clusters into Rancher
It's recommended that control plane, data plane, and worker roles reside on different node pools. This allows you to configure and scale them without affecting other roles.
Check logfiles and test that nodes in each role are able to communicate with nodes in other roles according to the charts in the documentation.
Also verify that cross-host networking is available from within Kubernetes by testing that Pods on one host are able to communicate with Pods on other hosts.
- Cluster Owners have full control over everything in the cluster, including user access.
- Cluster Members can view most cluster-level resources and create new projects.
- Selecting Custom presents a list of roles that you can assign to the user.
References
References: https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/options/
Reference: https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/custom-nodes/
- Add cluster > From existing node
- Give the Cluster Name
- Check kubernetes Option section
- You can choose Kubernetes Version
- Network Provider
- You can enable Windows Support (Network Provider > Flannel | Windows Support > Enabled)
- Project Network Isolation
- Cloud Provider
- Check Private Registry section
- Check Advanced Options section
- Nginx Ingress
- Pod Security Policy Support
- etcd Snapshot Backup Target (local / s3)
- Check Authorized Endpoint
- CA Certificate
RKE Templates allow teams to standardize and simplify the creation of their Kubernetes clusters. With the proper permissions, an administrator or user can create and share RKE templates with other users. Templates, like applications, have their own lifecycle. Administrators can even force users to use a template to create new clusters. Along with node templates and infrastructure-as-code tools like Terraform, organizations can go a long way toward standardizing and systematizing the provisioning of all of the substrate serving their applications.
References: https://rancher.com/docs/rancher/v2.x/en/admin-settings/rke-templates/
- Navigate to Tools > RKE Templates
- Click Add Template
- Give the template a name
- You can skip Share Template as you don’t have any users, yet
- Review the other options in the template. Any that you adjust will be applied to clusters created with this template, unless users are allowed to override them. You can do this by toggling the override for that setting
- For example, you may want users to be able to override the FQDN and CA certificate for a cluster they create with your template
- You can require that a user fill in the settings when creating a cluster, by clicking the required argument for each parameter
- Uncheck those settings for now, as we don’t want to require them when we use this template in a future lab.
- Do allow the users to override the Kubernetes version; in a future lab we'll create a cluster using this template, and we'll use the cluster we create to demonstrate upgrades as well. Also, select an older version of Kubernetes.
- Click Create
In the RKE Templates section of Rancher, you should see the RKE Template you just created. In a future lab we’ll use the template you created.
They make it easier to reuse an existing configuration instead of creating it again for every node or every cluster that you launch.
You can create node templates from the cluster launch screen or from your profile avatar in the top right.
Cloud credentials define how to communicate with a cloud provider. They allow you to configure multiple accounts for use with different providers or in different provider regions. When creating a node template, you select the cloud credentials to use.
- Creating a Cloud Credential
- Create a Node Template
- Click on your User Avatar in the upper-right corner, and select Node Templates
- Click Add Template
- Choose your cloud of choice. You'll be creating a new Cloud Credential that allows Docker Machine to automate the provisioning of your infrastructure.
- Setup the node template based upon your cloud of choice.
- Give it a name and click Create
- Create a new cluster and select your newly saved node template in the Template field
Check the documentation to correctly configure a cloud provider
General References:
Troubleshooting Index:
- Rancher’s API Server
- Container Runtime
- Node Conditions
- Kubelet on the Worker Nodes
- etcd
- Controlplane
- nginx-proxy
- Container Network Interface (CNI) and Networking
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/rancherha/
Check Rancher pods
kubectl -n cattle-system get pods -l app=rancher -o wide
# Pod container logs
kubectl -n cattle-system logs -l app=rancher
ssh username@remote_server
sudo systemctl status docker.service | head -n 14
sudo journalctl -u docker.service | less
Node conditions report on network availability, disk pressure, memory pressure, PID pressure, and a general "Ready" state for the node.
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/kubernetes-resources/#get-node-conditions
kubectl describe node ip-172.31.42.12 | less
Run the command below to list nodes with active node conditions that could prevent normal operation
kubectl get nodes -o go-template='{{range .items}}{{$node := .}}{{range .status.conditions}}{{if ne .type "Ready"}}{{if eq .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{":"}}{{.status}}{{"\n"}}{{end}}{{else}}{{if ne .status "True"}}{{$node.metadata.name}}{{": "}}{{.type}}{{": "}}{{.status}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}'
Example output:
worker-0: DiskPressure:True
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/kubernetes-components/worker-and-generic/
ssh username@remote_server
# Check if the Containers are Running
docker ps -a -f=name='kubelet|kube-proxy'
# Check Container Logs
docker logs kubelet
docker logs kube-proxy
It stores the state for Kubernetes and the Rancher application.
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/kubernetes-components/etcd/
kubectl get nodes
ssh username@remote_server
# Checking if the etcd Container is Running
docker ps -a -f=name=etcd$
# etcd Container Logging
docker logs etcd
# Check etcd Members on all Nodes
docker exec etcd etcdctl member list
The control plane is where the cluster-wide API and logic engines run.
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/kubernetes-components/controlplane/
First you need to determine which node is the leader
kube-controller-manager
kubectl get endpoints -n kube-system kube-controller-manager \
-o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}{"\n"}'
ssh username@remote_server 'echo $HOSTNAME'
# Container Logging
ssh username@remote_server
docker logs kube-controller-manager
kube-scheduler
kubectl get endpoints -n kube-system kube-scheduler \
-o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}{"\n"}'
# Container Logging
ssh username@remote_server
docker logs kube-scheduler
kube-apiserver
If the apiserver runs as multiple active replicas, use tmux to view logs from all servers at the same time and check which server's logs contain errors
# tmux
ssh username@remote_server1
ssh username@remote_server2
ssh username@remote_server3
<prefix> :setw synchronize-panes
# Container Logging
docker logs --follow kube-apiserver
The nginx proxy exists so that non-controlplane nodes can reach the services in the controlplane without knowing which node they’re on or what IP they have.
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/kubernetes-components/nginx-proxy/
ssh username@remote_server
# Check if the Container is Running
docker ps -a -f=name=nginx-proxy
# Check Generated NGINX Configuration
# The generated configuration should include the IP addresses of the nodes with the controlplane role
docker exec nginx-proxy cat /etc/nginx/nginx.conf
# Container Logging
docker logs nginx-proxy
If any firewalls or proxies are blocking those connections, the CNI will fail
References: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/networking/
- Make sure all of your ports are configured properly.
- Next you can test the overlay network.
- Create a DaemonSet using the leodotcloud/swiss-army-knife image shown below. Save the following file as overlaytest.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: overlaytest
spec:
selector:
matchLabels:
name: overlaytest
template:
metadata:
labels:
name: overlaytest
spec:
tolerations:
- operator: Exists
containers:
- image: leodotcloud/swiss-army-knife
imagePullPolicy: Always
name: overlaytest
command: ["sh", "-c", "tail -f /dev/null"]
terminationMessagePath: /dev/termination-log
- Launch it using kubectl create -f overlaytest.yml
- Once the daemonset has rolled out, run kubectl rollout status ds/overlaytest -w and wait until it returns: daemon set "overlaytest" successfully rolled out.
- Run the following script from the same location. It will have each overlaytest container on every host ping each other:
#!/bin/bash
echo "=> Start network overlay test"
kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' |
while read spod shost
do kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' |
while read tip thost
do kubectl --request-timeout='10s' exec $spod -c overlaytest -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"
RC=$?
if [ $RC -ne 0 ]
then echo FAIL: $spod on $shost cannot reach pod IP $tip on $thost
else echo $shost can reach $thost
fi
done
done
echo "=> End network overlay test"
- When this command has finished running, it will output the state of each route:
=> Start network overlay test
Error from server (NotFound): pods "wk2" not found
FAIL: overlaytest-5bglp on wk2 cannot reach pod IP 10.42.7.3 on wk2
Error from server (NotFound): pods "wk2" not found
FAIL: overlaytest-5bglp on wk2 cannot reach pod IP 10.42.0.5 on cp1
Error from server (NotFound): pods "wk2" not found
FAIL: overlaytest-5bglp on wk2 cannot reach pod IP 10.42.2.12 on wk1
command terminated with exit code 1
FAIL: overlaytest-v4qkl on cp1 cannot reach pod IP 10.42.7.3 on wk2
cp1 can reach cp1
cp1 can reach wk1
command terminated with exit code 1
FAIL: overlaytest-xpxwp on wk1 cannot reach pod IP 10.42.7.3 on wk2
wk1 can reach cp1
wk1 can reach wk1
=> End network overlay test
If you see errors in the output, there is an issue with the route between the pods on the two hosts. In the above output the node wk2 has no connectivity over the overlay network. This could be because the required ports for overlay networking are not opened for wk2.
- You can now clean up the DaemonSet by running kubectl delete ds/overlaytest
- Some clouds and CNI plugins have issues with MTU autodetection; make sure your MTU settings are correct as well.
Example
cat ./overlaytest.yml
kubectl create -f overlaytest.yml
kubectl rollout status ds/overlaytest -w
echo "=> Start network overlay test"; \
kubectl get pods -l name=overlaytest \
-o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' \
| while read spod shost; \
do kubectl get pods -l name=overlaytest \
-o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' \
| while read tip thost; \
do kubectl --request-timeout='10s' exec $spod \
-c overlaytest -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"; \
RC=$?; if [ $RC -ne 0 ]; then echo "FAIL: $spod on $shost cannot reach pod IP $tip on $thost"; else echo "$shost can reach $thost"; fi; \
done; \
done; \
echo "=> End network overlay test"
References: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/editing-clusters/
- Navigate to the cluster you want to edit (e.g. Global > Cluster_name)
- Navigate to the Node option (menu bar) > and click on Edit Cluster
- Add Node Pool > add one or more Worker or Primary nodes (use the Count option in the field; Primary nodes have only the etcd and Control Plane options checked)
- Save the new configuration and wait a few minutes while the cluster is rebuilt
If you create one or more Worker nodes and want to remove the worker role from an existing node because it is no longer needed, you can:
# Get node NAME and check ROLES in each node
kubectl get nodes
# Drain the node
kubectl drain <NAME> --ignore-daemonsets=true
Navigate to the cluster you are editing > remove the Worker role from the Primary nodes > save the new configuration and wait a few minutes while the cluster is rebuilt
Check section Upgrade Kubernetes Downstream RKE Clusters
Each Kubernetes cluster within Rancher will have its own kubectl config file
References: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/
- Accessing Clusters with kubectl Shell in the Rancher UI
- From the Global view, open the cluster that you want to access with kubectl.
- Click Launch kubectl. Use the window that opens to interact with your Kubernetes cluster.
- Accessing Clusters with kubectl from Your Workstation
- Log into Rancher. From the Global view, open the cluster that you want to access with kubectl.
- Click Kubeconfig File.
- Copy the contents displayed to your clipboard.
- Paste the contents into a new file on your local computer. Move the file to ~/.kube/config. Note: the default location that kubectl uses for the kubeconfig file is ~/.kube/config, but you can use any directory and specify it using the --kubeconfig flag, as in this command: kubectl --kubeconfig /custom/path/kube.config get pods
pbpaste > kubeconfig
set -x KUBECONFIG (pwd)/kubeconfig
- From your workstation, launch kubectl. Use it to interact with your Kubernetes cluster.
kubectl get ns
kubectl get ns <NAME>
kubectl get deploy
Everything that you can do with Kubernetes you can do with Rancher
References: https://rancher.com/docs/rancher/v2.x/en/cli/
Install Rancher and get a bearer token
- On the bottom right of your screen, there is the Download CLI option
- Select the version for your operating system.
- Install this and add it to your path. This will vary for your operating system.
# Mac
mv rancher /usr/local/bin
rancher --version
rancher --help
rancher app --help
rancher app ls --help
- Now we’ll need an API Token. Click on your Avatar icon > select API & Keys > Select Add Key
- Save these items to a file as this is the last time you will see them. Keep them safe, anyone who has access to this token has access to your credentials and RBAC for the scope of this API token.
Login with Rancher CLI
# Replace <BEARER_TOKEN> and <SERVER_URL> with your information
rancher login https://<SERVER_URL> --token <BEARER_TOKEN>
Project Selection
# List of available projects displays
rancher context switch
...
Select a project: <NUMBER>
App installation through the Rancher CLI
rancher app install --set persistentVolume.enable=false --set [email protected] pgadmin4 pgadmin4
rancher app ls
watch -n 5 rancher app ls
rancher app show-notes pgadmin4
kcc do
...
Switched to context do
...
kubectl get pods --namespace pgadmin4-vzber \
-l "app.kubernetes.io/name=pgadmin4,app.kubernetes.io/instance=pgadmin4" \
-o jsonpath="{.items[0].metadata.name}"
...
pgadmin4-5c98347506c-zftzd
...
kubectl port-forward -n pgadmin4-vzber pgadmin4-5c98347506c-zftzd 8080:80
# Connect to localhost:8080 to PgAdmin4
SSH proxy to cluster node
# Get information about nodes
rancher nodes
# SSH login in node
rancher ssh <NAME>
Get namespaces with Rancher CLI
rancher namespace ls
Replace kubectl with the Rancher CLI: kubectl is also available without a kubeconfig file after rancher login
rancher kubectl get nodes
FATA[0020] Get https://rancher.mydomain.com: dial tcp: lookup rancher.local.com on 10.0.0.1:53: read udp 10.0.0.2:60759->10.0.0.1:53: i/o timeout
# Check DNS config
ssh username@remote_server
cat /etc/resolv.conf
# Check coredns service
kubectl get svc -n kube-system
# Check CoreDNS pod endpoints
kubectl get ep -n kube-system
# Check CoreDNS pod logs
kubectl get po -n kube-system
kubectl logs -n kube-system pod/coredns-<NAME/NUMBER>
....
[INFO] plugin/reload: Running configuration MD5 = 7b85fbe9b4156cf64b768261d4ea6e3b
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[ERROR] plugin/errors: 2 2248327252308706239.6961694972959355173. HINFO: read udp 10.42.0.178:60278->10.0.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 2248327252308706239.6961694972959355173. HINFO: read udp 10.42.0.178:57472->10.0.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 2248327252308706239.6961694972959355173. HINFO: read udp 10.42.0.178:47228->10.0.0.1:53: i/o timeout
...
# Run busybox pod, then exec nslookup <ADDRESS> in busybox pod
kubectl run -it --rm --restart=Never busybox --image=busybox sh
cat /etc/resolv.conf
nslookup google.com
Advanced monitoring deploys Prometheus and Grafana, and the worker nodes will need to have enough resources to accommodate the extra load. Recommended CPU and memory sizes are available in the Rancher documentation.
References: https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/v2.5/
In addition to the standard memory and CPU reservation and limits, consider the following options for your cluster
- Data retention period - How long to keep data
- Persistent Storage - Required for long-term retention
- Node Exporter configuration - Enables host monitoring
- Selectors and Tolerations - Controls workloads scheduling
- Make sure that you are allowing traffic on port 9796 for each of your nodes, because Prometheus will scrape metrics from there (see the iptables example below).
- The cluster should have at least 1950Mi memory available, 2700m CPU, and 50Gi storage. A breakdown of the resource limits and requests is in the Rancher documentation.
References: https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/v2.5/#enable-monitoring
- Navigate to cluster > Click 'Cluster' on top menu bar
- Click on Enable Monitoring to see live metrics
- Configure Monitoring. Example of long-term retention:
- Data Retention = 72
- Enable Persistent Storage for Prometheus = True
- Prometheus Persistent Volume Size = 50Gi
- Default StorageClass for Prometheus = ebs
- Let's do the same for Grafana
- Grafana Persistent Volume Size = 3Gi
- Default StorageClass for Grafana = ebs
- Navigate to System > Click Your Cluster Name on top menu bar > System
- Check for the cluster-monitoring pod deployment
- When the deployment is finished > Navigate to Resources > Workloads
- Navigate to cluster > Check section 'Cluster Metrics' to see if Grafana is enabled
- Navigate to Your Cluster Name > Your app (e.g. simple-app) in the cluster > Click Tools on top menu bar > Monitoring
- Change monitoring options if necessary and enable it
- Navigate to Resources > Workloads
- When the build is finished, navigate to Namespace: <your_app> and click on it
- Check the Workloads Metrics section to see details of the monitoring activity
References: https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/v2.0.x-v2.4.x/notifiers/
References: https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/v2.0.x-v2.4.x/cluster-alerts/
Rancher collects stdout and stderr output from each container, along with any logs written to /var/log/containers on the hosts, and sends it to the configured endpoint.
Rancher can write logs to Elasticsearch, Splunk, Fluentd, Kafka, or syslog.
References: https://rancher.com/docs/rancher/v2.x/en/logging/v2.0.x-v2.4.x/cluster-logging/
In Kubernetes, a namespace is a logical separation of resources. It helps keep resources organized, and it allows resources of the same type with the same name to coexist on the same cluster.
Rancher uses Projects to group namespaces and apply a common configuration for RBAC to all of them.
The following resources can be assigned to a Project:
- ConfigMaps
- Secrets
- Certificates
- Registry Credentials
The following resources can only be assigned to a namespace within the project:
- Workloads
- Load Balancers / Ingress
- Service Discovery records
- Persistent Volume Claims
Rancher automatically creates two projects when you launch a cluster:
- Default - User Workloads
- System - System Workloads
Create a namespace via kubectl and move it into a specific project later via the Rancher UI
kubectl create namespace ra-demo
- Adding Users to Projects
- Cluster and Project Roles
- Pod Security Policies
- Assigning Pod Security Policies
- Role-Based Access Control (RBAC)
References: https://rancher.com/docs/rancher/v2.x/en/project-admin/resource-quotas/
Many users forget to set these limits on their workloads, so having a reasonable default set on the project will help keep the cluster secure.
Be careful not to set a threshold that is too low, or else the cluster will regularly restart workloads that use more resources as part of their normal operation.
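As a reminder of what such defaults map to, this is the standard per-container requests/limits block in a workload spec (names and values are illustrative only):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-app        # example name, matching the app used earlier
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-app
  template:
    metadata:
      labels:
        app: simple-app
    spec:
      containers:
      - name: simple-app
        image: nginx
        resources:
          requests:        # what the scheduler reserves for the container
            cpu: 100m
            memory: 128Mi
          limits:          # hard ceiling before throttling / OOM kill
            cpu: 250m
            memory: 256Mi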