
patrick0057 / azure_node_create.sh
Last active December 26, 2018 19:43
Rancher 1.6 deploy Azure node from CLI
#!/bin/bash
#Blank values must be filled in unless specified otherwise by preceding comment.
#Name of the rancher environment you want to deploy your node to
export RANCHER_ENVIRONMENT='Default'
#Full path of your Rancher CLI executable
export RANCHERCLI_PATH='/usr/local/bin/rancher'
export MACHINE_DRIVER='azure'
#Name of your Node
export NAME=''
#AZURE_ENVIRONMENT can be: AzureChinaCloud, AzureGermanCloud, AzurePublicCloud or AzureUSGovernmentCloud
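The script preview above cuts off at the AZURE_ENVIRONMENT comment. As a minimal sketch (the validation logic is my addition, not part of the gist; the variable names match the script), the value could be checked against the four environments the comment lists before the Rancher CLI is invoked:

```shell
#!/bin/bash
# Sketch only: validate AZURE_ENVIRONMENT against the values listed in the
# comment above before handing the variables to the Rancher CLI.
export AZURE_ENVIRONMENT='AzurePublicCloud'

case "$AZURE_ENVIRONMENT" in
    AzureChinaCloud|AzureGermanCloud|AzurePublicCloud|AzureUSGovernmentCloud)
        echo "AZURE_ENVIRONMENT ok: $AZURE_ENVIRONMENT"
        ;;
    *)
        echo "Invalid AZURE_ENVIRONMENT: $AZURE_ENVIRONMENT" >&2
        exit 1
        ;;
esac
```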
patrick0057 / Change_Rancher_2.x_server_hostname.md
Last active May 12, 2019 16:31
Unsupported procedure to change Rancher server hostname and propagate changes to downstream clusters

Change Rancher 2.x server hostname

Credit for the information in this document goes to Superseb. I am just publishing it in an easy-to-follow gist for later use. Before starting on this guide, ensure you have offline backups of etcd for your local Rancher cluster and all of your downstream clusters. Steps outlined in this document are unsupported; use at your own risk. I recommend performing the steps in a test environment first.

  1. Navigate to Global > Settings, find server-url in the list, click the triple-dot menu and then "Edit". Change the server-url to your desired value.
  2. Navigate to https://$server-url/v3/clusterregistrationtoken?clusterId=$CLUSTERID and grab the value from Data > insecureCommand.
    • Example value:

      curl --insecure -sfL https://$server-url/v3/import/2bdrqnkjzc7rbjsvg6j6dv9hgttmjgl84dw8tz775qkczq8qkkhh6t.yaml | kubectl apply -f -
      
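Extracting that insecureCommand value can be scripted. This is a sketch under assumptions: the JSON below is a simplified stand-in for the real /v3 payload (in practice you would fetch the endpoint with an API token), and the import URL in it is made up:

```shell
# Sketch: pull the insecureCommand field out of a clusterregistrationtoken
# response. $sample is a simplified stand-in for the real API response.
sample='{"data":[{"insecureCommand":"curl --insecure -sfL https://rancher.example.com/v3/import/abc123.yaml | kubectl apply -f -"}]}'
insecure_command=$(printf '%s' "$sample" | sed -n 's/.*"insecureCommand":"\([^"]*\)".*/\1/p')
echo "$insecure_command"
```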
patrick0057 / curl-etcd-metrics.md
Last active June 2, 2019 14:32
curl etcd metrics

Quick gist for curling etcd metrics. There are better ways to get the metrics, but I'm creating this gist anyway in case I need to reference it again later.

export etcd_endpoint=$(docker exec etcd netstat -lpna | grep \:2379 | grep tcp | grep LISTEN | tr -s " " | cut -d" " -f4)

{ for var in $(docker inspect --format '{{ .Config.Env }}' etcd | sed 's/[][]//g'); do
    if [[ "$var" == *"ETCDCTL_CERT"* ]] || [[ "$var" == *"ETCDCTL_KEY"* ]]; then
        export "${var}"
    fi
done; }
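The actual metrics request is not shown in this preview. A sketch of what the exported variables are presumably used for (the cert/key paths below are placeholders for illustration; in the real script they come from the `docker inspect` loop, and the curl would hit a live etcd, so the command is echoed here rather than executed):

```shell
# Sketch: build the metrics request from the exported client cert and key.
# Paths and endpoint are placeholder values, not from the gist.
ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-client.pem
ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-client-key.pem
etcd_endpoint=127.0.0.1:2379
metrics_cmd="curl -sk --cert $ETCDCTL_CERT --key $ETCDCTL_KEY https://$etcd_endpoint/metrics"
echo "$metrics_cmd"
```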
cmd/cloud-controller-manager/app/controllermanager.go: return c.ClientBuilder.ClientOrDie(serviceAccountName)
cmd/cloud-controller-manager/app/options/options.go: c.VersionedClient = rootClientBuilder.ClientOrDie("shared-informers")
cmd/kube-controller-manager/app/apps.go: ctx.ClientBuilder.ClientOrDie("daemon-set-controller"),
cmd/kube-controller-manager/app/apps.go: ctx.ClientBuilder.ClientOrDie("statefulset-controller"),
cmd/kube-controller-manager/app/apps.go: ctx.ClientBuilder.ClientOrDie("replicaset-controller"),
cmd/kube-controller-manager/app/apps.go: ctx.ClientBuilder.ClientOrDie("deployment-controller"),
cmd/kube-controller-manager/app/autoscaling.go: hpaClient := ctx.ClientBuilder.ClientOrDie("horizontal-pod-autoscaler")
cmd/kube-controller-manager/app/autoscaling.go: hpaClient := ctx.ClientBuilder.ClientOrDie("horizontal-pod-autoscaler")
cmd/kube-controller-manager/app/autoscaling.go: hpaClient := ctx.ClientBuilder.ClientOrDie
patrick0057 / README.md
Last active June 10, 2022 12:33
Update self signed certificate on single install of Rancher 2.x

Update self signed certificate on single install of Rancher 2.x

  1. Download the rancher-single-tool script on the server that is running your Rancher container:

    curl -LO https://github.com/patrick0057/rancher-single-tool/raw/master/rancher-single-tool.sh
  2. Run the script so that it upgrades your installation (you can upgrade to the same version), passing flags to indicate that you want to regenerate your self-signed certificate. The most reliable way is to specify all of your options on the command line, but the script also has an easy-to-use automated mode, shown in option b.

    a. Specify all flags on the command line, including any Rancher options and Docker options you had. Option -s is required for generating new 10-year self-signed SSL certificates.

patrick0057 / README.md
Last active June 10, 2022 12:33
Deploy new cluster agent YAML

Deploy new cluster agent YAML

If you've recently made changes to your Rancher installation, like updating the server URL or changing the Rancher installation's SSL certificate, then you will likely need to redeploy your cluster agent YAML files from Rancher.

  1. Create a local admin user for use with this tutorial. Without this, the script cannot log in to get the new deployment file. You cannot use user accounts tied to third-party authentication such as LDAP, Active Directory, or GitHub, to name a few.
  2. Log in to a single controlplane node of the cluster you need to redeploy your agent YAML to.
  3. Download the script:
    curl -LO https://github.com/patrick0057/cluster-agent-tool/raw/master/cluster-agent-tool.sh

wget https://github.com/patrick0057/cluster-agent-tool/raw/master/cluster-agent-tool.sh

patrick0057 / README.md
Last active March 17, 2021 13:13
Change Rancher 2.x server-url

Change Rancher 2.x server-url

Single server installation

During this tutorial it is recommended to use the rancher-single-tool for Rancher single server installations. It isn't required, but it makes the process much easier; as a result, this guide is based on using that tool.

  1. Download the rancher-single-tool to the node that is running your rancher server container.
       curl -LO https://github.com/patrick0057/rancher-single-tool/raw/master/rancher-single-tool.sh
       wget https://github.com/patrick0057/rancher-single-tool/raw/master/rancher-single-tool.sh
patrick0057 / README.md
Last active September 25, 2020 13:08
Major disaster preparation and recovery

Major disaster preparation and recovery

In a perfect world our clusters would never experience a complete and total failure where data from all nodes is unrecoverable. Unfortunately this scenario is very possible and has happened before. In this article I will outline how to best prepare your environment for recovery in situations like this.

Situation: Employee A accidentally deletes all of the VMs for a production cluster after testing his latest script. How do you recover?

Option A: Keep VM snapshots of all of the nodes so that you can just restore them if they are deleted.

Option B: Manually bootstrap a new controlplane and etcd node to match one of the original nodes that were deleted.

In this article, I'm going to focus on Option B. In order to bootstrap a controlplane/etcd node, you will need an etcd snapshot, the Kubernetes certificates, and the runlike commands from the core Kubernetes components. If you prepare ahead of time for something like this, you can save a lot of time when it comes to recovery.
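Capturing those runlike commands ahead of time can be scripted. A sketch under assumptions: the component list assumes an RKE-provisioned node (adjust it to match `docker ps` on your own nodes), it uses the axeal/runlike image the same way the kube-apiserver gist on this page does, and the commands are written to a file rather than executed:

```shell
# Sketch: record runlike commands for the core components before a disaster.
# Container names assume an RKE node; the commands are echoed into a file,
# not run, since docker and the containers must exist for runlike to work.
for component in etcd kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy; do
    echo "docker run --rm -v /var/run/docker.sock:/var/run/docker.sock axeal/runlike $component"
done > runlike-commands.txt
cat runlike-commands.txt
```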

patrick0057 / README.md
Last active June 17, 2023 10:05
etcd performance testing and optimization

etcd performance testing and optimization

If your etcd logs start showing messages like the following, your storage might be too slow for etcd, or the server might be doing too much for etcd to operate properly.

2019-08-11 23:27:04.344948 W | etcdserver: read-only range request "key:\"/registry/services/specs/default/kubernetes\" " with result "range_response_count:1 size:293" took too long (1.530802357s) to execute

If your storage is really slow, you will even see alerts in your monitoring system. What can you do to verify the performance of your storage? If the storage is not performing correctly, how can you fix it? After researching this I found an IBM article that covered it extensively. Their findings on how to test were very helpful. The biggest factor is your storage latency. If it is not well below 10ms in the 99th percentile, you will see warnings in the etcd logs. We can test this with a tool called fio, which I will outline below.

Testing etcd performance
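The gist's own fio walkthrough is cut off in this preview. As a sketch based on the commonly cited etcd disk-benchmarking guidance (the parameters approximate etcd's WAL write pattern and are not taken from this gist; the directory is a placeholder), a fio job might look like:

```ini
; Sketch of a fio job approximating etcd's fdatasync-heavy write pattern.
; directory is a placeholder; point it at the disk backing /var/lib/etcd.
[etcd-disk-test]
rw=write
ioengine=sync
fdatasync=1
directory=/var/lib/etcd-test
size=22m
bs=2300
```

With fdatasync=1, fio reports sync latency percentiles; the 99th percentile is the number to compare against the 10ms threshold mentioned above.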

patrick0057 / README.md
Last active May 1, 2021 18:09
kube-apiserver restart loop

kube-apiserver restart loop

If the kube-apiserver is in a restart loop, it is possible that one of the etcd servers it is trying to connect to is no longer reachable. It should be able to move on to the next etcd server, but in some rare cases it does not. In those situations you need to remove the bad etcd servers from its startup options, as shown below.

  1. Get the runlike command for kube-apiserver with the following command:

    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock axeal/runlike kube-apiserver

    Example output:
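The example output itself is truncated in this gist. Independent of that, trimming a dead member out of the --etcd-servers flag before re-running the edited runlike command can be sketched as follows (the endpoints here are made-up examples, not values from the gist):

```shell
# Sketch: drop an unreachable member from an --etcd-servers list before
# re-running the container. Endpoints are illustrative placeholders.
etcd_servers="https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379"
bad="https://10.0.0.2:2379"
trimmed=$(printf '%s' "$etcd_servers" | tr ',' '\n' | grep -vF "$bad" | paste -sd, -)
echo "--etcd-servers=$trimmed"
```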