This approach rotates all certificates at once and then updates all ETCD (Master) nodes at the same time. Unfortunately, Bosh can't update all Master nodes at the same time if they are deployed across AZs. So the procedure here is to reduce the number of Master nodes to one, and then scale back up as we update the certificates on all VMs. This is faster but a bit riskier, since the cluster runs with only one Master node for a few minutes.
An alternative is a more graceful approach: first roll out a new CA concatenated with the old CA, then regenerate the leaf certificates for the ETCD servers, and finally remove the old CA. This requires 3 passes (cluster updates), so it is slower, but it is safer and allows Bosh to update Master nodes one at a time. This gist does not go into the details of how to do that.
Download the credhub 2.2.1 CLI. The Credhub 2.x CLI is required to use the export option; Credhub 1.X does not have the export option yet. We need both CLIs.
curl -JOL https://github.com/cloudfoundry-incubator/credhub-cli/releases/download/2.2.1/credhub-linux-2.2.1.tgz
tar -xzvf credhub-linux-2.2.1.tgz
mv credhub credhub2
Install ruby 2.3+
Install jq
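Before going further, it can help to confirm that every tool used in this procedure is on the PATH. The following is a minimal sketch, not part of the original procedure: `require_cmds` is a hypothetical helper name, and the tool list in the usage comment is taken from the commands used in this gist.

```shell
# Hypothetical helper: return 0 if every named command is on PATH,
# otherwise print the missing ones to stderr and return 1.
require_cmds() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumes credhub2 was moved somewhere on PATH):
#   require_cmds credhub credhub2 ruby jq bosh || exit 1
```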
The easiest thing for CFCR users is to leverage the credhub_login script provided in kubo-deployment repository. Run it like this:
${repo_base_directory}/kubo-deployment/bin/credhub_login ${kubo_env_path}
credhub2 export -p /${BOSH_ENVIRONMENT}/${kubo_env_name} > ${kubo_env_name}_certs_old.yaml
Example:
credhub2 export -p /cfcr/cfcr > cfcr_certs_old.yaml
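Because the export is written with a shell redirect, a failed export (for example, an expired login) still leaves a file behind, just an empty one. A small sanity check before continuing might look like the sketch below. `check_export` is a hypothetical helper, and the top-level `credentials:` key is an assumption inferred from the `.credentials` path used by the jq filters in this gist.

```shell
# Hypothetical helper: return 0 only if the export file exists, is non-empty,
# and contains a top-level "credentials:" key (assumed from the jq filters).
check_export() {
  local file="$1"
  [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
  grep -q '^credentials:' "$file" || { echo "no credentials key in $file" >&2; return 1; }
}

# Usage:
#   check_export cfcr_certs_old.yaml || exit 1
```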
Transform the file to JSON:
ruby -ryaml -rjson -e 'puts JSON.pretty_generate(YAML.load(ARGF))' < ${kubo_env_name}_certs_old.yaml > ${kubo_env_name}_certs_old.json
Example:
ruby -ryaml -rjson -e 'puts JSON.pretty_generate(YAML.load(ARGF))' < cfcr_certs_old.yaml > cfcr_certs_old.json
Inspect content to confirm export worked well. First CAs:
cat ${kubo_env_name}_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca == .value.certificate)) | .[].name'
Example:
cat cfcr_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca == .value.certificate)) | .[].name'
Expected output:
/cfcr/cfcr/etcd_ca
/cfcr/cfcr/kubernetes-dashboard-ca
/cfcr/cfcr/kubo_ca
Now certs:
cat ${kubo_env_name}_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca != .value.certificate)) | .[].name'
Example:
cat cfcr_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca != .value.certificate)) | .[].name'
Expected output: (This list may vary depending on the version of CFCR used)
/cfcr/cfcr/tls-etcdctl-flanneld
/cfcr/cfcr/tls-etcdctl-root
/cfcr/cfcr/tls-etcdctl-v0-29-0
/cfcr/cfcr/tls-etcd-v0-29-0
/cfcr/cfcr/tls-kubernetes-dashboard
/cfcr/cfcr/tls-influxdb
/cfcr/cfcr/tls-heapster
/cfcr/cfcr/tls-metrics-server
/cfcr/cfcr/tls-etcdctl
/cfcr/cfcr/tls-etcd-v0-17-0
/cfcr/cfcr/tls-kube-controller-manager
/cfcr/cfcr/tls-kubernetes
/cfcr/cfcr/tls-kubelet-client
/cfcr/cfcr/tls-kubelet
The easiest way is to create a bosh Ops file that changes the number of master instances to 1.
Below is an example that can be used. Remember to change the master instance-group name if your cluster uses a different one. Also remember to update the static_ips to the ones used by your cluster, or remove the second replace if you don't use a vip network for master nodes.
Example of ops-single-master.yml file:
- type: replace
  path: /instance_groups/name=cfcr-master/instances
  value: 1
- type: replace
  path: /instance_groups/name=cfcr-master/networks/name=vip/static_ips
  value: [100.71.29.64]
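Since the instance-group name and the static IP vary per cluster, the ops file can also be generated rather than edited by hand. This is only a sketch under assumptions: `write_single_master_ops` is a hypothetical helper, and it emits the second replace only when a vip IP is passed.

```shell
# Hypothetical helper: write a single-master ops file.
#   $1 output path
#   $2 master instance-group name
#   $3 (optional) static IP on the vip network; omit to skip the second replace
write_single_master_ops() {
  local out="$1"
  local group="$2"
  local vip_ip="${3:-}"
  {
    printf '%s\n' '- type: replace' \
      "  path: /instance_groups/name=${group}/instances" \
      '  value: 1'
    if [ -n "$vip_ip" ]; then
      printf '%s\n' '- type: replace' \
        "  path: /instance_groups/name=${group}/networks/name=vip/static_ips" \
        "  value: [${vip_ip}]"
    fi
  } > "$out"
}

# Usage:
#   write_single_master_ops "${bosh_env}/ops-single-master.yml" cfcr-master 100.71.29.64
```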
Update the cluster using ops-single-master.yml:
bosh deploy -d cfcr "${bosh_env}/kubo-manifest.yml" -o "${bosh_env}/ops-single-master.yml"
This pass should be fast, deleting all Master nodes but 1, and updating all VMs with the updated internal links.
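Before regenerating anything, it may be worth confirming that exactly one master instance is left. The sketch below counts rows for one instance group in the table printed by `bosh instances`. It assumes each row starts with `<group>/<id>`, which should be checked against the output of your CLI version; `count_instances` is a hypothetical helper.

```shell
# Hypothetical helper: count rows for one instance group in `bosh instances`
# table output read from stdin. Assumes rows start with "<group>/<id>".
count_instances() {
  local group="$1"
  # grep -c prints 0 but exits non-zero when nothing matches; keep exit status 0.
  grep -c "^${group}/" || true
}

# Usage:
#   bosh -d cfcr instances | count_instances cfcr-master
```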
First the CAs:
cat ${kubo_env_name}_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca == .value.certificate)) | .[].name' | xargs -n 1 -t credhub regenerate -n &> ${kubo_env_name}_ca_cert_regen.out
Example:
cat cfcr_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca == .value.certificate)) | .[].name' | xargs -n 1 -t credhub regenerate -n &> cfcr_ca_cert_regen.out
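The commands above send all output to a log file, so a failed regeneration is easy to miss. xargs exits non-zero when any invocation fails, so wrapping the call and checking the exit status is one way to catch that. This is a sketch only: `regen_each` is a hypothetical helper, and the command to run is passed in as arguments.

```shell
# Hypothetical helper: run a command once per name read from stdin, log all
# output to a file, and return non-zero if any invocation failed.
#   $1 log file; remaining arguments: the command to run per name
regen_each() {
  local log="$1"
  shift
  xargs -n 1 -t "$@" > "$log" 2>&1
}

# Usage (same jq filter as above, piped into the helper):
#   jq -r '...' cfcr_certs_old.json \
#     | regen_each cfcr_ca_cert_regen.out credhub regenerate -n \
#     || echo "some regenerations failed, see cfcr_ca_cert_regen.out" >&2
```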
Now the certs:
cat ${kubo_env_name}_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca != .value.certificate)) | .[].name' | xargs -n 1 -t credhub regenerate -n &> ${kubo_env_name}_cert_regen.out
Example:
cat cfcr_certs_old.json | jq -r '.credentials | map(select(.type == "certificate")) | map(select(.value.ca != .value.certificate)) | .[].name' | xargs -n 1 -t credhub regenerate -n &> cfcr_cert_regen.out
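To confirm that regeneration actually changed what is stored, a second export can be compared with the first one. The sketch below does only a coarse, file-level comparison. `exports_differ` and the `cfcr_certs_new.yaml` file name are hypothetical.

```shell
# Hypothetical helper: return 0 if the two export files differ, 1 if they are
# identical, 2 if either file is missing.
exports_differ() {
  [ -f "$1" ] && [ -f "$2" ] || { echo "missing export file" >&2; return 2; }
  ! cmp -s "$1" "$2"
}

# Usage:
#   credhub2 export -p /cfcr/cfcr > cfcr_certs_new.yaml
#   exports_differ cfcr_certs_old.yaml cfcr_certs_new.yaml || echo "nothing changed" >&2
```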
Redeploy the cluster and scale the masters back to 3 (deploying without ops-single-master.yml restores the instance count from the manifest):
bosh deploy -d cfcr "${bosh_env}/kubo-manifest.yml" --skip-drain
If you are also rotating Bosh certificates for the Agents because they have expired and the Agents are unresponsive, then run this command instead. (This is a common scenario if the Director was created at the same time as the CFCR cluster):
bosh deploy -d cfcr "${bosh_env}/kubo-manifest.yml" --recreate --fix --skip-drain