How to use:
- Edit the input variables at the top
- Copy paste steps one at a time into a bash/zsh terminal shell. This gives better feedback and a better understanding of what's going on.
brew install kubectx
gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project chrism-playground
gcloud container clusters get-credentials cluster-2 --zone us-central1-c --project chrism-playground
kubectx cluster1=gke_chrism-playground_us-central1-c_cluster-1
kubectx cluster2=gke_chrism-playground_us-central1-c_cluster-2
# ^-- renames the kubectl contexts to shorter names
- Copy paste these in a text file
- Edit them to reflect your environment
- Copy paste the edited values into a terminal
# v-- GCP project
export PROJECT=chrism-playground
export DOMAIN=neoakris.dev
# ^-- domain name with a public internet TLD you have access to
export ORIGINAL_IP_NAME="original-global-ip"
export NEW_IP_NAME="new-global-ip"
(Copy paste the following to verify the values)
echo $PROJECT
echo $DOMAIN
echo $ORIGINAL_IP_NAME
echo $NEW_IP_NAME
gcloud config set project $PROJECT
gcloud compute addresses create $ORIGINAL_IP_NAME --global --ip-version IPV4
export ORIGINAL_IP=$(gcloud compute addresses describe $ORIGINAL_IP_NAME --global | grep address: | cut -d ' ' -f 2)
echo $ORIGINAL_IP
Update public internet DNS so your domain name points to $ORIGINAL_IP.
Verify using the command dig $DOMAIN
or the website https://mxtoolbox.com/DnsLookup.aspx
Note: even if the above report looks fine, it's possible for a router or the host OS to have the old IP cached,
so you should also check at the host OS level.
ping $DOMAIN -c 1 | head -n 1
Make sure this command shows the updated IP.
macOS users can usually run sudo killall -HUP mDNSResponder
to clear outdated DNS entries cached at the host level.
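# v-- (Optional sketch) If your zone happens to be hosted in Google Cloud DNS, the A record update can be scripted.
#     This is an assumption/example only: "my-zone" is a placeholder for your managed zone name, and if an A record
#     already exists you'd also need a matching "gcloud dns record-sets transaction remove" of the old value before the add.
gcloud dns record-sets transaction start --zone=my-zone
gcloud dns record-sets transaction add $ORIGINAL_IP --name=$DOMAIN. --ttl=300 --type=A --zone=my-zone
gcloud dns record-sets transaction execute --zone=my-zone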
mkdir -p ~/guide
cd ~/guide
# v-- switch kubectl config context to cluster1
kubectx cluster1
# v-- Note: this YAML object will work for both original & new cluster
tee managedcert.yaml << EOF
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: managed-cert
spec:
  domains:
    - $DOMAIN
EOF
alias k=kubectl
tee original-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: $ORIGINAL_IP_NAME
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: test
      port:
        number: 80 #comes from k get svc test
EOF
k create deployment test --image=nginx
k expose deploy/test --port=80 --name=test
k apply -f managedcert.yaml
k apply -f original-ingress.yaml
Note: it'll take 10-60 minutes for the ManagedCertificate to transition from STATUS Provisioning to STATUS Active
k get managedcertificate
# NAME AGE STATUS
# managed-cert 10m Provisioning
# Open a spare terminal
export DOMAIN=neoakris.dev
echo $DOMAIN
# v-- (copy paste following as a multi line command, CTRL+C will exit)
while :
do
kubectl get managedcertificate
curl --silent --fail https://$DOMAIN:443 -o '/dev/null' && echo "site is up" || echo "site is down"
sleep 1
done
# ^-- This will help you figure out when managed cert switches to status Active
k get managedcertificate
# NAME AGE STATUS
# managed-cert 17m Active
# ^-- Once I saw this I closed the spare terminal
For a pre-existing DNS entry, it's recommended to lower the TTL to a small value like 300 seconds (5 minutes). Wait the full length of your old TTL before moving on: if your old TTL was 1 day, wait 1 day so the old TTL expires everywhere and is replaced by the new TTL value. If using CloudFlare DNS, it's recommended to temporarily turn off the CloudFlare proxy for the DNS entry.
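# v-- you can check the TTL resolvers currently see for the record; the 2nd column of dig's answer section
#     is the remaining TTL in seconds
dig +noall +answer $DOMAIN A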
Phase 3 - Step 2: Use Let's Encrypt (free) to Provision an HTTPS Cert using a cloud-agnostic generic methodology
- You'll use an interactive shell in a docker container to provision an HTTPS cert
# [admin@workstation:~/guide]
mkdir -p ~/guide/cert
cd ~/guide/cert
docker run -it --entrypoint=/bin/sh --volume $HOME/guide/cert:/.lego/certificates docker.io/goacme/lego:latest
# [shell@dockerized-ACME-client:/]
# (Note: the leading slash in /lego is intentional; plain "lego" isn't in the PATH and will say "not found")
/lego --email "[email protected]" --domains="neoakris.dev" --dns "manual" run
# Press Y to Accept the Terms of Service
# It'll say something along the lines of
# lego: Please create the following TXT record in your neoakris.dev. zone:
# _acme-challenge.neoakris.dev. 120 IN TXT "i892WmqXTiIg_FjG8myTTi2OnVxLhbMjoA9_wttttsE"
# So manually create a new TXT record with a TTL of 120 seconds with the given hostname & data
# IMPORTANT: Don't replace any pre-existing records with a TXT record.
# Example: DON'T convert an A record to a TXT record.
# DO Make a new record that is a TXT record.
# Press Enter after DNS has been updated according to the steps above.
# 2022/09/18 05:15:32 [INFO] [neoakris.dev] acme: Validations succeeded; requesting certificates
# 2022/09/18 05:15:33 [INFO] [neoakris.dev] Server responded with a certificate.
ls /.lego/certificates
# neoakris.dev.crt neoakris.dev.issuer.crt neoakris.dev.json neoakris.dev.key
exit
# [admin@workstation:~/guide/cert]
ls
# neoakris.dev.crt neoakris.dev.issuer.crt neoakris.dev.json neoakris.dev.key
# Note: You can remove the DNS TXT record now
# [admin@workstation:~/guide/cert]
echo $DOMAIN
#^-- make sure the value still looks right
export CRT=$(base64 < $DOMAIN.crt | tr -d '\n')
export KEY=$(base64 < $DOMAIN.key | tr -d '\n')
# ^-- tr -d '\n' strips the line wrapping that some base64 implementations (e.g. GNU coreutils) add,
#     so the values land on a single line in the YAML below
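# v-- (Optional sanity check) decoding should print the PEM header back
#     (on some macOS versions the decode flag is -D instead of --decode)
echo "$CRT" | base64 --decode | head -n 1
# -----BEGIN CERTIFICATE-----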
tee temporary-downtime-prevention-https-cert.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
  name: secret-tls
  namespace: default
type: kubernetes.io/tls
data:
  # tls.crt and tls.key values come from the CRT and KEY environment variables exported above
  tls.crt: $CRT
  tls.key: $KEY
EOF
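# v-- (Optional) client-side dry run to confirm the generated YAML parses; nothing is created on the cluster
kubectl apply --dry-run=client -f temporary-downtime-prevention-https-cert.yaml
# secret/secret-tls created (dry run)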
# [admin@workstation:~/guide/cert]
cd ~/guide
# [admin@workstation:~/guide]
# v-- switch kubectl config context to cluster2
kubectx cluster2
# v-- using the apache2 (httpd) docker image so it's easier to tell the two clusters apart
k create deployment test --image=httpd
k expose deploy/test --port=80 --name=test
gcloud compute addresses create $NEW_IP_NAME --global --ip-version IPV4
export NEW_IP=$(gcloud compute addresses describe $NEW_IP_NAME --global | grep address: | cut -d ' ' -f 2)
echo $NEW_IP
tee new-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: $NEW_IP_NAME
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
    - hosts:
        - $DOMAIN
      secretName: secret-tls
  defaultBackend:
    service:
      name: test
      port:
        number: 80 #comes from k get svc test
EOF
kubectl apply -f managedcert.yaml
kubectl apply -f $HOME/guide/cert/temporary-downtime-prevention-https-cert.yaml
kubectl apply -f new-ingress.yaml
- The new cluster currently has 1 manually imported HTTPS cert attached to the LB, plus 1 managed HTTPS cert that will stay stuck in Provisioning until DNS cuts over
- After DNS cuts over there will be 2 HTTPS certs for the same DNS name attached to the LB. That's fine in the short term, BUT only the managed HTTPS cert will auto-renew; the manually imported one won't. So we'll want to remove the manually imported HTTPS cert when we're done, to avoid the LB having an expired and an unexpired cert at the same time (which would happen after 3 months pass).
- You can verify the manually imported HTTPS cert is attached to the LB & working by doing the following test
#v-- This will show you the webpage of the original cluster
curl -v https://$DOMAIN:443 --resolve $DOMAIN:443:$ORIGINAL_IP
# Thank you for using nginx
#v-- This will show you the validity of the cert of the original cluster
echo QUIT | openssl s_client -connect $ORIGINAL_IP:443 -servername $DOMAIN -showcerts 2>/dev/null | openssl x509 -noout -text | grep Validity -A 2
# Validity
# Not Before: Sep 16 18:56:35 2022 GMT
# Not After : Dec 15 18:56:34 2022 GMT
IMPORTANT NOTE ABOUT THE NEXT STEP
(It might fail if you run it too soon.) It should work after 10-60 mins; don't move on until it works.
Why 10-60 minutes? / What's going on?
There's a Kubernetes / GCP controller reconciliation loop that reads the Kubernetes secret attached to the Ingress object. After the reconciliation loop finishes, the cert that exists as a kube TLS secret is auto-imported as an externally provisioned GCP cert and attached to the LB, as shown in the example below.
The URL will look similar to this:
https://console.cloud.google.com/kubernetes/ingress/us-central1-c/cluster-2/default/managed-cert-ingress/details?project=chrism-playground
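# v-- One hint (not the authoritative check) that the reconciliation loop has picked up the secret: the GKE ingress
#     controller normally records the names of the attached certs in an annotation on the Ingress object
kubectl get ingress managed-cert-ingress -o yaml | grep ssl-cert
# ingress.kubernetes.io/ssl-cert: <names of the certs currently attached to the LB>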
#v-- This will show you the webpage of the new cluster (may need to wait 10-60 mins for the reconciliation loop)
curl -v https://$DOMAIN:443 --resolve $DOMAIN:443:$NEW_IP
# It works!
# ^-- apache2 server's messaging
#v-- This will show you the validity of the cert of the new cluster (we can test before the dns cutover)
echo QUIT | openssl s_client -connect $NEW_IP:443 -servername $DOMAIN -showcerts 2>/dev/null | openssl x509 -noout -text | grep Validity -A 2
# Validity
# Not Before: Sep 17 22:10:46 2022 GMT
# Not After : Dec 16 22:10:45 2022 GMT
# ^-- The fact that the certs' Validity dates are different is sufficient proof
# that we're dealing with different certs & things are working as expected
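# v-- (Optional) you can also view both certs from GCP's side; exact output columns may vary by gcloud version
gcloud compute ssl-certificates list
# ^-- the manually imported cert appears as TYPE SELF_MANAGED, the ManagedCertificate's cert as TYPE MANAGED
#     (the MANAGED one stays in provisioning until after the DNS cutover)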
Phase 4 - Step 3: Prepare to Cut Over DNS by setting up a test-driven feedback loop to verify success
- Remember to wait for the original DNS entry's TTL to actually be 5 min everywhere: if you had to update it from 3 days -> 5 min, you should wait 3 days. After those 3 days have passed, all DNS servers on the internet will re-check for DNS updates on a 5 min basis.
- Setup this feedback loop
- Open 2 spare terminals side by side or use something like tmux
export DOMAIN=neoakris.dev
# v-- && means only run the next cmd if the previous command was successful, || only run next cmd if previous failed
curl --silent --fail https://$DOMAIN:443 -o '/dev/null' && echo "site is up" || echo "site is down"
# site is up
# v-- let's prove that
curl --silent --fail i.dont.exist.com -o '/dev/null' && echo "site is up" || echo "site is down"
# site is down
# v-- wrap it in an infinite loop that checks once a second (copy paste the following as a multi-line command, CTRL+C will exit)
while :
do
curl --silent --fail https://$DOMAIN:443 -o '/dev/null' && echo "site is up" || echo "site is down"
sleep 1
done
# ^-- This way you'll be able to verify zero downtime occurred
export DOMAIN=neoakris.dev
echo $DOMAIN
# v-- (copy paste following as a multi line command, CTRL+C will exit)
while :
do
dig $DOMAIN | grep ANSWER -A 1 | grep $DOMAIN
sleep 1
done
- Cut over DNS from the old IP to the new IP
- Spam the refresh button on your website. I noticed my browser reflected the change immediately, in < 1 second; I had zero downtime and my loops said everything was up.
- Pay attention to the left and right terminals' output while it happens and you'll be able to verify zero downtime occurred.
- Later, the dig loop showed the update too, meaning the site cut over for the rest of the internet as well.
- You can close the right terminal now, but wait until the very end before you close the left terminal.
Phase 4 - Step 5: Verify the managed cert is up for the new cluster, then remove the pre-provisioned HTTPS cert
kubectx cluster2
k get managedcertificate managed-cert
# STATUS = Provisioning
# v-- copy paste from while to done as a multi line command
while :
do
kubectl get managedcertificate managed-cert
sleep 1
done
# After about 10-60 minutes you'll see the managed-cert change from STATUS Provisioning to STATUS Active
# Although the managed cert isn't ready yet,
# the LB will be working perfectly using the manually imported HTTPS cert, which results in zero downtime
- We want to get rid of the 2nd HTTPS cert that we manually imported, to avoid problems when it expires after 3 months.
  (If you don't do this last step you can run into an issue where your site toggles between a valid cert and an expired cert.)
- IMPORTANT: wait until
kubectl get managedcertificate managed-cert
shows STATUS Active before moving on.
# [admin@workstation:~/guide]
cd ~/guide
cat new-ingress.yaml
# apiVersion: networking.k8s.io/v1
# kind: Ingress
# metadata:
#   name: managed-cert-ingress
#   annotations:
#     kubernetes.io/ingress.global-static-ip-name: new-global-ip
#     networking.gke.io/managed-certificates: managed-cert
#     kubernetes.io/ingress.class: "gce"
# spec:
#   tls:
#     - hosts:
#         - neoakris.dev
#       secretName: secret-tls
#   defaultBackend:
#     service:
#       name: test
#       port:
#         number: 80 #comes from k get svc test
tee new-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: managed-cert-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: $NEW_IP_NAME
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: test
      port:
        number: 80 #comes from k get svc test
EOF
cat new-ingress.yaml
# Make sure all the environment variable substitutions look right (we basically just got rid of the spec.tls reference)
# Again, don't run the following apply until after cluster2's managed cert has switched
# from STATUS Provisioning to STATUS Active (which can take 10-60 minutes after the DNS cutover has occurred)
kubectx cluster2
k get managedcertificate
# NAME AGE STATUS
# managed-cert 68m Active <--took about 22 minutes after the DNS cutover to see this on cluster2
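# v-- (Optional) preview exactly what will change on the live Ingress before applying; kubectl diff is read-only
kubectl diff -f new-ingress.yaml
# ^-- should show only the spec.tls block being removed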
kubectl apply -f new-ingress.yaml
kubectl delete -f $HOME/guide/cert/temporary-downtime-prevention-https-cert.yaml
# ^-- it's also a good idea to delete the temporary TLS secret, to avoid confusing people.
# Even though we updated the ingress & deleted the TLS secret, our temporary
# left-terminal feedback loop still shows "site is up"
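# v-- (Optional) final sanity check: the temporary secret should be gone while the site stays up
kubectl get secret secret-tls
# Error from server (NotFound): secrets "secret-tls" not found
curl --silent --fail https://$DOMAIN:443 -o '/dev/null' && echo "site is up" || echo "site is down"
# site is up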