Istio Add-on upgrade notes -- 1.0 --> 1.1

Reported behavior: on GKE cluster upgrade to 1.13.6-gke.0, the Istio add-on is auto-upgraded to 1.1, and:

  1. istio-policy and istio-telemetry (Mixer) entered CrashLoopBackOff
  2. galley had TLS handshake errors
  3. the Istio gateway (ingressgateway?) had TLS handshake errors

Hypothesis: a race condition between Pilot and Mixer (istio-policy) while all control-plane pods are down?
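
For reference, a quick way to look for the reported symptoms on an affected cluster (a sketch; the deployment names are the standard istio-system ones, actual pod names will differ):

kubectl get pods -n istio-system
kubectl logs -n istio-system deploy/istio-galley | grep -i "handshake"
kubectl logs -n istio-system deploy/istio-ingressgateway | grep -i "handshake"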


Part 1 - "vanilla upgrade"

From the default GKE version to 1.13.6-gke.0.

1. create GKE cluster with Istio add-on enabled

GKE version: 1.12.8-gke.6 (default as of 6/12/19)

Istio version: 1.0.6-gke.3
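
For reference, a cluster like this can be created from the CLI roughly as follows (a sketch; the cluster name and zone are placeholders, and the --istio-config auth mode is an assumption):

gcloud beta container clusters create istio-addon-test \
    --zone us-central1-b \
    --cluster-version 1.12.8-gke.6 \
    --addons Istio \
    --istio-config auth=MTLS_PERMISSIVE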

2. deploy hipstershop

kubectl label namespace default istio-injection=enabled

git clone https://github.com/GoogleCloudPlatform/microservices-demo

cd microservices-demo

kubectl apply -f release/

(Ensured that loadgen is running, can reach the frontend, all pods running and ready.)
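
The health check was roughly: confirm every pod shows 2/2 Ready (app container + sidecar), then hit the frontend through its external IP (a sketch; frontend-external is the LoadBalancer service in the microservices-demo manifests, and <EXTERNAL_IP> is a placeholder):

kubectl get pods
kubectl get svc frontend-external
curl -s -o /dev/null -w "%{http_code}\n" http://<EXTERNAL_IP>/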

3. upgrade the GKE cluster

  • Upgraded the master node to GKE version: 1.13.6-gke.0 (CLI equivalent sketched below)
  • Istio auto-upgraded to: 1.1.3-gke.0
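
The same master upgrade can also be done from the CLI (a sketch, assuming the placeholder cluster name/zone from step 1):

gcloud container clusters upgrade istio-addon-test \
    --zone us-central1-b \
    --master \
    --cluster-version 1.13.6-gke.0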

Istio control-plane behavior after the upgrade (note that both Mixer pods had 1 restart, but are Running):

NAME                                      READY   STATUS      RESTARTS   AGE
istio-citadel-5c88f4f47d-qg2gm            1/1     Running     0          2m4s
istio-cleanup-secrets-1.1.3-5d22j         0/1     Completed   0          2m7s
istio-galley-6c78cdcfc4-s2ntp             1/1     Running     0          2m6s
istio-ingressgateway-7f6db9967c-8sqzv     1/1     Running     0          2m6s
istio-init-crd-10-t2rtl                   0/1     Completed   0          2m7s
istio-init-crd-11-hms72                   0/1     Completed   0          2m7s
istio-pilot-56df8c9cf6-psl7t              2/2     Running     0          2m4s
istio-policy-68775985b8-tn2fn             2/2     Running     1          2m5s
istio-security-post-install-1.1.3-bfc5m   0/1     Completed   0          2m7s
istio-sidecar-injector-bc79fc549-65gv8    1/1     Running     0          2m4s
istio-telemetry-5965f9d996-rpjhk          2/2     Running     1          2m5s
promsd-f876c8c55-f4v8b                    2/2     Running     1          2m1s

Hipstershop is still healthy after the master node upgrade:

NAME                                     READY   STATUS    RESTARTS   AGE
adservice-575b66f7cb-2wcv5               2/2     Running   0          15m
cartservice-db7fbdc79-fdvwj              2/2     Running   2          15m
checkoutservice-7788c5476b-z7jgj         2/2     Running   0          15m
currencyservice-7c5994b448-4nzsg         2/2     Running   0          15m
emailservice-699c694c4-p777x             2/2     Running   0          15m
frontend-7d78f46cc7-kkwzm                2/2     Running   0          15m
loadgenerator-56b9b9bd98-fj4dh           2/2     Running   0          15m
paymentservice-59749f9f74-gktzg          2/2     Running   0          15m
productcatalogservice-6779864fdb-k6hr2   2/2     Running   0          15m
recommendationservice-5bb4586744-mpvhz   2/2     Running   0          15m
redis-cart-d999c4589-9prjd               2/2     Running   0          15m
shippingservice-5b8d8d67f4-pkr2j         2/2     Running   0          15m

Part 2 - replicate the exact reported setup

1. deploy cluster / install Istio / issue cert / install Flagger

Went through each step of this guide to set up the Istio ingressgateway with Let's Encrypt and my domain name:

https://docs.flagger.app/install/flagger-install-on-google-cloud

Only change: used K8S_VERSION=1.12.6-gke.11 for the initial cluster version instead of the default (see the sketch below).
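
That cluster-create step looked roughly like this, with K8S_VERSION pinned instead of taken from the server config (a sketch, not the guide's exact flags; the cluster name matches the node names below, while the zone, node count, and machine type are stand-ins):

K8S_VERSION=1.12.6-gke.11

gcloud beta container clusters create istio3 \
    --cluster-version=${K8S_VERSION} \
    --zone=us-central1-a \
    --num-nodes=2 \
    --machine-type=n1-standard-2 \
    --addons=HorizontalPodAutoscaling,HttpLoadBalancing,Istio \
    --istio-config=auth=MTLS_PERMISSIVE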

➜  ~ kubectl get nodes
NAME                                    STATUS   ROLES    AGE     VERSION
gke-istio3-default-pool-4b51166e-508m   Ready    <none>   5h16m   v1.12.6-gke.11
gke-istio3-default-pool-4b51166e-x8dv   Ready    <none>   95m     v1.12.6-gke.11

Istio version: 1.0.6-gke.3
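
The installed add-on version can be read off a control-plane image tag, e.g. from pilot (a sketch; any istio-system deployment shows the same -gke tag):

kubectl -n istio-system get deploy istio-pilot \
    -o jsonpath='{.spec.template.spec.containers[*].image}'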

istio-system namespace before upgrade:

NAME                                     READY   STATUS    RESTARTS   AGE
flagger-6c4f885df9-tlwwk                 1/1     Running   0          5m39s
flagger-grafana-7479855577-dgplt         1/1     Running   0          5m34s
istio-citadel-7cbd68dc58-gm96g           1/1     Running   0          5h16m
istio-egressgateway-76bd65699f-frjmz     1/1     Running   0          5h16m
istio-galley-848d6b698c-qm2p2            1/1     Running   0          5h16m
istio-ingressgateway-64d6cfc6cb-tw7cs    1/1     Running   0          6m2s
istio-pilot-57c9ff7496-44lwr             2/2     Running   0          5h16m
istio-policy-57bf7c4984-bflr7            2/2     Running   0          5h16m
istio-sidecar-injector-9db9b45f5-pbq2k   1/1     Running   0          5h16m
istio-telemetry-bf847c494-prwmv          2/2     Running   0          5h16m
prometheus-7c99657796-bpqqz              1/1     Running   0          5m49s
promsd-8cc5d455b-g9l94                   2/2     Running   1          5h15m

2. upgrade the GKE cluster

  • Navigated to the Google Cloud Console --> upgraded the cluster's master node to 1.13.6-gke.0 (verified from the CLI below).
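
Afterwards the master version can be confirmed from the CLI (a sketch; the zone is a placeholder):

gcloud container clusters describe istio3 \
    --zone=us-central1-a \
    --format='value(currentMasterVersion)'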

istio-system namespace immediately after upgrade:

NAME                                      READY   STATUS      RESTARTS   AGE
flagger-6c4f885df9-tlwwk                  1/1     Running     0          17m
flagger-grafana-7479855577-dgplt          1/1     Running     0          17m
istio-citadel-5c88f4f47d-zwc22            1/1     Running     0          89s
istio-cleanup-secrets-1.1.3-vdkw8         0/1     Completed   0          91s
istio-galley-6c78cdcfc4-8xp64             1/1     Running     0          89s
istio-ingressgateway-7f6db9967c-xw842     1/1     Running     0          89s
istio-init-crd-10-knt9d                   0/1     Completed   0          91s
istio-init-crd-11-kdclm                   0/1     Completed   0          91s
istio-pilot-5f886664b6-pm2tn              2/2     Running     0          90s
istio-policy-68775985b8-mtvv9             2/2     Running     2          90s
istio-security-post-install-1.1.3-7fggr   0/1     Completed   0          91s
istio-sidecar-injector-bc79fc549-xvm6l    1/1     Running     0          89s
istio-telemetry-5965f9d996-zxf96          2/2     Running     2          90s
prometheus-7c99657796-bpqqz               1/1     Running     0          17m
promsd-f876c8c55-s7x8b                    2/2     Running     1          89s

Istio version is now: 1.1.3-gke.0

A note about the 2 Mixer RESTARTS (on both istio-policy and istio-telemetry): it seems the istio-policy pod failed its liveness probe the first time it started up, causing a restart:

kubectl describe pod -n istio-system istio-policy-68775985b8-mtvv9


Events:
  Type     Reason     Age                   From                                            Message
  ----     ------     ----                  ----                                            -------
  Normal   Scheduled  4m9s                  default-scheduler                               Successfully assigned istio-system/istio-policy-68775985b8-mtvv9 to gke-istio3-default-pool-4b51166e-x8dv
  Normal   Pulling    4m7s                  kubelet, gke-istio3-default-pool-4b51166e-x8dv  pulling image "gcr.io/gke-release/istio/mixer:1.1.3-gke.0"
  Normal   Pulled     3m56s                 kubelet, gke-istio3-default-pool-4b51166e-x8dv  Successfully pulled image "gcr.io/gke-release/istio/mixer:1.1.3-gke.0"
  Normal   Pulling    3m56s                 kubelet, gke-istio3-default-pool-4b51166e-x8dv  pulling image "gcr.io/gke-release/istio/proxyv2:1.1.3-gke.0"
  Normal   Pulled     3m22s                 kubelet, gke-istio3-default-pool-4b51166e-x8dv  Successfully pulled image "gcr.io/gke-release/istio/proxyv2:1.1.3-gke.0"
  Normal   Created    3m20s                 kubelet, gke-istio3-default-pool-4b51166e-x8dv  Created container
  Normal   Started    3m20s                 kubelet, gke-istio3-default-pool-4b51166e-x8dv  Started container
  Normal   Created    3m3s (x3 over 3m56s)  kubelet, gke-istio3-default-pool-4b51166e-x8dv  Created container
  Normal   Started    3m3s (x3 over 3m56s)  kubelet, gke-istio3-default-pool-4b51166e-x8dv  Started container
  Normal   Pulled     3m3s (x2 over 3m19s)  kubelet, gke-istio3-default-pool-4b51166e-x8dv  Container image "gcr.io/gke-release/istio/mixer:1.1.3-gke.0" already present on machine
  Warning  Unhealthy  3m3s (x3 over 3m13s)  kubelet, gke-istio3-default-pool-4b51166e-x8dv  Liveness probe failed: Get http://10.0.2.11:15014/version: dial tcp 10.0.2.11:15014: connect: connection refused
  Normal   Killing    3m3s                  kubelet, gke-istio3-default-pool-4b51166e-x8dv  Killing container with id docker://mixer:Container failed liveness probe.. Container will be killed and recreated.
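
To dig into why the probe failed, the previous mixer container's logs and the probe definition can be pulled (a sketch using this pod's name):

kubectl logs -n istio-system istio-policy-68775985b8-mtvv9 -c mixer --previous

kubectl get pod -n istio-system istio-policy-68775985b8-mtvv9 \
    -o jsonpath='{.spec.containers[?(@.name=="mixer")].livenessProbe}'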