Reported behavior: on GKE cluster upgrade to 1.13.6-gke.0, Istio is auto-upgraded to 1.1, and:
- istio-policy and istio-telemetry (mixer) entered CrashLoopBackOff
- galley has TLS Handshake errors
- istio-gateway (ingress?) has TLS handshake errors
Hypothesis: race condition between pilot and mixer (istio-policy) when all pods are down?
From the default GKE version to 1.13.6-gke.0.
GKE version: 1.12.8-gke.6 (default as of 6/12/19)
Istio version: 1.0.6-gke.3
kubectl label namespace default istio-injection=enabled
git clone https://github.com/GoogleCloudPlatform/microservices-demo
cd microservices-demo
kubectl apply -f release/
(Ensured that loadgen is running, can reach the frontend, all pods running and ready.)
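The "all pods running and ready" check above can be scripted rather than eyeballed. The following is a minimal sketch, not the method used in the original run: `not_ready_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# not_ready_pods: read `kubectl get pods` output on stdin and print every pod
# that is not a Completed job pod and is either not fully ready (READY column
# is not n/n) or not in the Running state. (Hypothetical helper name.)
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
         split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") print $1
       }'
}

# Usage against a live cluster (needs kubectl configured):
#   kubectl get pods -n default | not_ready_pods
```

Empty output means every listed pod is Running with all containers ready.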
- Upgraded master node to GKE version: 1.13.6-gke.0
- Istio auto-upgraded to: 1.1.3-gke.0
Istio control plane behavior (notice both mixer pods had 1 restart, but are Running):
NAME READY STATUS RESTARTS AGE
istio-citadel-5c88f4f47d-qg2gm 1/1 Running 0 2m4s
istio-cleanup-secrets-1.1.3-5d22j 0/1 Completed 0 2m7s
istio-galley-6c78cdcfc4-s2ntp 1/1 Running 0 2m6s
istio-ingressgateway-7f6db9967c-8sqzv 1/1 Running 0 2m6s
istio-init-crd-10-t2rtl 0/1 Completed 0 2m7s
istio-init-crd-11-hms72 0/1 Completed 0 2m7s
istio-pilot-56df8c9cf6-psl7t 2/2 Running 0 2m4s
istio-policy-68775985b8-tn2fn 2/2 Running 1 2m5s
istio-security-post-install-1.1.3-bfc5m 0/1 Completed 0 2m7s
istio-sidecar-injector-bc79fc549-65gv8 1/1 Running 0 2m4s
istio-telemetry-5965f9d996-rpjhk 2/2 Running 1 2m5s
promsd-f876c8c55-f4v8b 2/2 Running 1 2m1s
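To pick out the restarted pods from a listing like the one above without scanning it by eye, a small filter can help. This is a sketch under the assumption that RESTARTS is the fourth whitespace-separated column, as in the output shown; `restarted_pods` is a hypothetical helper name.

```shell
# restarted_pods: read `kubectl get pods` output on stdin and print
# "<pod> <restart count>" for every pod whose RESTARTS column is non-zero.
# (Hypothetical helper name.)
restarted_pods() {
  awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}

# Usage against a live cluster (needs kubectl configured):
#   kubectl get pods -n istio-system | restarted_pods
```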
Hipstershop still healthy after the master node upgrade:
NAME READY STATUS RESTARTS AGE
adservice-575b66f7cb-2wcv5 2/2 Running 0 15m
cartservice-db7fbdc79-fdvwj 2/2 Running 2 15m
checkoutservice-7788c5476b-z7jgj 2/2 Running 0 15m
currencyservice-7c5994b448-4nzsg 2/2 Running 0 15m
emailservice-699c694c4-p777x 2/2 Running 0 15m
frontend-7d78f46cc7-kkwzm 2/2 Running 0 15m
loadgenerator-56b9b9bd98-fj4dh 2/2 Running 0 15m
paymentservice-59749f9f74-gktzg 2/2 Running 0 15m
productcatalogservice-6779864fdb-k6hr2 2/2 Running 0 15m
recommendationservice-5bb4586744-mpvhz 2/2 Running 0 15m
redis-cart-d999c4589-9prjd 2/2 Running 0 15m
shippingservice-5b8d8d67f4-pkr2j 2/2 Running 0 15m
Went through each step of this guide / set up Istio ingressgateway with LetsEncrypt + my domain name:
https://docs.flagger.app/install/flagger-install-on-google-cloud
Only change: used K8S_VERSION=1.12.6-gke.11 for the initial cluster version, instead of the default.
➜ ~ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-istio3-default-pool-4b51166e-508m Ready <none> 5h16m v1.12.6-gke.11
gke-istio3-default-pool-4b51166e-x8dv Ready <none> 95m v1.12.6-gke.11
Istio version: 1.0.6-gke.3
istio-system namespace before upgrade:
NAME READY STATUS RESTARTS AGE
flagger-6c4f885df9-tlwwk 1/1 Running 0 5m39s
flagger-grafana-7479855577-dgplt 1/1 Running 0 5m34s
istio-citadel-7cbd68dc58-gm96g 1/1 Running 0 5h16m
istio-egressgateway-76bd65699f-frjmz 1/1 Running 0 5h16m
istio-galley-848d6b698c-qm2p2 1/1 Running 0 5h16m
istio-ingressgateway-64d6cfc6cb-tw7cs 1/1 Running 0 6m2s
istio-pilot-57c9ff7496-44lwr 2/2 Running 0 5h16m
istio-policy-57bf7c4984-bflr7 2/2 Running 0 5h16m
istio-sidecar-injector-9db9b45f5-pbq2k 1/1 Running 0 5h16m
istio-telemetry-bf847c494-prwmv 2/2 Running 0 5h16m
prometheus-7c99657796-bpqqz 1/1 Running 0 5m49s
promsd-8cc5d455b-g9l94 2/2 Running 1 5h15m
- Navigated to Google Cloud Console --> upgraded cluster's master node to 1.13.6.gke-6.
istio-system namespace immediately after upgrade:
NAME READY STATUS RESTARTS AGE
flagger-6c4f885df9-tlwwk 1/1 Running 0 17m
flagger-grafana-7479855577-dgplt 1/1 Running 0 17m
istio-citadel-5c88f4f47d-zwc22 1/1 Running 0 89s
istio-cleanup-secrets-1.1.3-vdkw8 0/1 Completed 0 91s
istio-galley-6c78cdcfc4-8xp64 1/1 Running 0 89s
istio-ingressgateway-7f6db9967c-xw842 1/1 Running 0 89s
istio-init-crd-10-knt9d 0/1 Completed 0 91s
istio-init-crd-11-kdclm 0/1 Completed 0 91s
istio-pilot-5f886664b6-pm2tn 2/2 Running 0 90s
istio-policy-68775985b8-mtvv9 2/2 Running 2 90s
istio-security-post-install-1.1.3-7fggr 0/1 Completed 0 91s
istio-sidecar-injector-bc79fc549-xvm6l 1/1 Running 0 89s
istio-telemetry-5965f9d996-zxf96 2/2 Running 2 90s
prometheus-7c99657796-bpqqz 1/1 Running 0 17m
promsd-f876c8c55-s7x8b 2/2 Running 1 89s
Istio version is now: 1.1.3-gke.0
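One way to cross-check the Istio version is to read the tag off a control-plane container image. This is only a sketch: `image_tag` is a hypothetical helper, the Deployment name `istio-policy` is inferred from the pod names above rather than confirmed, and the function assumes the image reference actually carries a tag.

```shell
# image_tag: print the part of an image reference after the last ":".
# Assumes the reference includes a tag. (Hypothetical helper name.)
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Usage against a live cluster (needs kubectl configured; the Deployment
# name istio-policy is an assumption inferred from the pod names):
#   image_tag "$(kubectl get deployment istio-policy -n istio-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
```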
Note about the 2 Mixer RESTARTS (for both istio-policy and istio-telemetry): it seems that the istio-policy pod failed its liveness probe the first time it started up, causing a restart:
kubectl describe pod -n istio-system istio-policy-68775985b8-mtvv9
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m9s default-scheduler Successfully assigned istio-system/istio-policy-68775985b8-mtvv9 to gke-istio3-default-pool-4b51166e-x8dv
Normal Pulling 4m7s kubelet, gke-istio3-default-pool-4b51166e-x8dv pulling image "gcr.io/gke-release/istio/mixer:1.1.3-gke.0"
Normal Pulled 3m56s kubelet, gke-istio3-default-pool-4b51166e-x8dv Successfully pulled image "gcr.io/gke-release/istio/mixer:1.1.3-gke.0"
Normal Pulling 3m56s kubelet, gke-istio3-default-pool-4b51166e-x8dv pulling image "gcr.io/gke-release/istio/proxyv2:1.1.3-gke.0"
Normal Pulled 3m22s kubelet, gke-istio3-default-pool-4b51166e-x8dv Successfully pulled image "gcr.io/gke-release/istio/proxyv2:1.1.3-gke.0"
Normal Created 3m20s kubelet, gke-istio3-default-pool-4b51166e-x8dv Created container
Normal Started 3m20s kubelet, gke-istio3-default-pool-4b51166e-x8dv Started container
Normal Created 3m3s (x3 over 3m56s) kubelet, gke-istio3-default-pool-4b51166e-x8dv Created container
Normal Started 3m3s (x3 over 3m56s) kubelet, gke-istio3-default-pool-4b51166e-x8dv Started container
Normal Pulled 3m3s (x2 over 3m19s) kubelet, gke-istio3-default-pool-4b51166e-x8dv Container image "gcr.io/gke-release/istio/mixer:1.1.3-gke.0" already present on machine
Warning Unhealthy 3m3s (x3 over 3m13s) kubelet, gke-istio3-default-pool-4b51166e-x8dv Liveness probe failed: Get http://10.0.2.11:15014/version: dial tcp 10.0.2.11:15014: connect: connection refused
Normal Killing 3m3s kubelet, gke-istio3-default-pool-4b51166e-x8dv Killing container with id docker://mixer:Container failed liveness probe.. Container will be killed and recreated.
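The events above mix routine image pulls with the lines that matter for the restart. A small filter can isolate the probe failures and the resulting kills. This is a sketch: `probe_events` is a hypothetical helper, and it assumes the event reason is the second whitespace-separated field, as in the output shown.

```shell
# probe_events: read `kubectl describe pod` output on stdin and keep only
# event lines whose reason is Unhealthy (failed probe) or Killing
# (container killed). (Hypothetical helper name.)
probe_events() {
  awk '$2 == "Unhealthy" || $2 == "Killing"'
}

# Usage against a live cluster (needs kubectl configured):
#   kubectl describe pod -n istio-system istio-policy-68775985b8-mtvv9 | probe_events
#
# To see how the liveness probe is configured (the container name "mixer"
# is taken from the Killing event; the Deployment name is an assumption):
#   kubectl get deployment istio-policy -n istio-system \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="mixer")].livenessProbe}'
```

If the probe's initial delay turns out to be shorter than the time mixer needs to start listening on port 15014, that would be consistent with the "connection refused" failures above, though the events alone do not confirm it.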