Here's our RC. Note that its selector matches on both app=webserver and status=serving; that second label is what will let us pull a pod out of rotation later:
$ cat ws-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: webserver-rc
spec:
  replicas: 5
  selector:
    app: webserver
    status: serving
  template:
    metadata:
      labels:
        app: webserver
        env: prod
        status: serving
    spec:
      containers:
      - image: nginx:1.9.7
        name: nginx
Now let's fire up some pods:
$ kubectl create -f ws-rc.yaml
replicationcontrollers/webserver-rc
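Before poking at the pods, a quick sanity check doesn't hurt: a plain get on the RC should report the 5 desired replicas (I'm only showing the command here, since the output columns differ between kubectl versions):
$ kubectl get rc webserver-rc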
All pods serving traffic, yes? Let's have a look:
$ kubectl get pods --selector="status=serving"
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-baeui   1/1       Running   0          18s
webserver-rc-dgijd   1/1       Running   0          18s
webserver-rc-ii79i   1/1       Running   0          18s
webserver-rc-lxag2   1/1       Running   0          18s   <-- THIS ONE GONE BAD
webserver-rc-x5yvm   1/1       Running   0          18s
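By the way, if your kubectl supports the -L (label columns) flag, you can print the labels the RC stamped onto each pod as extra columns, which makes the next step easier to follow:
$ kubectl get pods -L app,env,status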
OIC, one pod, webserver-rc-lxag2, has gone bad; say it's misbehaving at the application level, which is why its STATUS still happily reports Running. Let's isolate it:
$ kubectl label pods webserver-rc-lxag2 --overwrite status=troubleshooting
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-lxag2   1/1       Running   0          45s
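To make sure the label really changed, describe the pod; its Labels line should now read status=troubleshooting instead of status=serving:
$ kubectl describe pod webserver-rc-lxag2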
And how many pods do we now have serving traffic (remember, I asked the RC for 5 replicas)?
$ kubectl get pods --selector="status=serving"
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-baeui   1/1       Running   0          49s
webserver-rc-dgijd   1/1       Running   0          49s
webserver-rc-ii79i   1/1       Running   0          49s
webserver-rc-pwst1   0/1       Running   0          4s    <-- BACKUP ALREADY UP AND RUNNING
webserver-rc-x5yvm   1/1       Running   0          49s
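If you want the RC's side of the story, describing it should show a pod-creation event for webserver-rc-pwst1 (the exact event wording varies between versions):
$ kubectl describe rc webserver-rc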
Sweet! Within 4 seconds the RC has detected that the bad pod webserver-rc-lxag2 no longer matches its selector, and hence no longer counts towards the 5 desired replicas, and has launched a backup to fill the gap. Now, how's my guinea pig doing?
$ kubectl get pods --selector="status!=serving"
NAME                 READY     STATUS    RESTARTS   AGE
webserver-rc-lxag2   1/1       Running   0          1m    <-- HERE'S MY GUINEA PIG
Here is the bad pod webserver-rc-lxag2, which I can now live-debug without impacting the overall operation.
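When the debugging session is over, cleanup is a one-liner either way: flip the label back to status=serving, in which case the RC suddenly sees six pods matching its selector and should scale one of them away to get back to 5, or simply delete the guinea pig:
$ kubectl label pods webserver-rc-lxag2 --overwrite status=serving
$ kubectl delete pod webserver-rc-lxag2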