Ephemeral Containers are available (as beta) from Kubernetes 1.23:
Pods are the fundamental building block of Kubernetes applications. Since Pods are intended to be disposable and replaceable, you cannot add a container to a Pod once it has been created. Instead, you usually delete and replace Pods in a controlled fashion using deployments.
Sometimes it's necessary to inspect the state of an existing Pod, however, for example to troubleshoot a hard-to-reproduce bug. In these cases you can run an ephemeral container in an existing Pod to inspect its state and run arbitrary commands.
Let's say we have a network configuration problem inside our cluster and we would like to check whether requests reach our example Pod.
For this purpose we could use tcpdump, which can provide information about network packets. Let's see...
$ kubectl exec -n kube-system public-crc6sb7u3d0usppk6tnti0-alb1-58945c7f7d-46jrg -c nginx-ingress -- tcpdump -i any -nn
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "6daac4b0d0761a52e66e488ba6e32817f2d2122fd3ccec763dd31b3c913631e5": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "tcpdump": executable file not found in $PATH: unknown
Ugh, shoot. The binary isn't installed in the container. Let's install it.
$ kubectl exec -n kube-system public-crc6sb7u3d0usppk6tnti0-alb1-58945c7f7d-46jrg -c nginx-ingress -- apk --no-cache add tcpdump
ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied
command terminated with exit code 99
Ugh, shoot again. We won't be able to install tcpdump. What to do now? We could add a new container to the Pod. Since the container list in the Pod spec is immutable, we can't just kubectl edit the Pod. However, we could grab the manifest of the Pod, delete the Pod, update the definition and recreate it from the new manifest. Or, if the Pod was created as part of a Deployment/DaemonSet/StatefulSet/etc., we can kubectl edit the "parent" manifest and the Pods will be recreated with the extra debugger sidecar.
But! First, that's just too much of a pain in the ass: who the hell wants to modify manifests to debug a problem?! Second, the problem we're debugging may be sporadic, and it might go away if we recreate the Pod. That's bad.
The Kubernetes community's solution for this problem is something called Ephemeral Containers. You can attach a debugger container to a Pod without modifying its manifest or restarting it. Sounds interesting, let's see how it works.
We can add an ephemeral container to a running Pod using the kubectl debug command:
$ kubectl debug -n kube-system -it public-crc6sb7u3d0usppk6tnti0-alb1-58945c7f7d-46jrg --image=alpine --target=nginx-ingress
Targeting container "nginx-ingress". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-wrlk5.
If you don't see a command prompt, try pressing enter.
/ #
Awesome, let's install tcpdump.
/ # apk add --no-cache tcpdump
fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
(1/2) Installing libpcap (1.10.1-r0)
(2/2) Installing tcpdump (4.99.1-r3)
Executing busybox-1.34.1-r3.trigger
OK: 7 MiB in 16 packages
/ #
Good, now we can dump.
/ # tcpdump -i any -nn
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
16:02:39.328428 eth0 In IP 172.21.0.1.443 > 172.30.52.136.45503: Flags [P.], seq 864428375:864428574, ack 2164684544, win 196, options [nop,nop,TS val 1462748474 ecr 2895761899], length 199
16:02:39.328439 eth0 Out IP 172.30.52.136.45503 > 172.21.0.1.443: Flags [.], ack 199, win 884, options [nop,nop,TS val 2895764137 ecr 1462748474], length 0
16:02:42.307858 eth0 In IP 10.38.86.46.41395 > 172.30.52.136.10254: Flags [S], seq 48264641, win 65535, options [mss 1440,sackOK,TS val 533221207 ecr 0,nop,wscale 9], length 0
16:02:42.307881 eth0 Out IP 172.30.52.136.10254 > 10.38.86.46.41395: Flags [S.], seq 4028028090, ack 48264642, win 65535, options [mss 1440,sackOK,TS val 2770486361 ecr 533221207,nop,wscale 9], length 0
[...]
^C
89 packets captured
99 packets received by filter
0 packets dropped by kernel
/ #
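To skip the manual install step, the capture could also be started in one go from an image that already ships tcpdump. The following is only a minimal sketch and not part of the walkthrough above: the helper name capture_in_pod and the nicolaka/netshoot image are assumptions, and the port filter is just one example of a tcpdump filter expression.

```shell
# Hypothetical helper (not part of kubectl): start an ephemeral debug container
# and run tcpdump in it right away, limited to a single port.
# Assumption: the nicolaka/netshoot image, which bundles tcpdump, can be pulled
# from your cluster. Swap in any image you trust.
# Arguments: $1 = namespace, $2 = Pod name, $3 = target container, $4 = port
capture_in_pod() {
  kubectl debug -n "$1" -it "$2" \
    --image=nicolaka/netshoot \
    --target="$3" \
    -- tcpdump -i any -nn port "$4"
}
```

For example, calling capture_in_pod with kube-system, the Pod name, nginx-ingress and 443 would attach a debugger to the same Pod as above and print only the packets on port 443.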
The kubectl debug command added an entry to our Pod's ephemeralContainers field:
 apiVersion: v1
 kind: Pod
 metadata:
   annotations:
     cni.projectcalico.org/containerID: 927c3a627595580529c56a
     cni.projectcalico.org/podIP: 172.30.226.7/32
     cni.projectcalico.org/podIPs: 172.30.226.7/32
     kubernetes.io/psp: ibm-privileged-psp
     prometheus.io/path: /metrics
     prometheus.io/port: "10254"
     prometheus.io/scrape: "true"
     razee.io/build-url: https://travis.ibm.com/alchemy-contai
     razee.io/source-url: https://github.ibm.com/alchemy-conta
   creationTimestamp: "2021-12-14T15:48:42Z"
   generateName: public-crc6sb7u3d0usppk6tnti0-alb1-58945c7f7d
   labels:
     alb-image-type: community
     app: public-crc6sb7u3d0usppk6tnti0-alb1
     pod-template-hash: 58945c7f7d
   name: public-crc6sb7u3d0usppk6tnti0-alb1-58945c7f7d-46jrg
   namespace: kube-system
   ownerReferences:
   - apiVersion: apps/v1
     blockOwnerDeletion: true
     controller: true
     kind: ReplicaSet
     name: public-crc6sb7u3d0usppk6tnti0-alb1-58945c7f7d
     uid: c6967ef9-1b7a-40b0-af36-a100ad3bb34d
-  resourceVersion: "2740"
+  resourceVersion: "4068"
   uid: 36ace309-d6b3-4ef8-9917-e4f61bc4a3de
 spec:
   affinity:
     nodeAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
       - preference:
           matchExpressions:
           - key: dedicated
             operator: In
             values:
             - edge
         weight: 100
       requiredDuringSchedulingIgnoredDuringExecution:
         nodeSelectorTerms:
         - matchExpressions:
           - key: dedicated
             operator: NotIn
             values:
             - internal
     podAntiAffinity:
       requiredDuringSchedulingIgnoredDuringExecution:
       - labelSelector:
           matchExpressions:
           - key: app
             operator: In
             values:
             - public-crc6sb7u3d0usppk6tnti0-alb1
         topologyKey: kubernetes.io/hostname
   containers:
   - args:
     - /nginx-ingress-controller
     - --configmap=kube-system/ibm-k8s-controller-config
     - --annotations-prefix=nginx.ingress.kubernetes.io
     - --default-ssl-certificate=default/ephemeral-containers-
     - --ingress-class=public-iks-k8s-nginx
     - --controller-class=cloud.ibm.com/public-iks-k8s-nginx
     - --election-id=ingress-controller-leader-public-iks-k8s-
     - --http-port=80
     - --https-port=443
     - --healthz-port=10254
     - --default-backend-service=kube-system/ibm-k8s-controlle
     - --publish-service=kube-system/public-crc6sb7u3d0usppk6t
     env:
     - name: POD_NAME
       valueFrom:
         fieldRef:
           apiVersion: v1
           fieldPath: metadata.name
     - name: POD_NAMESPACE
       valueFrom:
         fieldRef:
           apiVersion: v1
           fieldPath: metadata.namespace
     - name: ARMADA_CLUSTER_ID
       value: c6sb7u3d0usppk6tnti0
     - name: ALB_ID
       value: public-crc6sb7u3d0usppk6tnti0-alb1
     - name: ALB_ID_LB
       value: public-crc6sb7u3d0usppk6tnti0-alb1
     - name: SECURED_NAMESPACE
       value: ibm-cert-store
     - name: INGRESS_IMAGE
       value: registry.ng.bluemix.net/armada-master/ingress-co
     image: registry.ng.bluemix.net/armada-master/ingress-comm
     imagePullPolicy: Always
     livenessProbe:
       failureThreshold: 3
       httpGet:
         path: /healthz
         port: 10254
         scheme: HTTP
       initialDelaySeconds: 300
       periodSeconds: 10
       successThreshold: 1
       timeoutSeconds: 1
     name: nginx-ingress
     ports:
     - containerPort: 80
       protocol: TCP
     - containerPort: 443
       protocol: TCP
     resources:
       requests:
         cpu: 10m
         memory: 100Mi
     terminationMessagePath: /dev/termination-log
     terminationMessagePolicy: File
     volumeMounts:
     - mountPath: /var/run/secrets/kubernetes.io/serviceaccoun
       name: kube-api-access-dbbzj
       readOnly: true
   dnsPolicy: ClusterFirst
   enableServiceLinks: true
+  ephemeralContainers:
+  - image: alpine
+    imagePullPolicy: Always
+    name: debugger-vn22z
+    resources: {}
+    stdin: true
+    targetContainerName: nginx-ingress
+    terminationMessagePath: /dev/termination-log
+    terminationMessagePolicy: File
+    tty: true
   initContainers:
   - command:
     - sh
     - -c
     - sysctl -e -w fs.file-max=6000000; sysctl -e -w fs.nr_o
       -e -w net.core.rmem_max=16777216; sysctl -e -w net.cor
       -e -w net.core.rmem_default=12582912; sysctl -e -w net
       -e -w net.core.optmem_max=25165824; sysctl -e -w net.c
       -e -w net.core.somaxconn=32768; sysctl -e -w net.core.
       -e -w net.ipv4.ip_local_port_range="1025 65535"; sysct
       262144 16777216"; sysctl -e -w net.ipv4.tcp_wmem="8192
       -e -w net.ipv4.udp_rmem_min=16384; sysctl -e -w net.ip
       -e -w net.ipv4.ip_no_pmtu_disc=0; sysctl -e -w net.ipv
       -e -w net.ipv4.tcp_dsack=1; sysctl -e -w net.ipv4.tcp_
       net.ipv4.tcp_fack=1; sysctl -e -w net.ipv4.tcp_max_tw_
       -e -w net.ipv4.tcp_tw_recycle=0; sysctl -e -w net.ipv4
       -e -w net.ipv4.tcp_frto=0; sysctl -e -w net.ipv4.tcp_s
       -e -w net.ipv4.tcp_max_syn_backlog=32768; sysctl -e -w
       -e -w net.ipv4.tcp_syn_retries=3; sysctl -e -w net.ipv
       -e -w net.ipv4.tcp_retries2=5; sysctl -e -w net.ipv4.t
       -e -w net.ipv4.tcp_moderate_rcvbuf=1; sysctl -e -w net
       -e -w net.ipv4.tcp_keepalive_time=300; sysctl -e -w ne
       -e -w net.ipv4.tcp_keepalive_probes=6; sysctl -e -w ne
       -e -w net.ipv4.tcp_window_scaling=1; sysctl -e -w net.
       -e -w net.ipv4.tcp_max_orphans=262144; sysctl -e -w ne
       -e -w net.netfilter.nf_conntrack_max=9145728; sysctl -
       -e -w net.netfilter.nf_conntrack_tcp_timeout_fin_wait=1
       -e -w net.netfilter.nf_conntrack_tcp_loose=1; sysctl -
       exit 0;
     image: registry.ng.bluemix.net/armada-master/ingress-alpi
     imagePullPolicy: Always
     name: sysctl
     resources: {}
     securityContext:
       privileged: true
     terminationMessagePath: /dev/termination-log
     terminationMessagePolicy: File
     volumeMounts:
     - mountPath: /var/run/secrets/kubernetes.io/serviceaccoun
       name: kube-api-access-dbbzj
       readOnly: true
   nodeName: 10.38.86.2
   nodeSelector:
     publicVLAN: "2838388"
   preemptionPolicy: PreemptLowerPriority
   priority: 900000000
   priorityClassName: ibm-app-cluster-critical
   restartPolicy: Always
   schedulerName: default-scheduler
   securityContext: {}
   serviceAccount: ibm-k8s-ingress
   serviceAccountName: ibm-k8s-ingress
   terminationGracePeriodSeconds: 30
   tolerations:
   - key: dedicated
     value: edge
   - effect: NoExecute
     key: node.kubernetes.io/not-ready
     operator: Exists
     tolerationSeconds: 600
   - effect: NoExecute
     key: node.kubernetes.io/unreachable
     operator: Exists
     tolerationSeconds: 600
   volumes:
   - name: kube-api-access-dbbzj
     projected:
       defaultMode: 420
       sources:
       - serviceAccountToken:
           expirationSeconds: 3607
           path: token
       - configMap:
           items:
           - key: ca.crt
             path: ca.crt
           name: kube-root-ca.crt
       - downwardAPI:
           items:
           - fieldRef:
               apiVersion: v1
               fieldPath: metadata.namespace
             path: namespace
 status:
   conditions:
   - lastProbeTime: null
     lastTransitionTime: "2021-12-14T15:48:45Z"
     status: "True"
     type: Initialized
   - lastProbeTime: null
     lastTransitionTime: "2021-12-14T15:48:53Z"
     status: "True"
     type: Ready
   - lastProbeTime: null
     lastTransitionTime: "2021-12-14T15:48:53Z"
     status: "True"
     type: ContainersReady
   - lastProbeTime: null
     lastTransitionTime: "2021-12-14T15:48:42Z"
     status: "True"
     type: PodScheduled
   containerStatuses:
   - containerID: containerd://642debc8590a95b7172f476e02f3e6c
     image: registry.ng.bluemix.net/armada-master/ingress-comm
     imageID: registry.ng.bluemix.net/armada-master/ingress-co
     lastState: {}
     name: nginx-ingress
     ready: true
     restartCount: 0
     started: true
     state:
       running:
         startedAt: "2021-12-14T15:48:52Z"
+  ephemeralContainerStatuses:
+  - containerID: containerd://cae6e4d02ef9784c64bd0f59b37d86a
+    image: docker.io/library/alpine:latest
+    imageID: docker.io/library/alpine@sha256:21a3deaa0d32a805
+    lastState: {}
+    name: debugger-vn22z
+    ready: false
+    restartCount: 0
+    state:
+      running:
+        startedAt: "2021-12-14T16:13:30Z"
   hostIP: 10.38.86.2
   initContainerStatuses:
   - containerID: containerd://c1f9ec45ae1b7fe3b611169c25b62f6
     image: registry.ng.bluemix.net/armada-master/ingress-alpi
     imageID: registry.ng.bluemix.net/armada-master/ingress-al
     lastState: {}
     name: sysctl
     ready: true
     restartCount: 0
     state:
       terminated:
         containerID: containerd://c1f9ec45ae1b7fe3b611169c25b
         exitCode: 0
         finishedAt: "2021-12-14T15:48:44Z"
         reason: Completed
         startedAt: "2021-12-14T15:48:44Z"
   phase: Running
   podIP: 172.30.226.7
   podIPs:
   - ip: 172.30.226.7
   qosClass: Burstable
   startTime: "2021-12-14T15:48:42Z"
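A before/after comparison of the Pod manifest can be reproduced by saving the manifest once before and once after attaching the debugger. The following is a sketch under assumptions: the helper names and the file names are made up for illustration, and the commands rely only on the standard kubectl get output options (yaml and jsonpath).

```shell
# Hypothetical helpers for inspecting what kubectl debug changed in a Pod.

# Save the current manifest of a Pod to a file.
# Arguments: $1 = namespace, $2 = Pod name, $3 = output file
snapshot_pod() {
  kubectl get pod -n "$1" "$2" -o yaml > "$3"
}

# Print each ephemeral container together with the container it targets,
# one per line, separated by a tab.
# Arguments: $1 = namespace, $2 = Pod name
list_ephemeral_containers() {
  kubectl get pod -n "$1" "$2" \
    -o jsonpath='{range .spec.ephemeralContainers[*]}{.name}{"\t"}{.targetContainerName}{"\n"}{end}'
}

# Usage (run snapshot_pod once before and once after kubectl debug):
#   snapshot_pod kube-system <pod> before.yaml
#   snapshot_pod kube-system <pod> after.yaml
#   diff -u before.yaml after.yaml
```

The jsonpath query is handy when you only want to know which debuggers are attached to a Pod without reading the whole manifest.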