Last active
August 20, 2024 15:43
istio-launcher for implementing graceful shutdown and one-off job termination
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        # must match spec.selector.matchLabels or the Deployment is rejected
        app: myapp
      annotations:
        proxy.istio.io/config: |
          # since we are always waiting for the app to shut down first
          # we have no need to drain - because the app is done by the
          # time that this matters and it simply slows down the shutdown
          # of the pod - the original default is 5s
          terminationDrainDuration: 0s
          # not strictly necessary but if you want to guarantee that
          # the proxy is started before your app establishes connections
          # then here ya go
          holdApplicationUntilProxyStarts: true
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: myapp/myapp
        - name: istio-proxy
          image: auto
          lifecycle:
            # see docs in istio-launcher.py explaining this protocol
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    echo "[prestop] listening for shutdown" >> /proc/1/fd/2;
                    curl -X POST "localhost:15000/drain_listeners?inboundonly" > /dev/null 2>&1;
                    nc localhost -l 15123 > /dev/null 2>&1;
                    echo "[prestop] success" >> /proc/1/fd/2;

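The preStop hook and the launcher talk over a very small protocol: the hook listens on a TCP port and blocks, and the app side signals "I am done" simply by connecting. The following is a minimal local sketch of that handshake, not the code that runs in the pod. The function names (`wait_for_app_done`, `notify_done`, `demo`) are hypothetical, and it binds to an OS-assigned port instead of 15123 so it can run anywhere. It also assumes the `nc` in the sidecar image exits after the first connection closes, which is the usual behavior without a keep-open flag.

```python
import socket
import threading


def wait_for_app_done(server):
    """Sidecar side: block until one client connects, like `nc -l` in the hook."""
    conn, _addr = server.accept()
    conn.close()


def notify_done(port):
    """App side: a bare TCP connect is the whole message; no payload is sent."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            return True
    except OSError:
        # nobody is listening, e.g. the pod is not being terminated
        return False


def demo():
    # port 0 asks the OS for a free port; the manifest above uses 15123
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    hook = threading.Thread(target=wait_for_app_done, args=(server,))
    hook.start()                      # the "preStop hook" is now blocking
    notified = notify_done(port)      # the "app" reports that it is finished
    hook.join(timeout=5)
    released = not hook.is_alive()    # the hook returned, so shutdown can proceed
    server.close()
    return notified, released
```

When nothing is listening, `notify_done` returns `False` instead of raising, which mirrors how the launcher treats a failed connect as "not in graceful shutdown" and falls back to another mechanism.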
FROM ...
COPY istio-launcher.py /app/istio-launcher.py
ENTRYPOINT ["tini", "--", "/app/istio-launcher.py"]

#!/usr/bin/env python3
"""
Istio/K8S has issues with sidecars such that they do not properly kill
themselves when the primary workload is complete inside of a pod, thus
causing the entire pod to stay alive after the job should be complete.

see https://github.com/istio/istio/issues/6324

There are two scenarios we want to account for in this script:

1. When running as a one-off job, istio/k8s does not kill the istio-proxy
   sidecar when the app is complete. From this script we can ping the
   /quitquitquit endpoint to tell the sidecar to exit.

2. When K8S issues a graceful shutdown it sends a SIGTERM to both the
   istio-proxy sidecar and the app. In our case the app handles graceful
   shutdown correctly, so we want it to have the entire
   terminationGracePeriodSeconds to try to shut down.

   The istio-proxy sidecar should do 2 things here:

   - Stop incoming connections.
   - Wait until the app is complete and exit when it is.

   To do this, we use the preStop hook on the sidecar container to hold
   off the SIGTERM: K8S does not send it to that container until the hook
   returns. The hook opens a listener, which we can hit when we are done.

   You might ask why we don't just use the /quitquitquit endpoint all the
   time and instead rely on this listener where we can. The problem is that
   when we hit /quitquitquit while the preStop is blocking, the container
   dies before the hook finishes and K8S emits a warning level event on
   graceful shutdown.
"""
import os
import signal
import socket
import subprocess
import sys

ISTIO_CLEANUP = '''
    curl --silent --show-error -o /dev/null -X POST http://localhost:15020/quitquitquit
'''.split()

FWD_SIGS = {
    signal.SIGTERM,
    signal.SIGINT,
    signal.SIGHUP,
    signal.SIGUSR1,
    signal.SIGUSR2,
}


def log(msg):
    print(f'[istio-launcher] {msg}', file=sys.stderr)


dbg = log if os.getenv('ISTIO_LAUNCHER_VERBOSE') == '1' else lambda msg: None

# this queuedsigs is a slightly kludgy hack to capture any signals that arrive
# between the time we set up the signal handlers and the time the child process starts.
# A better approach would be to use signal.pthread_sigmask to temporarily block the
# signals while we get set up; the problem there is the child process inherits the
# blocked signal mask, and subprocess.Popen has no option to reset the signal mask
# in the child. So we'd have to roll our own implementation of Popen that resets the
# signal mask in the child process.
queuedsigs = set()
childpid = None
exitval = -120


def sighandler(sig, frame):
    if childpid is None:
        queuedsigs.add(sig)
    else:
        dbg(f'forwarding signal={sig}')
        os.kill(childpid, sig)


try:
    for s in FWD_SIGS:
        signal.signal(s, sighandler)

    dbg(f'launching subprocess, argv={sys.argv[1:]}')
    proc = subprocess.Popen(sys.argv[1:])
    childpid = proc.pid
    # log(f'child pid={proc.pid}')

    for s in queuedsigs:
        dbg(f'forwarding queued signal={s}')
        os.kill(childpid, s)

    # wait for the program to finish
    proc.wait()
    exitval = proc.returncode
    dbg(f'child finished, rc={exitval}')

    for s in FWD_SIGS:
        signal.signal(s, signal.SIG_DFL)

finally:
    try:
        # in graceful shutdown the sidecar has a listener on 15123 waiting for
        # a connection - which will allow the sidecar to continue shutting down
        # after we hit it notifying them that we are done with our work
        dbg('sending graceful shutdown to istio-proxy')
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect(('localhost', 15123))
        dbg('successfully sent graceful shutdown to istio-proxy')

    except BaseException as ex:
        dbg(f'graceful shutdown failed, err={ex}')

        # explicitly do not worry if this was successful, we tried
        p = subprocess.run(ISTIO_CLEANUP, check=False)
        if p.returncode == 0:
            dbg('successfully sent quitquitquit to istio-proxy')
        else:
            log(f'failed to send quitquitquit to istio-proxy, rc={p.returncode}')

sys.exit(exitval)
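
The comment above `queuedsigs` describes an alternative: block the signals with `signal.pthread_sigmask` while the child is being started, and reset the mask in the child. One possible way to sketch that without reimplementing `Popen` is the `preexec_fn` hook, which runs in the child between fork and exec. This is only a sketch under assumptions, not what the launcher does: `spawn_with_signals_blocked` and `_unblock_all` are hypothetical names, and the Python docs warn that `preexec_fn` is not safe when the parent process has threads, so it only fits a single-threaded launcher like this one.

```python
import signal
import subprocess
import sys

# hypothetical subset; the launcher above forwards a larger set (FWD_SIGS)
BLOCKED = {signal.SIGTERM, signal.SIGINT}


def _unblock_all():
    # runs in the child after fork() and before exec(); clears the inherited mask
    signal.pthread_sigmask(signal.SIG_SETMASK, set())


def spawn_with_signals_blocked(argv, **popen_kwargs):
    """Start argv while BLOCKED is masked in the parent; return the Popen object."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, BLOCKED)
    try:
        return subprocess.Popen(argv, preexec_fn=_unblock_all, **popen_kwargs)
    finally:
        # restoring the old mask lets any signal that arrived meanwhile be delivered
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == '__main__':
    # the child prints its own blocked-signal mask so the reset can be inspected
    child_code = 'import signal; print(sorted(signal.pthread_sigmask(signal.SIG_BLOCK, [])))'
    proc = spawn_with_signals_blocked([sys.executable, '-c', child_code])
    proc.wait()
```

With this shape, signals that arrive during startup stay pending in the parent until the mask is restored, so the handler would only ever run once the child pid is known. Whether that is worth the `preexec_fn` caveats compared with the small `queuedsigs` set is a judgment call.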