Last active
November 9, 2018 22:34
preStop Kubernetes Lifecycle Resque Hook
#!/bin/bash
# This script gracefully stops resque workers by issuing the USR2
# signal to the resque-pool manager and then waits for the workers
# to be paused before exiting. Resque-pool master process relays
# the USR2 signal to the children.
#
# For reference:
# (1) https://github.com/nevans/resque-pool#signals
# (2) https://github.com/resque/resque/blob/master/lib/resque/worker.rb#L376-L378
#
# This script can serve as a preStop lifecycle hook in Kubernetes
#
# Written by: Anton Ouzounov <[email protected]>
#
# Version 1.0 - June 14th, 2017
# Version 1.1 - October 20, 2017
# Send USR2 to PID 1 (the resque-pool manager) so the workers pause.
kill -USR2 1;
# If resque workers are still processing, wait 5 seconds and check again.
numWorkers=$(ps aux | grep -i '[p]rocessing' | wc -l);
while true; do
  if [ "$numWorkers" -eq "0" ]; then
    # Exit once no resque worker is processing a job.
    echo "Time to terminate this pod";
    break;
  else
    echo "Not ready to exit, workers are processing, more sleepy sleepy.";
    echo "$numWorkers -- number of workers processing.";
    sleep 5;
    # Re-send USR2 and recount the workers that are still processing.
    kill -USR2 1;
    numWorkers=$(ps aux | grep -i '[p]rocessing' | wc -l);
  fi
done;
echo "Exiting from the bash script";
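The loop above waits until no worker is processing, however long that takes, while Kubernetes kills the container anyway once the pod's grace period runs out. A minimal sketch of a bounded wait, assuming you would rather give up and log before that happens, could look like this. The function names `count_workers` and `wait_for_workers` and the 60 and 5 second defaults are hypothetical, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: count processes whose command line contains
# "processing" (same check as the script above, using grep -c).
count_workers() {
  ps aux | grep -ic '[p]rocessing'
}

# Hypothetical helper: poll until no worker is processing or until
# max_wait seconds have passed. Returns 0 when idle, 1 on timeout.
wait_for_workers() {
  local max_wait="${1:-60}"   # assumed default, tune to your grace period
  local interval="${2:-5}"    # seconds between checks
  local waited=0
  while [ "$(count_workers)" -gt 0 ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "Gave up after ${waited}s, workers still processing"
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "No workers processing after ${waited}s"
  return 0
}
```

In the hook this would be called right after `kill -USR2 1`, for example `wait_for_workers 110 5` when the grace period is 120 seconds, so the script finishes a little before Kubernetes steps in.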
@sunild, for a resque-pool master with workers in a k8s pod, you can set terminationGracePeriodSeconds (pod.Spec.TerminationGracePeriodSeconds in the API)
in the spec section of the Kubernetes pod's manifest and tune this value if you need a longer grace period.
Here is the documentation: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#soft-eviction-thresholds
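As a rough sketch of where that setting sits next to the preStop hook, the shell function below prints a minimal pod manifest. The function name `render_pod_manifest`, the pod and container names, the image, and the script path `/app/prestop.sh` are all hypothetical placeholders. The default of 30 matches the Kubernetes default grace period.

```shell
#!/bin/bash
# Hypothetical helper: print a minimal pod manifest that runs the
# preStop script and sets the grace period (in seconds) passed as $1.
render_pod_manifest() {
  local grace="${1:-30}"   # Kubernetes defaults to 30 seconds
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: resque-worker                      # placeholder name
spec:
  terminationGracePeriodSeconds: ${grace}
  containers:
    - name: resque-pool                    # placeholder name
      image: example/resque-worker:latest  # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["/bin/bash", "/app/prestop.sh"]  # placeholder path
EOF
}
```

Something like `render_pod_manifest 120 | kubectl apply -f -` would then create the pod with a two-minute grace period. The preStop hook runs inside that window, so the value should cover the longest job you expect to wait for.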
Thanks!! I was looking for a simple way to do this and had only found a much longer script that turned me off the whole process. The "new" way of signal handling (using TERM_CHILD=1 in Resque > 1.22) was not acceptable to me: it interrupts the job immediately and expects you to handle the exception and re-queue the job. They offer a grace period, but that only specifies how long you're willing to wait for cleanup.