Created February 18, 2019 15:35
Script to clean up Kubernetes pods in undesirable states, such as those left behind after a kops rolling upgrade.
PODINITIALIZING_TIMEOUT_SECONDS = 60*60 # 1 hour
INIT_TIMEOUT_SECONDS = 10*60 # 10 minutes

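# Seconds per kubectl age unit: s, m, h, d
SECONDS_PER_UNIT = { 's' => 1, 'm' => 60, 'h' => 60*60, 'd' => 60*60*24 }.freeze

# Convert a kubectl age string such as "45s", "10m", or "2d3h" into seconds.
def normalize_time(time_string)
  parts = time_string.to_s.scan(/(\d+)([[:alpha:]]+)/) # e.g. [["2", "d"], ["3", "h"]]
  if parts.empty? || parts.any? { |_, unit| !SECONDS_PER_UNIT.key?(unit) }
    # Return an Integer so later age comparisons still work; 0 never exceeds a timeout.
    warn "Error: unknown time unit in age (#{time_string.inspect})"
    return 0
  end
  parts.sum { |amount, unit| amount.to_i * SECONDS_PER_UNIT[unit] }
end
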
def delete_pod(pod, reason='')
  command="kubectl delete pod #{pod[:name]} -n #{pod[:namespace]}"
  puts "Deleting pod because #{reason}"
  puts "> " + command
  `#{command}`
end

def puts_pod(pod, message='')
  pod = pod.clone
  pod[:age] = pod[:meta][:original_age]
  pod.delete(:meta)
  puts "#{message} #{pod}"
end

pods_command = "kubectl get pods -o wide --sort-by={.spec.nodeName} --all-namespaces | grep -v 'Running\\|Completed' | awk '$1 != \"NAMESPACE\" {print $1,$2,$3,$4,$5,$6,$7,$8,$9}'"
pods = `#{pods_command}`.split(/\n/)
pods_data = []

pods.each do |pod|
  pod = pod.split(" ")
  pod_hash = {
    namespace: pod[0],
    name: pod[1],
    ready: pod[2],
    status: pod[3],
    restarts: pod[4],
    age: normalize_time(pod[5]),
    ip: pod[6],
    node: pod[7],
    nominated_node: pod[8],
    meta: { original_age: pod[5] }
  }
  pods_data.push(pod_hash)
end

# Sort by status
pods_data = pods_data.sort_by { |k| k[:status] }

pods_data.each do |pod|
  case pod[:status]
  when 'Evicted'
    delete_pod(pod, "it was previously Evicted, just cleaning up.")
  when 'CrashLoopBackOff', 'RunContainerError', 'Error', 'Init:Error', 'Init:CrashLoopBackOff'
    delete_pod(pod, "it's in a #{pod[:status]} state and that's wack.")
  when 'PodInitializing'
    if pod[:age] > PODINITIALIZING_TIMEOUT_SECONDS
      delete_pod(pod, "it's in a #{pod[:status]} state and it's #{pod[:meta][:original_age]} old, so it's probably stuck.")
    end
  when /^(Init)/
    if pod[:age] > INIT_TIMEOUT_SECONDS
      delete_pod(pod, "it's in a #{pod[:status]} state and it's #{pod[:meta][:original_age]} old, so it's probably stuck.")
    end
  when 'Terminating'
    # boo, could try --now arg with delete
    puts "Doing nothing. Pod is in Terminating state: #{pod}"
  else
    puts "Doing nothing. Pod is in an unhandled state: #{pod}"
  end
end
In my experience, rolling upgrades of a Kubernetes cluster often leave a bunch of pods in undesirable states, and they tend to stay "stuck" there.
Running this script finds all pods across all namespaces that are not "Running" or "Completed". It then decides whether each of those pods should be deleted so that it can be replaced. A fresh pod is often enough to get things back into a healthy state. There are no prompts or safety mechanisms here, so run at your own risk.
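Since the script deletes without asking, one way to get a preview is to separate the "should this pod go?" decision from the actual `kubectl delete` call. The following is only a minimal sketch of that idea, not part of the original script: `deletion_reason`, `delete_command`, `clean_up`, and the `dry_run:` keyword are hypothetical names introduced here, the sample pods are made up, and the thresholds are copied from the script above.

```ruby
require 'shellwords'

# Thresholds copied from the script above.
PODINITIALIZING_TIMEOUT_SECONDS = 60 * 60 # 1 hour
INIT_TIMEOUT_SECONDS = 10 * 60            # 10 minutes

# Hypothetical helper: returns a reason string when the pod should be
# deleted, or nil when it should be left alone.
def deletion_reason(pod)
  case pod[:status]
  when 'Evicted'
    'it was previously Evicted'
  when 'CrashLoopBackOff', 'RunContainerError', 'Error', 'Init:Error', 'Init:CrashLoopBackOff'
    "it is in a #{pod[:status]} state"
  when 'PodInitializing'
    'it looks stuck in PodInitializing' if pod[:age] > PODINITIALIZING_TIMEOUT_SECONDS
  when /^Init/
    "it looks stuck in #{pod[:status]}" if pod[:age] > INIT_TIMEOUT_SECONDS
  end
end

# Build the delete command, shell-escaping the name and namespace.
def delete_command(pod)
  "kubectl delete pod #{Shellwords.escape(pod[:name])} -n #{Shellwords.escape(pod[:namespace])}"
end

# Hypothetical wrapper: with dry_run: true (the default here) it only prints
# what it would do. Returns the list of commands either way.
def clean_up(pods, dry_run: true)
  pods.each_with_object([]) do |pod, commands|
    reason = deletion_reason(pod)
    next unless reason

    command = delete_command(pod)
    puts "#{dry_run ? 'Would delete' : 'Deleting'} #{pod[:namespace]}/#{pod[:name]} because #{reason}"
    puts "> #{command}"
    system(command) unless dry_run
    commands << command
  end
end

# Made-up sample data, shaped like the hashes the script builds.
sample_pods = [
  { namespace: 'default', name: 'web-1', status: 'Evicted', age: 120 },
  { namespace: 'default', name: 'web-2', status: 'Init:0/1', age: 30 },
  { namespace: 'kube-system', name: 'dns-1', status: 'PodInitializing', age: 2 * 60 * 60 }
]

planned = clean_up(sample_pods) # dry run: prints the plan, runs nothing
```

Calling `clean_up(pods_data, dry_run: false)` would then perform the deletions once the printed plan looks right. Keeping the decision in its own function also makes it easy to check the rules against sample data without touching a cluster.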