Created
April 13, 2016 14:10
* Debian healthcheck of multi-master upstream

Healthcheck:
Once per minute (or at some other interval), on each static.d.o node:
- try to fetch .serial from peers; on failure:
  - get .serial ten times from Fastly
- save the max value.

If max value > local value:
- mark local as unhealthy
- purge service
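
The check above can be sketched roughly as follows. This is only an illustration under assumptions the note does not state: .serial is taken to hold an integer, the Fastly fallback is triggered when any peer fetch fails, and the max is taken over every serial that was seen. The names is_healthy, fetch_peer and fetch_fastly are hypothetical; the two fetch callables return None on failure and would wrap the real HTTP requests.

```python
from typing import Callable, List, Optional, Sequence


def is_healthy(
    local_serial: int,
    peers: Sequence[str],
    fetch_peer: Callable[[str], Optional[int]],
    fetch_fastly: Callable[[], Optional[int]],
    fastly_attempts: int = 10,
) -> bool:
    """Return False when a newer serial than ours has been seen."""
    seen: List[int] = []
    peer_failed = False

    # Step 1: ask every peer for its .serial.
    for peer in peers:
        value = fetch_peer(peer)
        if value is None:
            peer_failed = True
        else:
            seen.append(value)

    # Step 2: on peer failure, sample .serial through Fastly ten times.
    # (Assumption: one failed peer is enough to trigger the fallback.)
    if peer_failed:
        for _ in range(fastly_attempts):
            value = fetch_fastly()
            if value is not None:
                seen.append(value)

    # Nothing reachable at all: treated here as "we are current", which is
    # only the option floated in the note, not a settled decision.
    if not seen:
        return True

    # Step 3: unhealthy if anyone has a newer serial than the local one.
    return max(seen) <= local_serial
```

A caller would mark the local node unhealthy and trigger the purge when this returns False.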

Why not mini-nag?
- not per service
- does not check serial
- mini-nag's http-check could hook into this system

On push:
- purge changed/deleted files (rsync's --itemize-changes)
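
As a rough sketch of the push step, the output of rsync --itemize-changes could be turned into a list of paths to purge like this. The function name paths_to_purge is hypothetical, and the parsing is deliberately simple: it keeps regular files that were transferred or changed plus "*deleting" entries, and skips directories, symlinks, and attribute-only changes. File names that begin with whitespace are not handled.

```python
from typing import List


def paths_to_purge(itemized_output: str) -> List[str]:
    """Extract changed or deleted file paths from `rsync -i` output."""
    paths: List[str] = []
    for line in itemized_output.splitlines():
        # Each line is "<itemize flags> <path>"; split on the first gap.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        flags, path = parts

        # Directories end with "/"; purging here is per file.
        if path.endswith("/"):
            continue

        if flags == "*deleting":
            # File removed on the receiving side.
            paths.append(path)
        elif len(flags) >= 2 and flags[0] in "<>ch" and flags[1] == "f":
            # Regular file that was sent, received, created or hard-linked.
            # A leading "." (attribute-only change) is ignored.
            paths.append(path)
    return paths
```

Each returned path would then be purged from the cache by whatever mechanism the service uses.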

Outstanding problems:
- two bad hosts could end up with "good" health checks; risk
  approximately 1.7E-5 (after a host fails to hit its peers)
- bootstrapping problem: what happens when all static hosts are down
  and .serial has timed out? Treat 404/500 on peers + Fastly .serial
  as "we are current"?
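
The note does not say how the 1.7E-5 figure was derived. One reading that happens to reproduce it, offered purely as an assumption: there are three backends, Fastly routes each fetch to one of them uniformly and independently, and the bad outcome is all ten fetches landing on the same single backend. The helper name below is hypothetical.

```python
def all_fetches_hit_one_backend(backends: int, fetches: int = 10) -> float:
    """Probability that every fetch is routed to one particular backend,
    assuming uniform, independent routing (an assumption, not from the note)."""
    return (1.0 / backends) ** fetches


# Three backends, ten fetches: (1/3)**10 = 1/59049, roughly 1.7E-5.
risk = all_fetches_hit_one_backend(3)
print(f"{risk:.2e}")
```

If the routing is not uniform, or the number of backends differs, the figure changes accordingly.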