
@voxxit
Last active March 10, 2020 15:07
Example/outline of a postmortem to be conducted after a site outage/incident

POSTMORTEM: “Event Title Here”

Issue Summary

This should be a short (4-5 sentences) blurb which succinctly describes the event. At the very least, it should include:

  • the duration (with start & end times in the U.S. Pacific time zone):

…which lasted for roughly 10 minutes between 9:02PM and 9:12PM Pacific…

  • the impact to our users:

…resulting in requests returning 5xx errors, at some points during the event peaking at 93% of requests…

  • the root cause of the incident:

After further investigation, our engineers have determined that a cascading failure in the datacenter’s battery backups caused instances to halt…
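The duration and peak-impact figures quoted in the summary can be computed from request data rather than estimated. The following is a minimal sketch, assuming per-interval request counts are available as (total, errors) pairs; the function names, the fixed UTC-7 offset, and the sample numbers are all hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset (Pacific Daylight Time), for illustration only.
PDT = timezone(timedelta(hours=-7), "PDT")


def outage_duration_minutes(start, end):
    """Return the outage length in minutes between two timezone-aware datetimes."""
    return (end - start).total_seconds() / 60


def peak_error_rate(samples):
    """Return the highest fraction of failed requests across all samples.

    samples: iterable of (total_requests, error_requests) pairs, one per interval.
    Intervals with no traffic are skipped to avoid dividing by zero.
    """
    rates = [errors / total for total, errors in samples if total > 0]
    return max(rates, default=0.0)


# Hypothetical values, not taken from a real incident.
start = datetime(2016, 4, 29, 21, 2, tzinfo=PDT)
end = datetime(2016, 4, 29, 21, 12, tzinfo=PDT)
samples = [(1000, 120), (1000, 930), (1000, 400)]

print(
    f"Lasted roughly {outage_duration_minutes(start, end):.0f} minutes; "
    f"5xx errors peaked at {peak_error_rate(samples):.0%} of requests."
)
```

Reporting the peak alongside the duration matches the example wording above; an average over the whole window could be added the same way if that is more representative of what users saw.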

Timeline

  • Use human-readable timestamps, e.g.: Apr 29, 2016 9:02 AM Pacific

    • NOTE: Make sure to include the time zone — we have remote folks in other time zones who will be reading this…
  • Cover the entirety of the outage duration, including:

    • when the outage began
    • when staff were notified
    • any actions taken and other notable events
    • when service was restored and the outage ended
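If event times are recorded in UTC (as most logs and monitoring tools do), the timeline entries can be converted to the human-readable Pacific format shown above with a small helper. This is a sketch only, assuming Python 3.9+ with time zone data available on the system; `format_pacific`, `render_timeline`, and the sample events are hypothetical names and values introduced for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

PACIFIC = ZoneInfo("America/Los_Angeles")


def format_pacific(moment):
    """Format a timezone-aware datetime like 'Apr 29, 2016 9:02 AM Pacific'."""
    local = moment.astimezone(PACIFIC)
    hour12 = local.hour % 12 or 12  # map 0 -> 12 for the 12-hour clock
    meridiem = "AM" if local.hour < 12 else "PM"
    return (
        f"{local:%b} {local.day}, {local.year} "
        f"{hour12}:{local.minute:02d} {meridiem} Pacific"
    )


def render_timeline(events):
    """Return timeline lines in chronological order.

    events: iterable of (timezone-aware datetime, description) pairs.
    """
    ordered = sorted(events, key=lambda event: event[0])
    return [f"{format_pacific(when)}: {what}" for when, what in ordered]


# Hypothetical events recorded in UTC, deliberately out of order.
events = [
    (datetime(2016, 4, 29, 16, 12, tzinfo=timezone.utc), "Service restored"),
    (datetime(2016, 4, 29, 16, 2, tzinfo=timezone.utc), "Outage began"),
    (datetime(2016, 4, 29, 16, 4, tzinfo=timezone.utc), "Staff notified"),
]

for line in render_timeline(events):
    print(line)
```

Using a named zone rather than a fixed offset lets the conversion follow daylight saving changes, and spelling out "Pacific" on every line keeps the entries unambiguous for readers in other time zones.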

Root Cause

Give as detailed an explanation of the event as you can. Do NOT sugarcoat: postmortems are always conducted in a 100% blame-free environment, so don’t be afraid to state the facts.

Resolution and Recovery

Give as detailed an explanation as you can of the actions taken (including human-readable timestamps) by both internal staff and external/third parties.

Corrective and Preventative Measures

Create/maintain an itemized list of ways to prevent this outage from happening again. These items should ideally answer the question: “What can we do better next time?”
