Skip to content

Instantly share code, notes, and snippets.

@cmbaughman
Created March 20, 2026 13:05
Show Gist options
  • Select an option

  • Save cmbaughman/6aa1afa84e19f3fe820439b17c10015c to your computer and use it in GitHub Desktop.

Select an option

Save cmbaughman/6aa1afa84e19f3fe820439b17c10015c to your computer and use it in GitHub Desktop.
Postmortem Template

Postmortem: [Short descriptive title]

Date: YYYY-MM-DD Duration: [Time from detection to resolution] Severity: [S1: service down | S2: degraded | S3: minor | S4: cosmetic] Tags: [e.g., disk, docker, dns, networking, backup, certs]

Impact

[1–3 sentences. What broke, from the user's perspective? Even if "the user" is just you. What could you not do while this was happening?]

Timeline

Time (UTC) Event
HH:MM First sign of trouble (or estimated start)
HH:MM Detection — how was it noticed?
HH:MM Key debugging steps (what you tried, in order)
HH:MM Root cause identified
HH:MM Fix applied
HH:MM Service confirmed restored

Root Cause

[2–5 sentences. The actual, technical reason this happened. Not "the server crashed" but WHY the server crashed.]

Contributing Factors

  • [Things that didn't directly cause the incident but made it worse or slower to detect/resolve. Missing monitoring, no runbook, stale docs, config drift, etc.]

Detection Gap

[How long was the service broken before you noticed? Why? What would have caught it sooner?]

Remediations

Action Type Owner Deadline Status
[Specific task] prevent / detect / mitigate you YYYY-MM-DD TODO
[Specific task] prevent / detect / mitigate you YYYY-MM-DD TODO

Lessons Learned

  • [What surprised you?]
  • [What went well during the response?]
  • [What would you do differently?]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment