@decagondev
Created June 24, 2026 18:14

Hackathon Design Document — "Mid-Flight Engine Swap"

Fuses: CC5 (System Evolution) × CC2 (Incident & Resilience) × CC3 (Trustworthy Pipeline)
Companion doc: cross-cutting-hackathons.md §10.3
Recommended slot: capstone. Run last, once the brownfield harness and rubrics are proven.

| Attribute | Value |
| --- | --- |
| Codename | Mid-Flight Engine Swap |
| Format | Brownfield migration + modernization, with a surprise drill and an adversarial review |
| Duration | 2 days |
| Team size | 3–5 |
| Difficulty | ●●●●● |
| Native platform | New brownfield harness + 32K-style defense + Gittery migration drills (warm-up) |
| Pass bar | Total ≥ 70 and the rollback verifiably restores state and no data is lost |

1. Premise

A live-ish application needs its database reshaped and one of its services modernised — and it cannot go down to do it. You will design and apply the migration with a tested rollback path, seed realistic fixture data so the integration tests stay honest, survive a failure injected during the migration window, and ship the modernised service only through a pipeline that can be trusted. Then you will defend the whole thing to a reviewer whose job is to find the hole in your plan.

This is the senior-engineer capstone: it rehearses the judgment that separates "changes code" from "evolves a system safely."


2. Treasury topic coverage

| Treasury topic | Original category | Cross-cut |
| --- | --- | --- |
| Database migrations + rollback plan | Databases | CC5 |
| Migration setup | Databases | CC5 |
| Migration rollback instructions | Databases | CC5 / CC2 |
| Seed / fixture data for local + integration tests | Databases | CC5 |
| Review modernization; write ADRs; lead design review with challenge | System Design | CC5 |
| Deeper TS/Python service design | Lang-specific | CC5 |
| Architecture via case studies / failure reports | Reading Code | CC5 |
| 20-minute failure drill | Recovery | CC2 |
| Backup & recovery plan (automatic and manual) | Recovery | CC2 |
| Harden CI + evidence-producing checks + readiness record | CI | CC3 |
| Signed commits | Git | CC3 |
| SBOM / dependency scanning | CI | CC3 |
| Git basics (branching as the evolution substrate) | Git | CC5 |

3. Learning objectives

  1. Design a zero-downtime migration using an expand/contract (parallel-change) pattern.
  2. Write a rollback that actually restores prior state, and prove it with a test — not a paragraph of hope.
  3. Seed fixture data so integration tests exercise the migration realistically.
  4. Recover from a failure mid-migration — decide between roll-back and roll-forward under pressure.
  5. Ship the modernised service through a trustworthy pipeline (signed, scanned, evidence-producing).
  6. Record ADRs and survive an adversarial design review.
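
To make objective 1 concrete, here is a minimal sketch of the expand phase with dual-write and dual-read, using the full_name split as the example. It assumes SQLite via Python's standard sqlite3 module purely for illustration; the users table, the first_name and last_name columns, and the helper names (expand, write_user, read_name) are hypothetical and not part of the harness.

```python
import sqlite3


def split_name(full_name):
    """Naive split on the first space; real data needs a richer rule."""
    first, _, last = full_name.strip().partition(" ")
    return first, last.strip()


def expand(conn):
    # Expand: additive changes only, so old readers and writers keep working.
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def write_user(conn, user_id, full_name):
    # Dual-write: keep the old shape and the new shape in step during cutover.
    first, last = split_name(full_name)
    conn.execute(
        "INSERT INTO users (id, full_name, first_name, last_name) VALUES (?, ?, ?, ?)",
        (user_id, full_name, first, last),
    )


def read_name(conn, user_id):
    # Dual-read: prefer the new columns, fall back for rows not yet backfilled.
    full_name, first, last = conn.execute(
        "SELECT full_name, first_name, last_name FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    if first is None:
        return split_name(full_name)
    return first, last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")  # legacy row
expand(conn)
write_user(conn, 2, "Grace Hopper")  # written after the expand step

legacy_row = read_name(conn, 1)
new_row = read_name(conn, 2)
```

The contract step (dropping full_name) is deliberately left out: it only becomes safe once every reader has switched to the new columns and the rollback path no longer needs the old one.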

4. What participants are handed

A legacy application with an evolution problem baked in:

  • A schema that must be reshaped — e.g. split a full_name column, change a one-to-many into a many-to-many, or move a denormalised field into its own table. (Choose a change that needs expand/contract, not a trivial add-column.)
  • No fixture/seed data and a thin integration test suite that can't currently exercise the new shape.
  • A service due for modernization (dead patterns, tight coupling) sitting on top of that schema.
  • An untrustworthy pipeline (the CC3 "before" state from Ship It or Sink It can be reused here).
  • A migration-window fault injector in the harness (drops the DB connection, half-applies a step, spikes load).
  • An in-world case study / post-mortem of a similar migration that went wrong, as required reading.
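
Since the handed-over app ships with no fixture data, one possible shape for a seed script is sketched below. It assumes a users table with a full_name column and SQLite as a stand-in database; the name lists, the EDGE_CASES values, and the function names are all invented for illustration. The idea is that a fixed seed keeps the data reproducible, while a few awkward names force the migration to handle more than the happy path.

```python
import random
import sqlite3

# Hypothetical awkward inputs that a full_name split has to cope with.
EDGE_CASES = ["Cher", "Mary Jane Watson", "  Ada   Lovelace ", "Jean-Luc Picard"]
FIRST_NAMES = ["Grace", "Alan", "Barbara", "Edsger", "Margaret"]
LAST_NAMES = ["Hopper", "Turing", "Liskov", "Dijkstra", "Hamilton"]


def make_fixture(seed, count):
    """Return (id, full_name) rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # local generator, does not touch global state
    names = list(EDGE_CASES)
    while len(names) < count:
        names.append(f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}")
    return [(index + 1, name) for index, name in enumerate(names[:count])]


def seed_db(conn, rows):
    """Create the legacy table if needed and load the fixture rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO users (id, full_name) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = make_fixture(seed=42, count=20)
seed_db(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```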

5. Run of show

Day 1 — Plan & build

| Time | Phase | What happens |
| --- | --- | --- |
| 0:00–0:45 | Briefing + case study | Premise, rubric, and read the post-mortem of the migration that failed. |
| 0:45–2:00 | Design (CC5) | Choose expand/contract; design up and down; write the ADRs. |
| 2:00–4:00 | Build the migration | Implement up/down; seed fixture data; make integration tests pass on the new shape. |
| 4:00–6:00 | Build the trust layer (CC3) | Harden CI to gate the modernised service: signed commits, SBOM, evidence record. |

Day 2 — Execute & defend

| Time | Phase | What happens |
| --- | --- | --- |
| 0:00–1:30 | Apply the migration | Run the zero-downtime cutover against the live-ish instance. |
| 1:30–2:00 | The Drill (CC2) | A fault is injected during the window. Teams must roll back or roll forward safely, with no data loss. |
| 2:00–3:30 | Ship the modernised service | Through the hardened, evidence-producing pipeline only. |
| 3:30–5:00 | Adversarial design review | Graders play the hostile reviewer; teams defend their ADRs and recovery plan. |

6. Deliverables

  1. A forward migration (up) that applies cleanly with zero downtime.
  2. A rollback (down) with a test proving it restores the prior state.
  3. Seed/fixture data that makes the integration tests pass against the new schema.
  4. A backup & recovery plan (automatic + manual) covering the migration window.
  5. The modernised service, shipped through a signed, SBOM'd, evidence-producing pipeline.
  6. An ADR set + the recorded design-review defense.
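
Deliverable 2 asks for a test, so here is a rough sketch of what "proving the rollback restores the prior state" could look like: snapshot the schema and rows, run up, run down, and compare. It assumes SQLite and a hypothetical users table; the up, down, and snapshot functions are illustrative, not the harness's real migration API. The down step rebuilds the table rather than dropping columns, which avoids depending on a recent SQLite version.

```python
import sqlite3


def up(conn):
    """Expand: add the new columns and backfill them from full_name."""
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")
    for user_id, full_name in conn.execute("SELECT id, full_name FROM users").fetchall():
        first, _, last = full_name.partition(" ")
        conn.execute(
            "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
            (first, last, user_id),
        )
    conn.commit()


def down(conn):
    """Rollback: rebuild the table in its prior shape inside one transaction.

    Assumes full_name was kept current (dual-write) while the new columns existed.
    """
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE users_prior (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
        INSERT INTO users_prior (id, full_name) SELECT id, full_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_prior RENAME TO users;
        COMMIT;
        """
    )


def snapshot(conn):
    """Capture column definitions (name, type, notnull, pk) and every row."""
    columns = [(row[1], row[2], row[3], row[5]) for row in conn.execute("PRAGMA table_info(users)")]
    rows = conn.execute("SELECT * FROM users ORDER BY id").fetchall()
    return columns, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (id, full_name) VALUES (?, ?)",
    [(1, "Ada Lovelace"), (2, "Cher")],
)
conn.commit()

before = snapshot(conn)
up(conn)
after_up = snapshot(conn)
down(conn)
after_down = snapshot(conn)

assert after_up != before    # the migration really changed the shape
assert after_down == before  # the rollback really restored it
```

A stronger version of this test would also write through the application between up and down, so the comparison covers rows created during the migration window.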

7. Scoring rubric (100 points)

| Dimension | Points | What earns full marks |
| --- | --- | --- |
| Forward migration correctness | 15 | up applies cleanly; new shape is correct. |
| Rollback restores state (tested) | 20 | down provably returns to the prior state; this is the heart of the event. |
| Zero-downtime strategy | 10 | Expand/contract done properly; no read/write gap. |
| Seed/fixture data + integration tests | 10 | Tests exercise the new shape realistically and pass. |
| The Drill survived | 15 | Mid-window fault handled; state restored or safely rolled forward; no data loss. |
| Trustworthy pipeline | 15 | Modernised service ships only through signed + evidence-producing CI. |
| ADRs + adversarial review | 15 | Decisions defensible; the reviewer finds no fatal hole. |

Pass = total ≥ 70 AND the rollback verifiably restores state AND no data is lost across the migration. Two hard gates, not just a score — a beautiful design that loses a row fails.
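
The pass rule is simple enough to write down as a small function, which may help graders apply it consistently. The sketch below takes the point caps from the rubric above; the dictionary keys and the evaluate function are made-up names for illustration.

```python
# Point caps copied from the rubric; the key names are hypothetical.
RUBRIC_MAX = {
    "forward_migration": 15,
    "rollback_restores_state": 20,
    "zero_downtime": 10,
    "fixtures_and_tests": 10,
    "drill_survived": 15,
    "trustworthy_pipeline": 15,
    "adrs_and_review": 15,
}
PASS_TOTAL = 70


def evaluate(scores, rollback_verified, no_data_lost):
    """Return (total, passed). Both hard gates must hold regardless of the total."""
    for dimension, points in scores.items():
        if dimension not in RUBRIC_MAX:
            raise KeyError(f"unknown rubric dimension: {dimension}")
        if not 0 <= points <= RUBRIC_MAX[dimension]:
            raise ValueError(f"{dimension} must be between 0 and {RUBRIC_MAX[dimension]}")
    total = sum(scores.values())
    passed = total >= PASS_TOTAL and rollback_verified and no_data_lost
    return total, passed


strong_team = {
    "forward_migration": 15,
    "rollback_restores_state": 20,
    "zero_downtime": 10,
    "fixtures_and_tests": 10,
    "drill_survived": 8,
    "trustworthy_pipeline": 15,
    "adrs_and_review": 15,
}

lost_a_row = evaluate(strong_team, rollback_verified=True, no_data_lost=False)
clean_run = evaluate(strong_team, rollback_verified=True, no_data_lost=True)
```

Here lost_a_row comes out as (93, False) and clean_run as (93, True): the same score fails or passes purely on the data-loss gate.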


8. Migration scenario bank

| Schema change | Why it needs expand/contract | Drill fault that pairs well |
| --- | --- | --- |
| Split full_name → first / last | both shapes must be readable during cutover | half-applied step mid-backfill |
| One-to-many → many-to-many (join table) | dual-write window required | DB connection drop during dual-write |
| Denormalised field → own table | backfill + read-path switch | load spike during backfill |
| Change PK / re-key | references must migrate atomically-ish | crash between rekey and FK update |

Pick the schema change and a paired drill fault so the Drill genuinely tests the rollback the team designed.
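
As an illustration of the first pairing (full_name split with a half-applied step mid-backfill), the sketch below shows one way a backfill can be made safe to roll forward: work in small committed batches and select only rows that still need work, so a rerun after the fault picks up where the last one stopped. SQLite, the users table, the InjectedFault exception, and the fail_after_batches parameter are all stand-ins invented for this example; the real fault injector lives in the harness.

```python
import sqlite3


class InjectedFault(Exception):
    """Stand-in for the harness fault injector."""


def pending(conn):
    return conn.execute("SELECT COUNT(*) FROM users WHERE first_name IS NULL").fetchone()[0]


def backfill(conn, batch_size=2, fail_after_batches=None):
    """Backfill in batches; each batch commits on its own. Returns batches completed."""
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id, full_name FROM users WHERE first_name IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return batches
        if fail_after_batches is not None and batches >= fail_after_batches:
            raise InjectedFault("simulated crash between batches")
        with conn:  # commits the batch, or rolls it back if an exception escapes
            for user_id, full_name in rows:
                first, _, last = full_name.partition(" ")
                conn.execute(
                    "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                    (first, last, user_id),
                )
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, "
    "first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, full_name) VALUES (?, ?)",
    [(1, "Ada Lovelace"), (2, "Grace Hopper"), (3, "Alan Turing"),
     (4, "Barbara Liskov"), (5, "Cher")],
)
conn.commit()

try:
    backfill(conn, batch_size=2, fail_after_batches=1)  # fault after the first batch
except InjectedFault:
    pending_after_fault = pending(conn)

resumed_batches = backfill(conn, batch_size=2)  # roll forward: rerun with no fault
pending_after_resume = pending(conn)
```

Because the old full_name column is never modified, the same half-applied state can also be rolled back instead of forward; which one to choose under pressure is the judgment the Drill is meant to test.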


9. Facilitator build checklist

  • Legacy app repo with the chosen reshape-needing schema (from the bank).
  • A live-ish DB instance per team that can take a real cutover and a real rollback.
  • Migration-window fault injector (connection drop / half-apply / load spike).
  • Thin integration suite that fails until fixture data + new shape exist.
  • CI "before" state to harden (reuse from Ship It or Sink It) + reference "after."
  • In-world case study / post-mortem document.
  • (Optional) Gittery migration drills (up/down, rollback ordering) as pre-event warm-up.
  • Per-team checkpoint/reset (clean DB + repo state).
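
For the per-team checkpoint/reset item, a facilitator script might look something like the sketch below. It assumes the team database is SQLite and uses the standard library's Connection.backup method to copy a database in either direction; a real event on another engine would swap in that engine's snapshot tooling. The function names take_checkpoint and reset_to are hypothetical.

```python
import sqlite3


def take_checkpoint(conn):
    """Copy the whole database into a separate in-memory connection."""
    checkpoint = sqlite3.connect(":memory:")
    conn.backup(checkpoint)
    return checkpoint


def reset_to(conn, checkpoint):
    """Overwrite the team database with the checkpoint contents."""
    if conn.in_transaction:
        conn.rollback()  # abandon any uncommitted work from the wedged state
    checkpoint.backup(conn)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")
conn.commit()

checkpoint = take_checkpoint(conn)

# Simulate a team wedging its database: data gone, schema half-changed.
conn.execute("DELETE FROM users")
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
conn.commit()

reset_to(conn, checkpoint)
restored_rows = conn.execute("SELECT id, full_name FROM users").fetchall()
restored_columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

The repository half of the reset (returning the team repo to a clean state) is not covered here.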

10. Hint laddering & safety nets

  • Stuck on cutover strategy: point them to the case study's failure mode rather than naming expand/contract.
  • Rollback won't restore: allow one checkpoint reset before the Drill (not during) — the Drill itself is the real test and isn't resettable.
  • Pipeline scope creep on Day 1: pin the exact CI/scanner/SBOM tooling so the trust layer stays gradeable.

11. Run-time risks & mitigations

| Risk | Mitigation |
| --- | --- |
| Scope explosion (modernization eats everything) | Constrain to one service + one schema change; everything else is out of bounds. |
| "Good architecture" resists objective grading | Two hard gates (rollback restores state, no data loss) make pass/fail concrete. |
| A team's live DB gets wedged | Per-team checkpoint reset available before the Drill. |
| Day 2 runs long | The adversarial review is timeboxed hard; over-run loses defense points. |

12. Variants & stretch

  • One-day cut: pre-supply the modernised-service scaffolding; focus purely on migration + rollback + Drill.
  • Hard mode: require the migration to be reversible while live traffic flows (true dual-write window).
  • Chain it: run after Ship It or Sink It and reuse that event's hardened pipeline as the starting trust layer, so teams build on their own prior work.

13. Day-of pre-flight checklist

  • Per-team DB instances provisioned; cutover + rollback rehearsed by facilitators end-to-end.
  • Fault injector verified for the paired drill fault.
  • Integration suite fails-then-passes as intended.
  • Case study, ADR template, rubric shared.
  • Adversarial-review graders briefed with the private reference plan and a list of the holes to probe.