Hackathon Design Document — "Ship It or Sink It"

Fuses: CC2 (Incident & Resilience) × CC3 (Trustworthy Pipeline)
Companion doc: cross-cutting-hackathons.md §10.1
Recommended slot: the first flagship blended event to run (per the sequencing in §12).

Attribute	Value
Codename	Ship It or Sink It
Format	Brownfield break-fix + pipeline hardening
Duration	1 day (≈7 working hours)
Team size	2–4
Difficulty	●●●○○
Native platform	New brownfield harness (primary) + Gittery checkers (pipeline half)
Pass bar	Total ≥ 70 and the surprise failure drill is survived

1. Premise

A service has been thrown over the wall to your team in a bad state. It is buckling under load, and nobody can trust how it gets to production: the CI is decorative, commits are unsigned, and a vulnerable dependency is lurking in the tree. You have one day to make it survive and make it shippable, in that order. At the end of the day we will try to break it again, live, and you will have to recover using the runbook you wrote.

The fiction matters: frame it as a real handover from a team that has left. Participants are the new on-call.

2. Treasury topic coverage

Treasury topic	Original category	Cross-cut
Diagnose injected latency / bad config / broken migration / high CPU	Mixed	CC2
20-minute failure drill	Recovery	CC2
Backup & recovery plan — automatic and manual	Recovery	CC2
Load / stress tests with realistic scenarios	Benchmarking	CC2
Migration rollback instructions	Databases	CC2
Harden CI workflow + evidence-producing checks + readiness record	CI	CC3
Triage simulated scan report	CI	CC3
SBOM / dependency scanning	CI	CC3
Signed commits	Git	CC3
Build cache / incremental builds	Caching	CC3

3. Learning objectives

By the end, participants should be able to:

Diagnose a degrading system under time pressure using a hypothesis log rather than guesswork.
Write and execute a recovery runbook with both automatic and manual fallbacks.
Prove a fix holds with a realistic load/stress test, not an assertion.
Harden a CI pipeline so it produces evidence — a repository-readiness record a stranger could trust.
Triage security findings correctly (critical/high/medium/false-positive) and read an SBOM.
Enforce provenance so an unsigned or unscanned change cannot reach main.
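
The provenance objective can be made concrete with a CI-side check. The following is a minimal sketch, assuming the pipeline has already run `git log --format='%H %G?'` over the incoming commits and captured the output as text. The function name `find_unsigned` and the decision to trust only the `G` status are illustrative assumptions, not part of the event brief.

```python
from typing import List

# Status codes come from git's %G? placeholder ("G" = good signature,
# "N" = no signature, "B" = bad signature, and so on).
# Trusting only "G" is an assumption of this sketch; a real policy may differ.
TRUSTED_STATUSES = {"G"}


def find_unsigned(git_log_output: str) -> List[str]:
    """Return the hashes of commits whose signature status is not trusted.

    Expects one "<hash> <status>" pair per line, as printed by
    `git log --format='%H %G?'`.
    """
    rejected = []
    for line in git_log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        commit_hash, _, status = line.partition(" ")
        if status.strip() not in TRUSTED_STATUSES:
            rejected.append(commit_hash)
    return rejected
```

A CI step built on this would fail the run whenever the returned list is non-empty, which is one way to demonstrate that an unsigned change cannot reach main.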

4. What participants are handed

A small but realistic service — e.g. a TypeScript or Python REST API backed by a relational DB — in a repo that contains, by design:

A weak CI workflow (runs tests, produces no evidence, no gating).
Unsigned commits permitted on the default branch.
A vulnerable transitive dependency and no SBOM.
No build cache (every run rebuilds from cold).
A fault-injection control panel (the harness) that the facilitators drive, not discoverable from the obvious code path.
A seeded but thin test suite that passes green at start (so the breakage is environmental, not test-visible).

Provide: repo access, the running service URL, a metrics/logs endpoint, and a one-page "handover note" written in-character that is deliberately incomplete.

5. Run of show

Time	Phase	What happens
0:00–0:30	Briefing	Premise, rules, rubric walkthrough, harness orientation.
0:30–1:00	Recon	Teams read the code, metrics and handover note. No fixing yet — produce a hypothesis log.
1:00–3:00	Act I — Survive (CC2)	A fault is injected at 1:00. Diagnose → fix → write the recovery runbook → prove with a load test.
3:00–3:45	Lunch / buffer	Harness stays up; no scoring.
3:45–6:00	Act II — Trust (CC3)	You may not "ship" until: hardened CI produces an evidence record, SBOM is generated and the vulnerable dep is triaged, signed commits are enforced, build is incremental.
6:00–6:30	The Drill	A new, unseen fault is injected. Teams have 20 minutes to restore service using their runbook.
6:30–7:00	Readout	Each team presents their readiness record + runbook; facilitators score live.

6. Deliverables

A passing service with the Act I fault resolved.
A recovery runbook (one page): detection signal → diagnosis → containment → recovery (automatic and manual fallback) → verification.
A load/stress test + its result, demonstrating the fix holds under realistic traffic.
A hardened CI workflow that emits a repository-readiness record (the evidence artifact).
An SBOM + a short scan-triage table classifying each finding and naming the action taken.
Signed-commit enforcement proven (an unsigned commit is rejected).
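
One possible shape for the repository-readiness record is sketched below. The field names (`commit`, `checks`, `ready`), the check names and the evidence paths are all hypothetical; the event's own readiness-record template takes precedence. The point of the sketch is that the gate is computed from the checks rather than asserted by hand.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class CheckResult:
    name: str      # hypothetical check name, e.g. "tests" or "sbom"
    passed: bool
    evidence: str  # path or URL of the artifact backing the result


def build_readiness_record(commit: str, checks: List[CheckResult]) -> Dict:
    """Assemble the evidence record and derive the gate from the checks."""
    return {
        "commit": commit,
        "checks": [asdict(c) for c in checks],
        # An empty check list is not evidence, so it never counts as ready.
        "ready": bool(checks) and all(c.passed for c in checks),
    }


def to_json(record: Dict) -> str:
    """Serialize deterministically so records can be diffed between runs."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Regenerating this record on every CI run, and failing the run when `ready` is false, would cover the "complete, gated, and regenerated" wording in the rubric.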

7. Scoring rubric (100 points)

Dimension	Points	What earns full marks
Diagnosis speed & method (Act I)	20	Fault correctly identified; hypothesis log shows what was ruled out and why.
Recovery runbook quality	15	Both automatic and manual paths; a stranger could execute it cold.
Load test proves the fix	10	Realistic scenario; fix demonstrably holds; honest methodology.
CI produces the evidence record	20	Readiness record is complete, gated, and regenerated on every run.
SBOM + scan triage	15	SBOM generated; vulnerable dep found; findings correctly classified incl. the false positive.
Signed-commit enforcement	10	Unsigned commit rejected by policy + CI.
The Drill	10	Service restored within the 20-minute window by following the team's own runbook.

Pass = total ≥ 70 AND the Drill is survived (service restored within the window). Surviving the Drill is a gate, not just points — a team can score well on paper and still fail if their runbook was fiction.
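
The pass rule is easy to get wrong on a live-scoring sheet, so here is a small sketch of it. The dictionary keys are shorthand invented for this example; the point values and the threshold come from the rubric above.

```python
from typing import Dict

# Maximum points per rubric dimension (keys are shorthand for the rubric rows).
RUBRIC_MAX: Dict[str, int] = {
    "diagnosis": 20,
    "runbook": 15,
    "load_test": 10,
    "ci_evidence": 20,
    "sbom_triage": 15,
    "signed_commits": 10,
    "drill": 10,
}
PASS_THRESHOLD = 70


def total_score(scores: Dict[str, int]) -> int:
    """Sum the awarded points; dimensions left out count as zero."""
    unknown = set(scores) - set(RUBRIC_MAX)
    if unknown:
        raise ValueError(f"unknown rubric dimensions: {sorted(unknown)}")
    for name, points in scores.items():
        if not 0 <= points <= RUBRIC_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {RUBRIC_MAX[name]}")
    return sum(scores.values())


def passes(scores: Dict[str, int], drill_survived: bool) -> bool:
    """Both conditions must hold: the total and the Drill gate."""
    total = total_score(scores)
    return total >= PASS_THRESHOLD and drill_survived
```

For example, a team with full marks everywhere except the Drill totals 90 and still fails, because `drill_survived` is false.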

8. Fault bank (Act I + the Drill)

Pick one for Act I and a different one for the Drill, ideally from a class the team has not yet seen:

Fault	Symptom	Honest diagnosis path
Injected latency on a downstream call	p95 climbs, throughput craters	trace → find the slow dependency → add timeout + cache/circuit-break
Pinned CPU (hot loop / N+1)	CPU at 100%, requests queue	profile → find the hot path → fix the query/loop
Corrupted config value	intermittent 5xx on one route	diff config → spot the bad value → restore from known-good
Half-applied migration	schema mismatch errors	inspect migration state → roll back → re-apply cleanly

The half-applied migration is the natural CC2↔CC5 bridge if you want to foreshadow "Mid-Flight Engine Swap."
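
For the injected-latency fault, the "add timeout + circuit-break" step might look like the sketch below. It assumes the wrapped callable enforces its own timeout and raises when it expires; the class name, the threshold and the cool-down are illustrative values, not the reference fix.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the downstream dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Fail fast instead of queueing behind a slow dependency.
                raise CircuitOpenError("downstream call skipped: circuit is open")
            # Cool-down elapsed: allow a trial call; one failure reopens.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()  # fn is assumed to raise on timeout or error
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

Failing fast while the circuit is open keeps request threads free, which is what stops the throughput collapse described in the symptom column.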

9. Facilitator build checklist

Brownfield service repo with the six planted weaknesses (§4).
Fault-injection control panel that can toggle each fault in the bank without code changes.
A pinned, known-vulnerable transitive dependency + a chosen SBOM format.
A simulated scan report containing one true-positive-critical and at least one false-positive.
CI scaffolding the teams can harden (the "before" state) + a reference "after" for grading.
(Optional reuse) Gittery checkers for the pipeline deliverables, so Act II auto-grades.
A load-test target profile (realistic request mix) so results are comparable across teams.
A reset button per team (restore repo + service to a clean checkpoint).
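
To keep load-test results comparable across teams, the target profile can be expressed as a seeded request schedule. This is a sketch only: the routes and their shares are invented for illustration, and the real mix should reflect the handed-over service.

```python
import random
from typing import Dict, List

# Hypothetical request mix: route -> share of traffic (shares sum to 1.0).
REQUEST_MIX: Dict[str, float] = {
    "GET /items": 0.70,
    "GET /items/{id}": 0.20,
    "POST /items": 0.10,
}


def build_schedule(mix: Dict[str, float], n_requests: int, seed: int) -> List[str]:
    """Return a reproducible sequence of routes drawn from the mix."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("request mix shares must sum to 1.0")
    rng = random.Random(seed)  # fixed seed gives every team the same schedule
    routes = list(mix)
    weights = [mix[route] for route in routes]
    return rng.choices(routes, weights=weights, k=n_requests)
```

Handing every team the same seed and request count means differences in results reflect the fix rather than the traffic each team happened to generate.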

10. Hint laddering & safety nets

Stuck on diagnosis > 30 min: release Hint 1 (which subsystem the metric points at), then Hint 2 (the fault class) at 45 min. Never reveal the fix.
One team hard-blocked: offer a one-time reset to a clean checkpoint; they keep any pipeline work already committed.
Act II tooling sprawl: pin exact CI system, scanner and SBOM tool in the brief so grading stays uniform.
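
Facilitators tracking several teams at once may find it useful to encode the hint ladder. The sketch below takes its thresholds from the ladder above; treating both thresholds as inclusive is an assumption, as is the function name.

```python
from typing import List, Tuple

# (minutes stuck on diagnosis, hint to release). The fix is never in the ladder.
HINT_LADDER: List[Tuple[int, str]] = [
    (30, "Hint 1: which subsystem the metric points at"),
    (45, "Hint 2: the fault class"),
]


def hints_due(minutes_stuck: float) -> List[str]:
    """Return every hint a team is entitled to after being stuck this long."""
    return [hint for threshold, hint in HINT_LADDER if minutes_stuck >= threshold]
```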

11. Run-time risks & mitigations

Risk	Mitigation
Time pressure rewards luck over method	Rubric rewards the hypothesis log, not just the fix.
A single blocker stalls a team all day	Reset checkpoints + laddered hints.
"Pipeline plumbing" feels joyless	Anchor it in the threat: the forged commit, the compromised dependency.
Drill becomes chaos	The Drill fault is drawn from the same bank as Act I (§8), so the runbook's detection → diagnosis → recovery structure still applies even when the class is new.

12. Variants & stretch

Half-day cut: drop the load test and the build-cache requirement; keep diagnosis + runbook + evidence + drill.
Hard mode: inject the Drill fault during Act II so resilience and trust work overlap.
Async pre-work: run the Gittery CC3 drills (CI hardening, signed commits, scan triage) as a warm-up the week before, so Act II goes faster and deeper.

13. Day-of pre-flight checklist

Harness healthy; all faults toggle cleanly; reset verified.
Each team has repo access, service URL, metrics endpoint, handover note.
Rubric + readiness-record template shared.
Drill fault chosen (different class from Act I).
Graders briefed; live-scoring sheet ready for the readout.
