@binhngoc17
Created February 15, 2026 23:29
---
title: "Spec-Driven Development: The Guide for AI-Augmented Teams"
slug: "spec-driven-development"
publishedAt: ""
updatedAt: ""
author: "nathan"
category: "guides"
status: written
---
Adoption of AI coding tools surged through 2025 — [84% of developers](https://survey.stackoverflow.co/2025/) were using or planning to use them, up from 76% the year before. Satisfaction went the other direction. Positive sentiment dropped from above 70% in 2023-2024 to 60%.
More developers building with AI. Fewer developers happy with what they shipped.
The pattern was familiar to anyone who'd tried it. Most teams prompted agents like chatbots: "Build me an auth system." "Add a payment webhook." The agent produced code fast — sometimes correct, often close-but-wrong in ways that surfaced during review or, worse, in production. AI-generated code that compiles isn't the same as code that's correct, and the time saved writing it got eaten by reviewing and fixing it.
Spec-driven development (SDD) closes this gap. You write a structured specification first — what to build, why, and within what constraints — then hand the spec to an AI agent for implementation. The reviewer checks code against the spec instead of guessing at intent. The spec becomes the contract between human decision-making and machine output.
This guide covers what SDD is, how the workflow operates, where the major tools stand, and when the methodology isn't worth the overhead.
## What Spec-Driven Development Actually Means
### The Core Idea
Spec-driven development uses formal, structured specifications to guide AI code generation instead of ad-hoc prompting. The concept is older than most people realize. SDD was [first formalized in 2004](https://en.wikipedia.org/wiki/Spec-driven_development) as a synthesis of test-driven development and design by contract. For two decades it stayed academic — writing detailed specifications took longer than writing the code they described.
LLM-powered coding agents changed the economics. Agents generate hundreds of lines of implementation from a natural-language spec in minutes. The spec no longer slows you down; it's the fastest path to correct output. Before agents, specifications were optional documentation for humans. Now they're operational instructions for machines.
SDD occupies distinct ground from both vibe coding and traditional waterfall. Vibe coding skips specifications entirely — you prompt, the agent generates, you accept or iterate. Waterfall writes specifications once and hands them to a separate implementation team over months. SDD iterates specifications alongside implementation in tight feedback loops, with the same people (or agents) handling both.
### The SDD Spectrum
Not all spec-driven approaches work the same way. [Birgitta Böckeler](https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html), a principal technologist at Thoughtworks, defines three levels that help make sense of the growing tool ecosystem:
**Spec-first** means writing documentation before coding within a single workflow cycle. You specify the feature, the agent implements it, and the spec may or may not survive past that initial build. This is where most teams operate today. Amazon Kiro, GitHub Spec Kit, and Claude Code all support this pattern.
**Spec-anchored** goes further. Specifications persist through feature evolution and maintenance. When you change the code, you update the spec. When you revisit a feature six months later, the spec reflects the current state — not just the original intent. Spec Kit aspires to this level with its "constitution" pattern, though real-world adoption at this depth is still early.
**Spec-as-source** is the radical endpoint. Specifications become the primary artifact; code is generated and regenerable. Tessl is the only major tool explicitly pursuing this approach, marking generated files with `// GENERATED FROM SPEC - DO NOT EDIT`. Conceptually elegant. Unproven at scale.
The practical takeaway: spec-first delivers most of the value on its own. You don't need to commit to spec-as-source to benefit from writing a spec before prompting your AI agent.
## Why Specs Matter Now
Three forces converged in 2025 to push SDD from academic idea to practical methodology.
**AI agents got capable enough to be dangerous.** Modern agents handle multi-file, multi-step tasks — generating routes, services, tests, and migrations in a single session. That capability becomes a liability without direction. Agents hallucinate APIs, miss security requirements, and make architectural assumptions that look reasonable in isolation but conflict with the rest of the codebase. GitHub [analyzed 2,500+ agent configuration files](https://addyosmani.com/blog/good-spec/) and found that the most effective setups share one trait: structured specifications, not freeform prompting.
**Vibe coding exposed the failure mode.** Andrej Karpathy coined the term in February 2025 to describe prompting AI and accepting whatever it produces. The concept went viral because developers recognized themselves in it. Karpathy himself later described SDD as "the limit of the imperative-to-declarative transition — basically being declarative entirely." The move from vibe coding to spec-driven isn't a correction. It's the natural next step: once you accept that AI writes the code, you focus on defining what the code should do.
**The industry shipped tools.** Amazon launched Kiro with a built-in spec workflow. GitHub released Spec Kit as open source — it hit 50,000 stars within months. Tessl entered closed beta with a spec registry of 10,000+ library specifications designed to prevent agent hallucinations. The [Thoughtworks Technology Radar](https://www.thoughtworks.com/en-us/radar/techniques/spec-driven-development) placed SDD in its "Assess" ring — worth exploring before it becomes standard.
## The Same Feature, Two Ways
Theory matters less than what happens when you sit down to build something. Here's the same task — adding a Stripe webhook handler to an Express API — done two ways.
### The Vibe Coding Attempt
The developer prompts their agent: "Add a Stripe webhook endpoint that handles payment_intent.succeeded events and updates the order status."
The agent generates a route handler in under a minute. It parses the event, queries the order, updates the status, returns 200. Looks right.
The PR reviewer flags three problems. No signature verification — any HTTP POST to that endpoint gets treated as a legitimate Stripe event. No idempotency handling — when Stripe retries the webhook (and it will), the order processes twice. And the database update runs synchronously in the request handler, which will time out under load.
Back to the agent. "Fix signature verification." Done. "Add idempotency." Done — using an in-memory set that resets on every deploy. "Use the database for idempotency keys." Now correct, but the PR is on its third review round. Each fix addressed the latest comment and introduced assumptions about the next unspecified requirement.
Three review cycles. Working code, eventually. Every decision made reactively during review instead of proactively during design.
### The Spec-Driven Approach
Same task. The developer writes a short spec first:
```markdown
## Stripe Webhook Handler
### Requirements
- Handle `payment_intent.succeeded` events
- Verify Stripe webhook signatures using endpoint secret
- Process events idempotently (database-backed key store)
### Constraints
- Respond within 5 seconds (queue heavy processing)
- Log all received events before processing
- Return 200 even if downstream processing fails (retry via job queue)
### Acceptance Criteria
- [ ] Signature verification rejects invalid payloads
- [ ] Duplicate event IDs are detected and skipped
- [ ] Order status updates are queued, not synchronous
```
The agent implements from this spec. Signature verification is in the first middleware. Idempotency uses a database table. Processing goes through a job queue. The reviewer checks code against the spec: signatures verified? Yes. Idempotency database-backed? Yes. Processing async? Yes.
One review round. Merged.
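The control flow the spec pins down can be sketched in TypeScript. This is an illustration, not the article's actual code: the event store and job queue are modeled as in-memory stand-ins (names like `EventStore` and `JobQueue` are hypothetical), where the real handler would use a database table and a job system, with Stripe signature verification running first in middleware (via `stripe.webhooks.constructEvent`, assumed rather than shown).

```typescript
// Sketch of the spec's three acceptance criteria. In-memory stand-ins
// replace the database-backed key store and job queue for illustration.

interface EventStore {
  // Returns true if the event id was newly recorded, false if already seen.
  recordIfNew(eventId: string): boolean;
}

interface JobQueue {
  enqueue(job: { orderId: string; status: string }): void;
}

class InMemoryEventStore implements EventStore {
  private seen = new Set<string>();
  recordIfNew(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

class InMemoryJobQueue implements JobQueue {
  jobs: { orderId: string; status: string }[] = [];
  enqueue(job: { orderId: string; status: string }): void {
    this.jobs.push(job);
  }
}

// Handles one already-verified payment_intent.succeeded event.
// Returns a status so the route can respond 200 either way, per the spec's
// constraint that downstream failures are retried via the job queue.
function handlePaymentSucceeded(
  event: { id: string; orderId: string },
  store: EventStore,
  queue: JobQueue
): "processed" | "duplicate" {
  // Idempotency: Stripe retries are detected by event id and skipped.
  if (!store.recordIfNew(event.id)) return "duplicate";
  // Order status update is queued, not run synchronously in the handler.
  queue.enqueue({ orderId: event.orderId, status: "paid" });
  return "processed";
}
```

Calling the handler twice with the same event id exercises the idempotency criterion: the retry is skipped and only one job is ever enqueued.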
### What Changed
The spec didn't make the AI smarter. It made the developer's decisions explicit before the agent started working. Signature verification, the idempotency strategy, async processing — these are decisions that would have surfaced during review regardless. The spec moved them upstream, where they cost minutes to write instead of hours to discover through three rounds of back-and-forth.
Time spent writing a spec is time you'd spend in code review anyway. You're not adding work. You're moving it earlier, where it's cheaper.
## How the Workflow Works
"SDD is about making your technical decisions explicit, reviewable, and evolvable," as Den Delimarsky, a principal product engineer at Microsoft, puts it. The workflow that achieves this follows four phases.
### Specify
Write what the system should do and why. Include requirements, constraints, scope boundaries, and acceptance criteria in natural language. Many teams add a "constitution" — a set of immutable principles that apply across all specifications. Think of it as a project-wide rules file: "All APIs are REST." "Never commit secrets." "Tests required for all new endpoints." The constitution prevents the same decisions from being re-debated in every spec.
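A constitution can be as small as one markdown file checked into the repository and referenced by every spec. The principles below are illustrative, expanding on the examples above; the specifics belong to a hypothetical project, not any particular tool:

```markdown
## Project Constitution
1. All APIs are REST. No new protocols without a written design review.
2. Never commit secrets. Configuration comes from environment variables.
3. Tests are required for all new endpoints.
4. Database migrations must be reversible.
5. Breaking API changes require a documented deprecation period.
```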
### Plan
Translate the specification into a technical approach. Architecture decisions get captured with reasoning — not "use PostgreSQL" but "use PostgreSQL because the existing schema is relational and the team has operational experience with it." The plan becomes a reviewable contract between product intent and engineering approach. Some teams have the AI agent generate the plan; others write it themselves. Either way, a human approves before implementation starts.
### Tasks
Break the plan into discrete, verifiable units of work. Each task should be small enough for a single AI agent session. [JetBrains recommends](https://blog.jetbrains.com/junie/2025/10/how-to-use-a-spec-driven-approach-for-coding-with-ai/) organizing tasks with checkbox tracking, phase grouping, and bidirectional links — every task references the plan item it implements and the original requirement it traces back to. When something breaks, you follow the chain from bug to task to plan to spec.
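Applied to the webhook spec from earlier, a task list with checkbox tracking, phase grouping, and bidirectional links might look like this (task IDs, plan section numbers, and requirement labels are hypothetical):

```markdown
## Tasks: Stripe Webhook Handler
### Phase 1: Endpoint
- [ ] T1: Add raw-body middleware for the webhook route (plan §2.1 → req R2)
- [ ] T2: Verify webhook signatures; reject invalid payloads (plan §2.1 → req R2)
### Phase 2: Processing
- [ ] T3: Create idempotency-key table and migration (plan §2.2 → req R3)
- [ ] T4: Queue order-status updates instead of updating inline (plan §2.3 → constraint C1)
```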
### Implement
The AI agent works through tasks one at a time with bounded scope. Human review shifts from "what was the developer trying to do?" to "does this code match the spec we agreed on?" When the reviewer finds a gap between code and spec, the response is to update both — not patch the code and leave the spec behind. A specification that doesn't reflect reality is worse than no specification at all.
## Where the Tools Stand
More than 15 major AI coding platforms launched between 2024 and 2025, and most now support some form of spec-driven workflow. The tools differ in how far they push the concept.
**GitHub Spec Kit** is open source and agent-agnostic. It structures specs across multiple markdown files — a specification (what and why), a plan (how), and a task list (execution steps). Its defining feature is the "constitution" pattern: non-negotiable project principles that every spec inherits. Spec Kit hit 50,000 GitHub stars within months of its September 2025 launch.
**Amazon Kiro** bakes SDD into the IDE. Built on VS Code's open-source foundation, Kiro guides developers through requirements, design, and task creation as a native part of the coding experience. Its "hooks" system triggers automated checks when files change, enforcing spec compliance without manual intervention.
**Tessl Framework** takes the most aggressive position. Tessl treats specifications as the primary artifact and code as a generated byproduct, marking output with `// GENERATED FROM SPEC - DO NOT EDIT`. Its spec registry holds 10,000+ specifications for external libraries, solving a persistent pain point: agents hallucinating API signatures for packages they don't understand well. Tessl is in closed beta.
**Claude Code** supports spec-first workflows through Plan Mode, CLAUDE.md project-level specifications, and subagent patterns that route different spec concerns to specialized agents.
**SpecLedger** adds a collaboration and memory layer for spec-driven teams. It tracks spec checkpoints and deltas across sessions, indexes AI session history for organizational memory, and supports multi-agent workflows across multiple repositories. Where other tools focus on the single-developer spec-to-code loop, SpecLedger focuses on what happens when teams of humans and agents need to stay aligned over time.
No single tool owns the SDD workflow. Most teams mix approaches — Spec Kit's constitution pattern with Claude Code's CLI, or Kiro's IDE integration with SpecLedger's session memory. The workflow pattern matters more than the vendor.
## When SDD Works — and When to Skip It
Is every feature worth a spec? No.
The critics raise real points. Böckeler noted that large context windows don't guarantee agents will follow all instructions — more spec doesn't automatically mean better output. François Zaninotto, CEO of web agency Marmelab, makes a sharper case: "Software development is fundamentally a non-deterministic process." He documented Kiro generating 8 files and 1,300 lines of specification markup for a feature that displays a date.
Both observations land. SDD is not universally better. The question is when the tradeoff tips in its favor.
### Where SDD Delivers
**Production code that teams depend on.** When a bug costs real money or user trust, time spent writing a spec pays for itself in reduced rework and fewer review cycles.
**Team coordination.** Specs give multiple developers (and multiple agents) a shared definition of "done." Without one, each person fills ambiguity with their own assumptions — and those assumptions diverge.
**Regulated industries.** Finance, healthcare, and government need audit trails for code provenance. Specs create that traceability: this code exists because this requirement was specified, planned, and approved.
**Brownfield projects, incrementally.** You don't spec your entire legacy codebase. You spec the next feature. The next bug fix that touches a critical path. Build spec coverage over time the same way you'd build test coverage — starting where the risk is highest.
### Where SDD Gets in the Way
**Prototyping.** If you're still figuring out what to build, a spec slows you down. Vibe code the prototype. Figure out what works. Write a spec for the production version.
**Small, obvious changes.** A typo fix doesn't need a spec. Neither does a dependency bump or a one-line config tweak.
**Over-specification.** Zaninotto's 1,300-line example is a real failure mode. When the spec takes longer to write (or review) than the implementation, the tool needs calibration — or you should skip the spec entirely for that task.
### Where This Stands
SDD isn't a religion. It's a tool that works when the cost of misalignment exceeds the cost of specification. The one-line heuristic: **use SDD when being wrong is expensive.**
Model-Driven Development promised a similar vision in the 2000s — generate code from models — and didn't deliver. The difference: MDD required rigid UML diagrams that nobody wanted to write. SDD works in natural language developers already think in. That distinction matters more than it sounds.
The field is still young. A McKinsey survey of nearly 300 publicly traded companies found that 70% haven't changed role definitions despite AI adoption. The tooling is ahead of the organizational change. Expect both to keep shifting through 2026.
## Getting Started
Pick one feature from your next sprint. Before prompting an AI agent, spend 15 minutes writing a spec: what the feature does, why it exists, what constraints apply, and how you'll know it's done. Then compare the result — review cycles, rework, final quality — to how you normally work.

You don't need specialized tooling to start. A markdown file with requirements, constraints, and acceptance criteria gets you most of the benefit. Structure it as: Specify (what and why), Plan (how), Tasks (execution checklist), Implement (agent executes, human reviews against spec).
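One way to lay out that single markdown file, with placeholder content only:

```markdown
# Feature: <name>
## Specify
- What it does, why it exists, scope boundaries, acceptance criteria
## Plan
- Technical approach, with the reasoning behind each decision
## Tasks
- [ ] Small, verifiable units of work, each traceable to a plan item
## Implement
- Agent executes task by task; reviewer checks code against this file
```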
Add a constitution early. Five to ten non-negotiable principles for your project — security policies, code style, testing requirements, API conventions — that apply to every feature. Reference it in every spec so the agent inherits those constraints without you restating them each time.
When you're ready for dedicated tools: Kiro for IDE-native workflows. Spec Kit for open-source, agent-agnostic scaffolding. SpecLedger for multi-agent collaboration and session memory across repositories. Claude Code for CLI-first spec workflows.
The most important step is the first spec. Write one. See what changes.
## What Comes Next
SDD makes AI coding agents reliable, not just fast. It turns "the agent wrote something" into "the agent built what we specified."
The field is early. Thoughtworks has SDD in their Assess ring — worth exploring, not yet standard. Tools will mature. Workflow patterns will consolidate. Teams that build spec-driven habits now will ship faster and with fewer surprises than those still prompting production features off two-sentence descriptions.
The open question is how far specs go. Tessl is betting that specifications will eventually replace code as the primary artifact — that developers will maintain specs, not implementations. Most teams aren't ready for that, and the tooling isn't there yet. But the direction is visible: we're writing less code and more intent. Specs are how intent gets precise enough for machines to execute.