Source: This is a summary of a YouTube video by Nate B Jones (ex-Amazon). Watch the original video.
As AI tools generate ever more production code at speed, a new class of problem has emerged: "dark code" - code that works, passes tests, and ships, but that no human on the team ever truly understood. This video argues that dark code is not primarily a security or tooling problem, but an organizational capability problem, and lays out a three-layer framework for fighting it before it becomes a liability.
Dark code is AI-generated code that was never comprehended by a human at any point in its lifecycle. It's distinct from buggy code, spaghetti code, or technical debt. The comprehension step simply never happened - not from carelessness, but because modern shipping processes no longer require it. The problem touches regulatory exposure (SOC 2, encryption), business liability, and organizational capability.
Two forces compound each other:
- Structural: AI-written code is harder to understand because you didn't write it yourself. Without discipline around non-functional requirements, comprehension degrades.
- Velocity: AI enables - and the industry demands - faster shipping. Speed and structural opacity combine to decouple authorship from comprehension.
Three common responses, and why they fall short:
- Observability/telemetry: Lets you measure what dark code breaks in production, but doesn't eliminate the dark code itself.
- Better agentic pipelines: Guardrails reduce risk, but each guardrail is another layer to troubleshoot - it constrains the code without restoring comprehension of it.
- Accepting dark code (the "YOLO" approach): Companies like Factory.ai attempt this with extraordinary eval discipline, but most orgs lack that rigor and end up with diffuse ownership and no one accountable for the full production codebase.
The stronger AI models become, the easier it is to rationalize "the AI will understand and fix its own code." But AI overconfidence is hard to detect. Even AI-native companies (Anthropic, OpenAI) still require engineers to commit PRs, conduct code reviews, and maintain comprehension - they invest heavily in evals, agentic pipelines, and human legibility simultaneously.
Reducing engineering headcount while expecting remaining engineers to handle more AI-generated output directly worsens dark code. Fewer eyes on more inscrutable code is a compounding risk, not a productivity gain.
Layer 1: force understanding before the code exists. Not heavyweight 2010s-era documentation, and not unrestricted vibe coding either. The discipline: write down what you want to build in enough detail that you can articulate it clearly, then generate. The spec becomes the eval - a well-written spec is what you run the agent against to verify correctness. Amazon rebuilt their internal coding tool (Kiro) around exactly this principle after a costly December outage.
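The "spec becomes the eval" idea can be sketched concretely. This is a minimal illustration, not Kiro's actual mechanism: the spec is written first, and the same spec file drives the acceptance checks run against whatever the agent generates. All names here (`SPEC`, `slugify`, `run_spec`) are hypothetical.

```python
import re

# Hypothetical spec, written BEFORE any code is generated. The cases
# double as the eval the agent's output must pass.
SPEC = {
    "function": "slugify",
    "behavior": "lowercase, spaces -> hyphens, strip non-alphanumerics",
    "cases": [
        ("Hello World", "hello-world"),
        ("  Déjà Vu!  ", "dj-vu"),
        ("already-slugged", "already-slugged"),
    ],
}

def slugify(text: str) -> str:
    """Stand-in for the AI-generated implementation."""
    text = text.strip().lower()
    text = re.sub(r"[^a-z0-9\s-]", "", text)  # strip non-alphanumerics
    return re.sub(r"\s+", "-", text)          # spaces -> hyphens

def run_spec(spec: dict, fn) -> list:
    """Run every spec case against the implementation; return the failures."""
    return [(i, expected, fn(i)) for i, expected in spec["cases"] if fn(i) != expected]

failures = run_spec(SPEC, slugify)  # an empty list means the code meets the spec
```

The point is the ordering: because the spec exists before the code, a human has articulated intent at least once, and the eval is derived from that articulation rather than reverse-engineered from generated output.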
Layer 2: embed comprehension into the codebase itself, not in people's heads. Three sub-layers:
- Structural context (answers where): Every module has a manifest describing what it does, what it depends on, and what depends on it.
- Semantic context (answers what): Interfaces carry rules of engagement - performance expectations, failure modes, retry semantics, behavioral contracts - not just data shapes.
- Eval-driven catch-all: Tests covering both functional and non-functional requirements close the gaps the first two layers miss.
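The first two sub-layers can be sketched as checked-in artifacts. This is an illustrative sketch, not a prescribed format; the names (`ModuleManifest`, `InvoiceService`, the `billing` module and its dependencies) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class ModuleManifest:
    """Structural context (answers WHERE): one manifest per module,
    versioned alongside the code it describes."""
    name: str
    purpose: str
    depends_on: list = field(default_factory=list)
    depended_on_by: list = field(default_factory=list)

billing = ModuleManifest(
    name="billing",
    purpose="Computes invoices from usage events",
    depends_on=["usage_store", "pricing_rules"],
    depended_on_by=["notifications", "reporting"],
)

class InvoiceService(Protocol):
    """Semantic context (answers WHAT): the interface carries rules of
    engagement, not just data shapes."""

    def create_invoice(self, account_id: str) -> str:
        """Behavioral contract (hypothetical figures for illustration):
        - Performance expectation: p99 latency under 200 ms.
        - Retry semantics: idempotent per (account_id, billing period);
          safe to retry on timeout.
        - Failure mode: raises a usage-unavailable error when the
          usage store is down; callers are expected to back off.
        """
```

Because the manifest and the contract live in the repo, both a human reviewer and a future AI agent can answer "where does this sit?" and "how is it supposed to behave?" without consulting whoever originally shipped it.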
Layer 3: before code ships, run it through the questions a senior or principal engineer would actually ask:
- Why did you call that dependency here?
- Why is the cache in a location other services can't read?
- How are you thinking about separation of concerns in this monolithic structure?
These questions can be codified into an AI-assisted comprehension filter that (1) makes code legible and accountable to the organization, and (2) feeds a flywheel that improves eval quality and code quality over time.
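A minimal sketch of what codifying those questions might look like, assuming a checklist-style gate in CI (the function and question set are illustrative; in practice an AI reviewer would draft the answers and a human would approve them):

```python
# Hypothetical comprehension gate: the senior-engineer questions above,
# expressed as a checklist every change must answer before merge.
GATE_QUESTIONS = [
    "Why did you call that dependency here?",
    "Why is the cache in a location other services can't read?",
    "How are you thinking about separation of concerns in this structure?",
]

def gate_passes(answers: dict) -> tuple:
    """Return (passed, unanswered). A change passes only when every
    question has a non-empty answer attached to the PR."""
    unanswered = [q for q in GATE_QUESTIONS if not answers.get(q, "").strip()]
    return (len(unanswered) == 0, unanswered)
```

Even this trivial version forces an articulation step; the flywheel comes from feeding the collected answers back into eval and prompt quality over time.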
Key takeaways:
- Dark code is an organizational problem, not a tooling, security, or observability problem alone - those are table stakes, not solutions.
- Authorship and comprehension have decoupled under AI velocity. This must be actively re-coupled through process discipline.
- Spec-driven development is the most effective upstream intervention: write the spec → generate the code → use the spec as the eval.
- Self-describing codebases reduce reliance on institutional knowledge locked in individuals' heads, making code legible to both humans and future AI agents.
- Comprehension gates based on senior engineer heuristics create a scalable review layer that grows code quality over time, not just ship velocity.
- Laying off engineers while shipping more AI-generated code directly worsens the dark code problem.
For engineering leaders:
- Audit whether your team has mechanisms to make AI-generated code legible before it ships - not just observable after it fails.
- Implement spec-driven development: require a written spec before code generation begins.
- Codify the questions your best senior engineers ask into a comprehension gate checklist or AI-assisted review skill.
For senior / principal engineers:
- Develop AI-assisted comprehension tools to maintain accountability across higher PR volumes - manual-only review is no longer scalable.
- Customize comprehension prompts and lenses to match your standards, then systematize them for your team.
For founders and early-stage teams:
- Know your code. Transparency about trade-offs and legibility is a competitive differentiator and a trust-builder with customers and vendors.
- Ask vendors about their dark code practices - how much of what they ship do they actually understand?
For early-career developers:
- Use comprehension gates as a learning accelerator: systematically practicing senior engineer questions on AI-generated code builds expertise faster than generating code alone.