Skip to content

Instantly share code, notes, and snippets.

@cmer
Last active October 27, 2025 17:18
Show Gist options
  • Save cmer/2a9b78d2145204eedf1029e9305e3e50 to your computer and use it in GitHub Desktop.
Save cmer/2a9b78d2145204eedf1029e9305e3e50 to your computer and use it in GitHub Desktop.
Ultimate LLM Prompt for Deep Codebase Analysis & Documentation

🧠 Ultimate LLM Codebase Brain Dump Prompt

This prompt is designed for GPT-5, Claude Opus 4.1, Cursor, Claude Code, and cursor-agent to perform a deep, structured analysis of any codebase without pasting the entire repo into chat.

It guides the LLM to:

  • Browse and discover the codebase using built-in repo tools
  • Read only the necessary files in intelligent order
  • Analyze the system at high, mid, and low levels
  • Identify features, their business purposes, how they work, and how they interact
  • Output a complete master knowledge document inside codebase-analysis-docs/

πŸ“‚ Output Structure

codebase-analysis-docs/
β”œβ”€β”€ CODEBASE_KNOWLEDGE.md   # Main brain dump document
└── assets/                 # Diagrams, schemas, and supplemental files

πŸš€ How to Use

  1. Open your repo in Cursor, Claude Code, or cursor-agent.
  2. Paste the full prompt into the model chat (use CMD+SHIFT+V in Cursor to preserve formatting).
  3. Tell the model to start with PHASE 1 and let it browse the repo.
  4. The model will:
    • Explore the file tree
    • Read key files in priority order
    • Produce phased deliverables
    • Save results in codebase-analysis-docs/
  5. Repeat file exploration until all phases are complete.

πŸ’‘ Best Practices

  • Prioritize critical files first (entry points, core logic, configs, DB models).
  • Let the model chunk work β€” it will produce a STATE BLOCK for continuity.
  • Ask for diagrams (Mermaid format) for architecture, data flow, and database schemas.
  • Commit the generated CODEBASE_KNOWLEDGE.md for onboarding and future AI sessions.

πŸ“Œ Ideal For

  • Onboarding new developers or LLMs
  • Planning refactors
  • Security and architecture reviews
  • Reverse-engineering legacy systems

Role

You are a senior software architect and documentation specialist.
Your mission is to explore this codebase directly using the tools available in your current environment (file browsing, search, read-file, repo indexing).
You will discover, read, and analyze only the necessary files to fully understand the system β€” you do not expect the full codebase to be pasted into the chat.

You will output a complete β€œbrain dump” document that another LLM can use to:

  • Implement new features
  • Fix bugs
  • Refactor safely

Output Location Requirement

  • All documentation you produce must be saved into the repository under the folder: codebase-analysis-docs
  • If the folder does not exist, create it.
  • The final master document should be named: codebase-analysis-docs/CODEBASE_KNOWLEDGE.md
  • Any diagrams, schemas, or supplemental files should be stored in: codebase-analysis-docs/assets/
  • All file references in your documentation should be relative paths from the repo root.

Tool Usage Guidelines

  1. Explore before reading: Use repo search, file tree exploration, and directory listings to map the structure before opening files.
  2. Prioritize reads: Start with the most critical files first (entry points, core modules, configs, database models, major features).
  3. Chunk intelligently: Open only what you can analyze in context; if needed, break large files into segments.
  4. Iterate & refine: After each phase, decide the next most valuable files to read to fill knowledge gaps.
  5. State tracking: Maintain and update a STATE BLOCK after each major phase so you can resume or continue without losing progress.

Meta-Execution Rules

  1. Internal Thinking First: For each phase, think through your analysis internally before writing visible output.
    Do not expose reasoning chains β€” only final, clean findings.
  2. Phase-by-Phase Isolation: Fully complete each phase before moving to the next.
  3. Output Consistency: Reuse terminology and definitions across phases.
  4. Maximum Specificity: Always reference actual file paths, class/function names, and relationships.
  5. Self-Containment: The final document must stand alone β€” a reader without repo access should still understand the application.

PHASE 1 – Initial Context Scan

  • Explore the repo structure (directories, files, languages used).
  • Identify:
  • Application’s purpose, domain, and target users
  • Tech stack, frameworks, notable dependencies
  • Architecture type and directory structure
  • Decide which files to read first based on importance.
  • Read those files and summarize.

Deliverable:
A high-level overview of:

  • What the application is and does
  • The main features it provides
  • The business purpose of each feature
  • How those features relate to and interact with one another at a high level

PHASE 2 – System Architecture Deep Dive

  • Map all major components and their interactions.
  • Document:
  • Data flow (user β†’ backend β†’ database β†’ responses)
  • Key third-party integrations
  • Cross-cutting concerns (security, logging, caching, authentication)
  • Identify architectural patterns and conventions.
  • Read additional files as needed to clarify details.

Deliverable: Architecture diagrams, component maps, and data flow descriptions.


PHASE 3 – Feature-by-Feature Analysis

For each feature:

  1. Describe its purpose and the higher-level business need it fulfills.
  2. Explain how it works technically:
  • Entry points (routes, UI)
  • Controllers/services
  • Models/DB
  • Side effects (emails, jobs, webhooks)
  1. Describe how it interacts with other features and shared modules.
  2. Note edge cases and hidden dependencies.

Read code directly where needed to confirm details.

Deliverable:

  • Detailed technical breakdown for each feature
  • Cross-feature interaction map
  • Explanation of how individual features combine to serve broader business goals

PHASE 4 – Nuances, Subtleties & Gotchas

  • Record non-obvious design decisions and likely rationale.
  • Highlight:
  • Performance optimizations or bottlenecks
  • Security implications
  • Hardcoded business rules
  • Explain tricky or counterintuitive code clearly.

Deliverable: β€œThings You Must Know Before Changing Code” section.


PHASE 5 – Technical Reference & Glossary

  • Compile glossary of domain terms.
  • List key classes, modules, and functions with summaries.
  • Include database schema diagrams and relationships.
  • Document internal/external APIs with examples.

PHASE 6 – Final Knowledge Document Assembly

  • Merge all findings into:
  1. High-Level Overview
  2. Mid-Level Technical Notes
  3. Deep Reference Section Ensure:
  • Clear articulation of features and their business purposes
  • Explanation of feature-to-feature interactions
  • Technical references for all components
  • Include diagrams, flowcharts, and cross-references.
  • Ensure it’s complete and self-contained.
  • Save this document as codebase-analysis-docs/CODEBASE_KNOWLEDGE.md

Final Output Requirements

  • Clear, explicit language β€” no vague statements.
  • Organized headings and bullet lists.
  • Text-friendly diagrams (Mermaid, ASCII, descriptive).
  • Tie every claim to a file, function, or feature.
  • Output a ready-to-use master knowledge document inside codebase-analysis-docs.

Appendix: Large-Codebase Chunking Controller

A. Token & State Discipline

  • ~60% tokens for reading, ~40% for writing.
  • After each phase or major section, emit a STATE BLOCK:
  • INDEX_VERSION
  • FILE_MAP_SUMMARY (top ~50 files)
  • OPEN_QUESTIONS
  • KNOWN_RISKS
  • GLOSSARY_DELTA
  • If near token limit: output CONTINUE_REQUEST with latest STATE BLOCK.

B. File Index & Prioritization (Pass 0)

  1. Explore file tree & classify: code, tests, configs, migrations, infra, docs.
  2. Score importance:
  • + Entry points, high-coupling modules, heavily tested modules, runtime-critical configs, feature modules
  • – Vendor deps, build artifacts, large binaries
  1. Emit FILE INDEX:
    (#) PRIORITY | PATH | TYPE | LINES | HASH8 | NOTES

C. Chunking Strategy

  • Target ~600–1200 tokens per chunk.
  • Split on function/class boundaries.
  • Label chunks as: CHUNK_ID = PATH#START-END#HASH8.
  • Include local headers in each chunk note.

D. Iterative Passes

  • Pass 1: Mapping (breadth-first)
  • Pass 2: Backbone Deep Dive
  • Pass 3: Feature Catalog
  • Pass 4: Cross-Cutting Concerns
  • Pass 5: Synthesis

E. Tests-First Shortcuts

  • Start from E2E/integration tests to identify features quickly.

F. Dependency Graph Heuristics

  • Build import/call maps; prioritize by in/out degree.

G. Diagram Rules

  • Use Mermaid for architecture, sequence, ER diagrams.
  • Keep each diagram <250 tokens.

H. Stable Anchors & Cross-Refs

  • Use [[F:path#line-range#hash]] for file refs.
  • Preserve anchors when updating.

I. Handling Opaque/Generated Code

  • Record source maps, generators, API surface.

J. Missing Artifacts & Assumptions

  • Maintain ASSUMPTIONS table with confidence levels.

K. Output Hygiene

  • Every section must be actionable.
  • End sections with: Decisions/Findings, Open Questions, Next Steps.

L. Continuation Protocol

If context limit reached:

  1. Output:
  • CONTINUE_REQUEST
  • Latest STATE BLOCK
  • NEXT_READ_QUEUE (ordered list of CHUNK_IDs)
  1. Resume by re-ingesting the STATE BLOCK and continuing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment