
@wilmoore

wilmoore/prd.md Secret

Last active January 17, 2026 04:20
Business :: Ideas :: StoryBeats

STORYBEATS
Product Requirements Document
Version: Final Canonical
Date: 2026-01-04


  0. INTERNAL MANIFESTO: Why StoryBeats Exists

AI video generation is not hard. Learning what to generate is hard.

Most AI video tools charge users full price while they are still confused. That is backwards.

We refuse to:

  • make failure expensive
  • punish exploration
  • watermark previews
  • hide learning behind credits
  • require external editors just to finish work
  • force users to think like pipeline engineers

If a user runs out of credits before they know what they want, the product has failed.

Principles:

  1. Iteration must be cheaper than intention. Exploration is expected. Exploration should feel safe.

  2. Assembly is the real work. Short clips are atoms, not products. The product is the stitched story.

  3. The system must explain itself. When something fails, we name the cause and propose fixes.

  4. Voice expresses intent; structure ensures sanity. Users talk. The system constrains. The system never becomes a chatbot.

  5. Timeline is an implementation detail. Storyboards are the interface. Timelines stay hidden.

Hard lines:

  • No watermarks on previews. Ever.
  • No raw prompt boxes.
  • No silent global changes.
  • No “try again” without a causal explanation.
  • No configuration hell.

A. NAMING RATIONALE AND CANONICAL TERMINOLOGY

A.1 Product Name Rationale

The product is named StoryBeats because the core unit of interaction is a beat.

A beat represents a single, intentional narrative moment. Storyboarding, teaching, filmmaking, comedy, and animation all rely on beats to structure meaning, pacing, and progression.

The name intentionally:

  • anchors the product in storytelling rather than generation
  • reflects the system’s beat-first architecture
  • resonates with users who already think in narrative units
  • remains elastic enough to support future sound and music layers without renaming

Some users may associate “beats” with music. This is considered acceptable and directionally correct. Music is treated as a future, beat-aware layer that follows visual structure rather than preceding it.

StoryBeats prioritizes: story, visuals, sound. In that order.

A.2 Canonical Definition of a Beat

In StoryBeats, a beat is defined as:

A single narrative moment that communicates one idea, action, or transition in a story.

A beat is not:

  • a prompt
  • a clip
  • a shot specification
  • a musical bar
  • a timeline segment

A beat is:

  • the smallest unit of meaning
  • independently previewable
  • independently regenerable
  • composable into a finished story

This definition is canonical across product, UX, and engineering.


  1. PRODUCT SUMMARY

StoryBeats is a voice-first, storyboard-first AI video creation system designed to make learning, iteration, and assembly cheap, fast, and inevitable.

Unlike existing AI video tools that optimize for single-shot generation, StoryBeats optimizes for human learning loops:

  • fast previews
  • constrained iteration
  • enforced consistency
  • automatic assembly
  • guided correction
  • predictable cost

StoryBeats treats AI video not as a prompt gamble, but as a structured creative process that users can understand, control, and finish.


  2. PROBLEM STATEMENT

Current AI video tools:

  • make failure expensive in time and credits
  • punish exploration
  • require users to understand hidden constraints
  • expose fragile pipelines
  • force external editing and stitching
  • provide no explanation when outputs fail

Users routinely burn credits before understanding what they want, wait minutes for unusable results, and abandon tools out of frustration.

This is not a model problem. It is a product design failure.


  3. GOALS, NON-GOALS, AND ANTI-FEATURES

3.1 Goals

  • Voice driven creation of multi beat storyboards
  • Fast beat previews to enable rapid iteration
  • Consistency enforcement for environments and characters
  • Guided correction with causal explanations
  • One-click assembly with sane defaults
  • Model-agnostic execution via profiles

3.2 Non-Goals for V1

  • Full timeline editors
  • Fine-grained motion curves
  • Audio and music layers
  • Collaboration and sharing
  • Custom model parameter tuning
  • Branching narratives

3.3 Explicit Anti-Features

The following are intentionally excluded:

  • free-form timelines
  • node graphs or pipeline editors
  • raw prompt text areas
  • blind “variations” buttons
  • watermarked previews
  • credit-based gating during learning
  • silent global state changes
  • exporting just to see flow

These are not missing features. They are explicitly rejected.


  4. USER TRUST CONTRACT

StoryBeats guarantees:

  • previews are never watermarked
  • preview generation never consumes final render quota
  • regenerating one beat never mutates other beats
  • global changes are never applied without confirmation
  • expensive operations always require explicit approval
  • every failure includes a cause and suggested fixes

Violating these guarantees is a product failure, not a UX bug.


  5. TARGET USERS

Primary:

  • beginners and intermediates exploring AI video
  • educators, founders, creators, students
  • users who do not know prompt engineering
  • users who want to learn by doing

Secondary:

  • power users exhausted by ComfyUI
  • users who value speed and clarity over knobs

  6. CORE OBJECTS AND DATA MODEL

Project:

  • id
  • title
  • aspect_ratio
  • visual_style_preset
  • environment_id
  • character_ids
  • ordered list of beats
  • execution_profile_defaults

Environment:

  • id
  • name
  • structured_description
  • reference_images
  • locked flag

Character:

  • id
  • name
  • role
  • structured_description
  • reference_images
  • locked flag

Beat:

  • id
  • order_index
  • intent
  • environment_ref
  • character_refs
  • state: Draft, Previewed, Validated
  • assets: preview, final
  • derived: semantic_caption, motion_plan, guidance_events

GuidanceEvent:

  • id
  • beat_id
  • type
  • cause
  • suggestions
  • actions
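The objects above can be sketched as plain data structures. The following Python sketch mirrors the Beat and GuidanceEvent fields listed here; the class shapes, default values, and the flattening of `assets` into two optional strings are illustrative assumptions, not spec.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class BeatState(Enum):
    DRAFT = "Draft"
    PREVIEWED = "Previewed"
    VALIDATED = "Validated"

@dataclass
class Beat:
    id: str
    order_index: int
    intent: str
    environment_ref: str
    character_refs: list[str] = field(default_factory=list)
    state: BeatState = BeatState.DRAFT
    preview_asset: Optional[str] = None   # assets: preview
    final_asset: Optional[str] = None     # assets: final

@dataclass
class GuidanceEvent:
    id: str
    beat_id: str
    type: str              # one of the failure taxonomy categories
    cause: str             # the causal explanation shown to the user
    suggestions: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)

# A new beat always starts in Draft with no assets attached.
beat = Beat(id="b1", order_index=0, intent="Hero enters the workshop",
            environment_ref="env-workshop")
```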

  7. STATE MACHINE

States:

  1. No Project
  2. Project Setup
  3. Beat Authoring
  4. Preview Iteration
  5. Beat Validation
  6. Assembly Preview
  7. Final Render
  8. Export Complete

Rules:

  • actions are gated by state
  • invalid actions trigger guidance
  • validated beats must be revalidated after structural changes
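The gating rule above can be made concrete with a small lookup: each state lists its valid actions, and anything else returns guidance rather than a bare error. The state names follow the list above; the action names and dictionary shape are illustrative assumptions.

```python
# Each state permits only a fixed set of actions (action names assumed).
ALLOWED_ACTIONS = {
    "No Project": {"create_project"},
    "Project Setup": {"define_environment", "define_characters", "start_authoring"},
    "Beat Authoring": {"add_beat", "describe_intent", "generate_preview"},
    "Preview Iteration": {"regenerate_beat", "validate_beat"},
    "Beat Validation": {"assemble_preview"},
    "Assembly Preview": {"confirm_final_render"},
    "Final Render": {"export"},
    "Export Complete": set(),
}

def attempt(state: str, action: str):
    """Return ("ok", action) if allowed; otherwise guidance with valid actions."""
    allowed = ALLOWED_ACTIONS[state]
    if action in allowed:
        return ("ok", action)
    # Invalid action: no generic error, just the valid next steps.
    return ("guidance", sorted(allowed))
```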

  8. EXECUTION PROFILES

Users choose outcomes, not models.

Fast Preview:

  • purpose: iteration
  • latency target: 2 to 3 seconds
  • watermark: forbidden
  • confirmation: no

High Quality Images:

  • purpose: final stills
  • confirmation: yes

Image to Video Per Beat:

  • requires beat validation
  • confirmation: yes

Final Assembly Render:

  • requires all beats validated
  • confirmation: yes
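The four profiles reduce to a few boolean gates. A minimal sketch, assuming a frozen dataclass per profile (field and key names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionProfile:
    name: str
    purpose: str
    requires_confirmation: bool
    requires_validated_beats: bool = False
    watermark_allowed: bool = False   # hard line: never true for previews

PROFILES = {
    "fast_preview": ExecutionProfile("Fast Preview", "iteration", False),
    "hq_images": ExecutionProfile("High Quality Images", "final stills", True),
    "image_to_video": ExecutionProfile("Image to Video Per Beat", "per-beat video",
                                       True, requires_validated_beats=True),
    "final_assembly": ExecutionProfile("Final Assembly Render", "full render",
                                       True, requires_validated_beats=True),
}

def may_run(profile_key: str, confirmed: bool, all_beats_validated: bool) -> bool:
    """Check the two gates: explicit confirmation and beat validation."""
    p = PROFILES[profile_key]
    if p.requires_confirmation and not confirmed:
        return False
    if p.requires_validated_beats and not all_beats_validated:
        return False
    return True
```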

  9. PERFORMANCE AND LATENCY BUDGETS

  • Beat preview generation: ≤ 3 seconds
  • Beat regeneration: ≤ 3 seconds
  • Assembly preview start: ≤ 2 seconds
  • Voice recognition latency: ≤ 500 ms perceived
  • UI updates: immediate

Any feature that violates these budgets must be redesigned or cut.
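These budgets are concrete enough to encode as data and check in CI or telemetry. A minimal sketch (the operation keys are illustrative names, not a logging spec):

```python
# Latency budgets from this section, in milliseconds.
BUDGETS_MS = {
    "beat_preview": 3000,
    "beat_regeneration": 3000,
    "assembly_preview_start": 2000,
    "voice_recognition": 500,
}

def within_budget(operation: str, elapsed_ms: float) -> bool:
    """True if a measured operation met its latency budget."""
    return elapsed_ms <= BUDGETS_MS[operation]
```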


  10. GUIDANCE ENGINE AND FAILURE TAXONOMY

Failure categories:

  • IntentMismatch
  • CharacterDrift
  • EnvironmentDrift
  • MotionOverpower
  • CompositionIssue
  • ExecutionFailure

Rules:

  • no generic errors
  • no “try again” messaging
  • every failure includes cause and fixes
  • fixes are one click or one voice command
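The "no generic errors" rule implies every failure category carries a named cause and concrete fixes. A sketch of that catalog, shown for three of the six categories (the cause and fix strings are illustrative, not product copy):

```python
# Each taxonomy category maps to a causal explanation plus one-click fixes.
FAILURE_CATALOG = {
    "IntentMismatch": {
        "cause": "The generated clip does not depict the beat's stated intent.",
        "fixes": ["Rephrase the beat intent", "Simplify to a single action"],
    },
    "CharacterDrift": {
        "cause": "A character's appearance diverged from its locked reference.",
        "fixes": ["Re-apply character reference images", "Reduce motion strength"],
    },
    "MotionOverpower": {
        "cause": "Motion strength overwhelmed composition and subject identity.",
        "fixes": ["Lower motion intensity", "Anchor the camera"],
    },
}

def build_guidance(beat_id: str, category: str) -> dict:
    """Build a GuidanceEvent payload: always a cause, always suggested fixes."""
    entry = FAILURE_CATALOG[category]
    return {
        "beat_id": beat_id,
        "type": category,
        "cause": entry["cause"],
        "suggestions": entry["fixes"],
    }
```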

  11. VOICE INTERFACE

Voice is the primary control plane.

Voice supports:

  • add beat
  • describe beat intent
  • generate preview
  • regenerate beat
  • validate beat
  • assemble preview
  • confirm final render

Voice is state-aware and constrained to valid next actions.
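One way to realize state-aware constraint is a registry where each voice command declares the states in which it is valid; the recognizer then only offers commands legal right now. A sketch (state names follow the state machine section; the registry shape is an assumption):

```python
# Each voice command lists the states in which it is a valid next action.
VOICE_COMMANDS = {
    "add beat": {"Beat Authoring"},
    "generate preview": {"Beat Authoring"},
    "regenerate beat": {"Preview Iteration"},
    "validate beat": {"Preview Iteration"},
    "assemble preview": {"Beat Validation"},
    "confirm final render": {"Assembly Preview"},
}

def available_commands(state: str) -> list[str]:
    """Commands the voice layer should accept in the current state."""
    return sorted(cmd for cmd, states in VOICE_COMMANDS.items() if state in states)
```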


  12. SCREEN-LEVEL UX OVERVIEW

Core screens:

  • Home and Projects
  • Project Setup
  • Storyboard Workspace
  • Beat Focus Mode
  • Assembly Preview
  • Export

UI principles:

  • narrow
  • guided
  • training wheels by default
  • easier than Canva
  • less free-form than Figma

No blank canvas paralysis.


  13. ASSEMBLY ENGINE

  • beat order defines sequence
  • default durations and transitions
  • no timeline exposure
  • stitching handled internally
  • users can preview full video without export
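With order alone defining sequence, and defaults supplying duration and transitions, stitching reduces to computing placements internally. A sketch assuming a 3-second default beat duration and a 0.5-second crossfade overlap (both values are illustrative, not spec):

```python
DEFAULT_DURATION = 3.0    # seconds per beat (assumed default)
DEFAULT_TRANSITION = 0.5  # crossfade overlap in seconds (assumed default)

def assemble(beats: list[dict]) -> list[tuple]:
    """Place each beat on an internal timeline; order_index defines sequence.

    Returns (beat_id, start, end) tuples. The timeline is never exposed
    to the user; it only drives the stitched preview and final render.
    """
    timeline = []
    t = 0.0
    for i, beat in enumerate(sorted(beats, key=lambda b: b["order_index"])):
        start = t if i == 0 else t - DEFAULT_TRANSITION  # overlap for crossfade
        end = start + DEFAULT_DURATION
        timeline.append((beat["id"], start, end))
        t = end
    return timeline
```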

  14. OBSERVABILITY AND SUCCESS SIGNALS

Success is measured by learning velocity.

Key signals:

  • average beat regenerations per project
  • time to first assembly preview
  • percentage reaching final render
  • guidance acceptance rate
  • voice usage ratio
  • drop-off by state

High regeneration with low abandonment is healthy.
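Two of these signals can be sketched directly from an event stream: average regenerations per project and guidance acceptance rate. The event-tuple shape and event-type names below are illustrative assumptions, not a logging spec.

```python
from collections import Counter

def learning_signals(events: list[tuple]) -> dict:
    """Compute learning-velocity signals from (project_id, event_type) pairs.

    Assumed event types: "regenerate", "guidance_shown", "guidance_accepted".
    """
    by_type = Counter(kind for _, kind in events)
    projects = {pid for pid, _ in events}
    shown = by_type["guidance_shown"]
    return {
        "avg_regenerations": by_type["regenerate"] / max(len(projects), 1),
        "guidance_acceptance_rate": by_type["guidance_accepted"] / shown if shown else 0.0,
    }
```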


  15. MUSIC AND SOUND (FUTURE SCOPE)

Music is treated as a beat-aware layer.

Principles:

  • sound follows story
  • music aligns to beats, not timestamps
  • silence is a valid beat outcome
  • no second timeline

Music is a future extension, not a missing feature.


  16. COMPETITIVE FRAME (INTERNAL)

Others optimize for generation. StoryBeats optimizes for iteration.

Others expose complexity. StoryBeats absorbs it.

Others assume expertise. StoryBeats teaches by doing.


  17. FINAL NORTH STAR

A tired non-expert can speak an idea, iterate cheaply, understand failures, and assemble a real video without fear, friction, or guesswork.

If this is true, nothing else matters.
