@esatterwhite
Created May 18, 2026 15:50
Commit Abstract Syntax Tree

CAST: Conventional Commit Abstract Syntax Tree


cast is a Working Draft for representing conventional commit messages in a syntax tree. It implements unist. It can represent conventional commit messages as defined by the Conventional Commits specification.

Introduction

This document defines a format for representing conventional commit messages as an abstract syntax tree. Development of cast started in January 2025, as part of a conventional commit parser project that needed to provide both structured JSON output and AST-based transformations.

This specification is written in a Web IDL-like grammar.

Where this specification fits

cast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

cast relates to JavaScript in that it has utilities for working with compliant syntax trees in JavaScript. However, cast is not limited to JavaScript and can be used in other programming languages.

cast relates to the unified ecosystem in that cast syntax trees can be used with unified processors for transformation, validation, and serialization tasks.

cast relates to conventional commits in that it provides a structured representation of commit messages that follow the conventional commit format, enabling programmatic analysis and transformation.

Nodes (abstract)

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents an abstract interface in cast containing a value. Its value field is a string.

Parent

interface Parent <: UnistParent {
  children: [CAstContent]
}

Parent (UnistParent) represents an abstract interface in cast containing other nodes (said to be children). Its content is limited to only other cast content.

Nodes

Root

interface Root <: Parent {
  type: 'root'
  children: [Content]
}

Root (Parent) represents a conventional commit message document. Root can be used as the root of a tree, never as a child. Its content model is content.

For example, the following commit message:

feat(api): add user authentication

This commit adds JWT-based authentication to the API.
It includes login, logout, and token refresh endpoints.

BREAKING CHANGE: authentication is now required for all API endpoints
Resolves: #123

Yields:

{
  type: 'root',
  children: [
    {
      type: 'header',
      children: [/* header content */]
    },
    {
      type: 'body',
      children: [/* body content */]
    },
    {
      type: 'footer',
      children: [/* footer content */]
    }
  ]
}

Header

interface Header <: Parent {
  type: 'header'
  children: [HeaderContent]
}

Header (Parent) represents the first line of a conventional commit message. Header can be used where content is expected. Its content model is header content.

The header contains the commit type, optional scope, optional breaking change indicator, and description.
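As an illustration of how the header components map to nodes, here is a minimal sketch of a header parser. The function name parseHeader and the regular expression are assumptions made for this example, not part of the cast specification, and the description is emitted as a single Text node without splitting out issue references.

```javascript
// Hypothetical helper, not defined by the cast specification.
// Pattern assumed here: <type>[(<scope>)][!]: <description>
const HEADER_PATTERN = /^(\w+)(?:\(([^()]+)\))?(!)?: (.+)$/;

function parseHeader(line) {
  const match = HEADER_PATTERN.exec(line);
  if (match === null) return null; // not a conventional commit header

  const [, type, scope, bang, description] = match;
  const breaking = bang !== undefined;

  const children = [{type: 'type', value: type}];
  if (scope !== undefined) {
    children.push({type: 'scope', value: scope});
  }
  if (breaking) {
    // Bang is a Parent holding a single Text node containing '!'
    children.push({type: 'bang', children: [{type: 'text', value: '!'}]});
  }
  children.push({
    type: 'description',
    breaking,
    value: description,
    children: [{type: 'text', value: description}]
  });

  return {type: 'header', children};
}
```

Calling parseHeader('feat(api)!: add user authentication') in this sketch returns a header with four children (type, scope, bang, description), while a line that does not match the pattern returns null.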

Type

interface Type <: Literal {
  type: 'type'
  value: string
}

Type (Literal) represents the type of change being committed. Type can be used where header content is expected. Its content is represented by its value field.

Common conventional commit types include feat, fix, docs, style, refactor, test, and chore.

For example:

{type: 'type', value: 'feat'}

Scope

interface Scope <: Literal {
  type: 'scope'
  value: string
}

Scope (Literal) represents the scope of the change being committed. Scope can be used where header content is expected. Its content is represented by its value field.

The scope is optional and appears in parentheses after the type.

For example:

{type: 'scope', value: 'api'}

Bang

interface Bang <: Parent {
  type: 'bang'
  children: [Text]
}

Bang (Parent) represents a breaking change indicator in the header. Bang can be used where header content is expected. Its content model consists of a single Text node containing '!'.

The breaking change indicator appears as an exclamation mark (!) after the type or scope. The Bang node is a syntactic element used for round-trip conversion. The semantic meaning of a breaking change is captured by the breaking field on Description and Trailer nodes.

For example:

{
  type: 'bang',
  children: [
    {type: 'text', value: '!'}
  ]
}
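
Because Bang is purely syntactic, a consumer that wants to know whether a commit is breaking would read the breaking fields instead. The following is a minimal sketch of such a check; the function name isBreaking is hypothetical and not defined by this specification.

```javascript
// Hypothetical helper: walks a cast tree and reports whether any
// Description or Trailer node carries breaking: true.
function isBreaking(node) {
  const carriesFlag = node.type === 'description' || node.type === 'trailer';
  if (carriesFlag && node.breaking === true) {
    return true;
  }
  // Literal nodes have no children, so fall back to an empty list.
  return (node.children || []).some((child) => isBreaking(child));
}
```

The check deliberately ignores Bang nodes, since their meaning is already reflected in the breaking field of the sibling Description.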

Description

interface Description <: Parent {
  type: 'description'
  children: [TextContent]
  breaking: boolean
  value: string
}

Description (Parent) represents the description of the change. Description can be used where header content is expected. Its content model is text content.

The breaking field is true when the commit header contains a breaking change indicator (!), and false otherwise.

The value field contains the complete description text extracted from all child text nodes, providing convenient access to the description without traversing the children array.

For example:

{
  type: 'description',
  breaking: false,
  value: 'add user authentication',
  children: [
    {type: 'text', value: 'add user authentication'}
  ]
}

Example with breaking change:

{
  type: 'description',
  breaking: true,
  value: 'add user authentication',
  children: [
    {type: 'text', value: 'add user authentication'}
  ]
}
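
The relationship between value and children can be sketched as a simple concatenation. This assumes the children appear in source order; the helper name descriptionValue and the sample text are made up for illustration.

```javascript
// Hypothetical helper: derives Description.value from its text content.
// Both Text and IssueReference are Literals, so each child has a string value.
function descriptionValue(children) {
  return children.map((child) => child.value).join('');
}

// Illustrative input (not taken from the specification).
const sampleChildren = [
  {type: 'text', value: 'fix crash reported in '},
  {type: 'issueReference', value: '#42', prefix: '#', id: 42}
];

const sampleDescription = {
  type: 'description',
  breaking: false,
  value: descriptionValue(sampleChildren),
  children: sampleChildren
};
```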

Body

interface Body <: Parent {
  type: 'body'
  children: [BodyContent]
}

Body (Parent) represents the body of the commit message. Body can be used where content is expected. Its content model is body content.

The body provides additional context about the change and is separated from the header by a blank line.

For example:

{
  type: 'body',
  children: [
    {
      type: 'line',
      children: [
        {type: 'text', value: 'This commit adds JWT-based authentication to the API.'}
      ]
    },
    {
      type: 'line',
      children: [
        {type: 'text', value: 'It includes login, logout, and token refresh endpoints.'}
      ]
    }
  ]
}

Footer

interface Footer <: Parent {
  type: 'footer'
  children: [FooterContent]
}

Footer (Parent) represents the footer section of the commit message. Footer can be used where content is expected. Its content model is footer content.

The footer contains git trailers and is separated from the body by a blank line.

Trailer

interface Trailer <: Parent {
  type: 'trailer'
  children: [TrailerContent]
  breaking: boolean
}

Trailer (Parent) represents a single git trailer in the footer. Trailer can be used where footer content is expected. Its content model is trailer content.

A trailer consists of a token and a value separated by a colon.

The breaking field is true when the trailer represents a breaking change (e.g., BREAKING CHANGE: or BREAKING-CHANGE:), and false otherwise.

For example:

{
  type: 'trailer',
  breaking: false,
  children: [
    {type: 'trailerkey', children: [{type: 'text', value: 'Resolves'}]},
    {type: 'trailervalue', children: [{type: 'text', value: '#123'}]}
  ]
}

Breaking change trailer:

{
  type: 'trailer',
  breaking: true,
  children: [
    {type: 'trailerkey', children: [{type: 'text', value: 'BREAKING CHANGE'}]},
    {type: 'trailervalue', children: [{type: 'text', value: 'API has changed'}]}
  ]
}
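
A minimal sketch of turning one footer line into a Trailer node follows. The function name parseTrailer and the pattern are assumptions for this example: it only handles the "token: value" form described above and stores the value as a single Text node rather than splitting out issue references.

```javascript
// Hypothetical helper, not defined by the cast specification.
// "BREAKING CHANGE" is the only token allowed to contain a space.
const TRAILER_PATTERN = /^(BREAKING CHANGE|[A-Za-z0-9-]+): (.+)$/;

function parseTrailer(line) {
  const match = TRAILER_PATTERN.exec(line);
  if (match === null) return null; // not a trailer line

  const [, key, value] = match;
  return {
    type: 'trailer',
    breaking: key === 'BREAKING CHANGE' || key === 'BREAKING-CHANGE',
    children: [
      {type: 'trailerkey', children: [{type: 'text', value: key}]},
      {type: 'trailervalue', children: [{type: 'text', value}]}
    ]
  };
}
```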

TrailerKey

interface TrailerKey <: Parent {
  type: 'trailerkey'
  children: [TextContent]
}

TrailerKey (Parent) represents the key part of a git trailer. TrailerKey can be used where trailer content is expected. Its content model is text content.

Common trailer keys include BREAKING CHANGE, Resolves, Fixes, Reviewed-by, and Co-authored-by.

For example:

{
  type: 'trailerkey',
  children: [
    {type: 'text', value: 'Resolves'}
  ]
}

TrailerValue

interface TrailerValue <: Parent {
  type: 'trailervalue'
  children: [TextContent]
}

TrailerValue (Parent) represents the value part of a git trailer. TrailerValue can be used where trailer content is expected. Its content model is text content.

Line

interface Line <: Parent {
  type: 'line'
  children: [TextContent]
}

Line (Parent) represents a single line of text in the body or trailer value. Line can be used where body content is expected. Its content model is text content.

Lines are separated by newline characters and can contain both plain text and issue references, allowing for precise tracking of inline elements.

For example:

{
  type: 'line',
  children: [
    {type: 'text', value: 'This fixes issue '},
    {type: 'issueReference', value: '#123', prefix: '#', id: 123},
    {type: 'text', value: ' in the parser'}
  ]
}
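
One way to produce such a Line node is to scan the source line for issue references and emit Text nodes for everything in between. The sketch below assumes only the "#" and "GH-" prefixes; the function name parseLine and the pattern are hypothetical, and a real parser would likely need stricter boundary rules.

```javascript
// Hypothetical helper, not defined by the cast specification.
// Recognizes references such as #123 and GH-456.
const ISSUE_PATTERN = /(#|GH-)(\d+)/g;

function parseLine(source) {
  const children = [];
  let last = 0; // end offset of the previously consumed match

  for (const match of source.matchAll(ISSUE_PATTERN)) {
    if (match.index > last) {
      // Plain text between the previous reference and this one
      children.push({type: 'text', value: source.slice(last, match.index)});
    }
    children.push({
      type: 'issueReference',
      value: match[0],
      prefix: match[1],
      id: Number(match[2])
    });
    last = match.index + match[0].length;
  }

  if (last < source.length) {
    // Trailing text after the final reference (or the whole line if none)
    children.push({type: 'text', value: source.slice(last)});
  }
  return {type: 'line', children};
}
```

Applied to 'This fixes issue #123 in the parser', this sketch yields the three children shown in the example above.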

Text

interface Text <: Literal {
  type: 'text'
  value: string
}

Text (Literal) represents textual content. Text can be used where text content is expected. Its content is represented by its value field.

For example:

{type: 'text', value: 'add user authentication'}

IssueReference

interface IssueReference <: Literal {
  type: 'issueReference'
  value: string
  prefix: string
  id: number
}

IssueReference (Literal) represents a reference to an issue or pull request. IssueReference can be used where text content is expected. Its content is represented by its value field, with additional prefix and id fields for structured access.

For example:

{
  type: 'issueReference',
  value: '#123',
  prefix: '#',
  id: 123
}

Mixin

PositionalInfo

interface mixin PositionalInfo {
  position: Position?
}

PositionalInfo represents positional information of a node in the source commit message. This mixin can be applied to any node to preserve source location information for error reporting and source mapping.

All CAST nodes should include position information when parsed from source text to enable:

  • Precise error reporting with line and column numbers
  • Source mapping for transformations
  • IDE integrations with hover information and diagnostics
  • Linting tools with exact error locations

Position

interface Position {
  start: Point
  end: Point
}

Position represents the location of a node in a source commit message. The start field represents the place of the first character of the node. The end field represents the place of the first character after the node.

Point

interface Point {
  line: number >= 1
  column: number >= 1
  offset: number >= 0
}

Point represents one place in a source commit message. The line field (1-indexed integer) represents a line in the source. The column field (1-indexed integer) represents a column in the source. The offset field (0-indexed integer) represents a character in the source.
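
To make the indexing concrete, here is a minimal sketch that derives a Point from a character offset and builds a Position from a start and end offset. The helper names pointAt and positionOf are hypothetical, and offsets are counted in JavaScript string units (UTF-16 code units), which is an assumption of this example rather than something the specification states.

```javascript
// Hypothetical helper: converts a 0-indexed offset into a Point
// with 1-indexed line and column.
function pointAt(source, offset) {
  if (offset < 0 || offset > source.length) {
    throw new RangeError(`offset ${offset} is outside the source`);
  }
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line += 1;
      lineStart = i + 1;
    }
  }
  return {line, column: offset - lineStart + 1, offset};
}

// Hypothetical helper: start is the first character of the node,
// end is the first character after the node.
function positionOf(source, startOffset, endOffset) {
  return {
    start: pointAt(source, startOffset),
    end: pointAt(source, endOffset)
  };
}
```

For the source 'feat: add login', the Type node 'feat' would span offsets 0 to 4, so its position starts at line 1, column 1 and ends at line 1, column 5.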

Position Mapping Guidelines

When converting from a concrete syntax tree (CST) to CAST, position information should be preserved as follows:

  1. Token-based nodes (Type, Scope): Use exact token positions
  2. Composite nodes (Header, Body, Footer, TrailerKey, TrailerValue): Span from first to last child
  3. Text nodes: Preserve exact character ranges including whitespace
  4. Issue references: Use substring positions within trailer values

Position information is optional but strongly recommended for nodes parsed from source text.
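
Guideline 2 above can be sketched as follows. The helper name spanOfChildren is hypothetical, and the sketch assumes children are stored in source order; children without a position are skipped.

```javascript
// Hypothetical helper: computes the position of a composite node
// as the span from its first positioned child to its last.
function spanOfChildren(children) {
  const positioned = children.filter((child) => child.position != null);
  if (positioned.length === 0) {
    return undefined; // position is optional, so there may be nothing to span
  }
  return {
    start: positioned[0].position.start,
    end: positioned[positioned.length - 1].position.end
  };
}
```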

Content model

type CAstContent = Content | HeaderContent | BodyContent | FooterContent | TrailerContent | TextContent

Each node in cast falls into one or more categories of Content that group nodes with similar characteristics together.

Content

type Content = Header | Body | Footer

Content represents the top-level sections of a conventional commit message.

HeaderContent

type HeaderContent = Type | Scope | Bang | Description

Header content represents the components that can appear in the commit header.

BodyContent

type BodyContent = Line

Body content represents the content that can appear in the commit body. Body content consists of line nodes, which can contain text and issue references.

FooterContent

type FooterContent = Trailer

Footer content represents the content that can appear in the commit footer.

TrailerContent

type TrailerContent = TrailerKey | TrailerValue

Trailer content represents the components of a git trailer.

TextContent

type TextContent = Text | IssueReference

Text content represents textual content that may contain issue references.
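
The content model in this section can be expressed as a lookup table and checked mechanically. The following is a minimal sketch of such a check; the names ALLOWED_CHILDREN and findContentErrors are hypothetical, and the table is transcribed from the interface and type definitions in this document.

```javascript
// Hypothetical table: parent node type -> child node types it may contain.
const ALLOWED_CHILDREN = new Map([
  ['root', ['header', 'body', 'footer']],
  ['header', ['type', 'scope', 'bang', 'description']],
  ['bang', ['text']],
  ['description', ['text', 'issueReference']],
  ['body', ['line']],
  ['line', ['text', 'issueReference']],
  ['footer', ['trailer']],
  ['trailer', ['trailerkey', 'trailervalue']],
  ['trailerkey', ['text', 'issueReference']],
  ['trailervalue', ['text', 'issueReference']]
]);

// Hypothetical helper: returns a list of content model violations.
function findContentErrors(node, path = node.type) {
  const allowed = ALLOWED_CHILDREN.get(node.type);
  if (allowed === undefined) {
    return []; // literal nodes (and unknown types) have no children to check
  }
  const errors = [];
  (node.children || []).forEach((child, index) => {
    const childPath = `${path} > ${child.type}[${index}]`;
    if (!allowed.includes(child.type)) {
      errors.push(`${childPath} is not allowed in ${node.type}`);
    }
    errors.push(...findContentErrors(child, childPath));
  });
  return errors;
}
```

A tree that places a Text node directly inside Body, for example, would be reported by this sketch because body content is limited to Line nodes.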

Conventional Commit Mapping

This section maps elements of the Conventional Commits specification to cast nodes:

| Conventional Commit Element | CAST Node         | Description                          |
| --------------------------- | ----------------- | ------------------------------------ |
| <type>                      | Type              | The type of change (feat, fix, etc.) |
| (<scope>)                   | Scope             | Optional scope in parentheses        |
| !                           | Bang              | Breaking change indicator            |
| <description>               | Description       | Short description of the change      |
| Body paragraph              | Body              | Extended description                 |
| <token>: <value>            | Trailer           | Git trailer (footer)                 |
| BREAKING CHANGE:            | Trailer (special) | Breaking change description          |
| #123, GH-456                | IssueReference    | Issue/PR references                  |

Round-trip Conversion

The cast specification is designed to support lossless round-trip conversion:

  1. Parse: Commit message text → CAST
  2. Transform: Modify the CAST (validate, lint, reformat)
  3. Serialize: CAST → Commit message text

Key design principles for round-trip compatibility:

  • Preserve whitespace: Significant whitespace is preserved in Text nodes
  • Maintain structure: All structural elements are represented as nodes
  • Position tracking: Optional positional information preserves source locations
  • No information loss: All parts of the original commit message are represented
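
The serialize step can be sketched as below. This is a simplified illustration, not a lossless serializer: it assumes the ": " separators and the blank lines between sections are implied by the tree structure rather than stored in Text nodes (as in the node examples earlier in this document), and the function names textOf, serializeHeader, and serialize are hypothetical.

```javascript
// Hypothetical helper: concatenates the text of a node and its descendants.
function textOf(node) {
  return Array.isArray(node.children)
    ? node.children.map(textOf).join('')
    : node.value;
}

// Hypothetical helper: rebuilds "<type>(<scope>)!: <description>".
function serializeHeader(header) {
  let out = '';
  for (const child of header.children) {
    if (child.type === 'scope') {
      out += `(${child.value})`;
    } else if (child.type === 'description') {
      out += `: ${textOf(child)}`;
    } else {
      out += textOf(child); // type and bang
    }
  }
  return out;
}

// Hypothetical helper: CAST -> commit message text.
function serialize(root) {
  const sections = root.children.map((section) => {
    if (section.type === 'header') {
      return serializeHeader(section);
    }
    if (section.type === 'body') {
      return section.children.map(textOf).join('\n');
    }
    // footer: one "key: value" line per trailer
    return section.children
      .map((trailer) => {
        const [key, value] = trailer.children;
        return `${textOf(key)}: ${textOf(value)}`;
      })
      .join('\n');
  });
  // Sections are separated by a single blank line.
  return sections.join('\n\n');
}
```

A fully lossless implementation would also need to account for details this sketch ignores, such as multiple blank lines or trailing whitespace in the original message.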

Examples

Simple feature commit

Input:

feat: add user authentication

AST:

{
  type: 'root',
  children: [
    {
      type: 'header',
      children: [
        {type: 'type', value: 'feat'},
        {type: 'description', breaking: false, value: 'add user authentication', children: [
          {type: 'text', value: 'add user authentication'}
        ]}
      ]
    }
  ]
}

Complex commit with breaking change

Input:

feat(api)!: add user authentication

This commit adds JWT-based authentication to the API.
It includes login, logout, and token refresh endpoints.

BREAKING CHANGE: authentication is now required for all API endpoints
Resolves: #123
Co-authored-by: Jane Doe <jane@example.com>

AST:

{
  type: 'root',
  children: [
    {
      type: 'header',
      children: [
        {type: 'type', value: 'feat'},
        {type: 'scope', value: 'api'},
        {type: 'bang', children: [
          {type: 'text', value: '!'}
        ]},
        {type: 'description', breaking: true, value: 'add user authentication', children: [
          {type: 'text', value: 'add user authentication'}
        ]}
      ]
    },
    {
      type: 'body',
      children: [
        {type: 'line', children: [
          {type: 'text', value: 'This commit adds JWT-based authentication to the API.'}
        ]},
        {type: 'line', children: [
          {type: 'text', value: 'It includes login, logout, and token refresh endpoints.'}
        ]}
      ]
    },
    {
      type: 'footer',
      children: [
        {
          type: 'trailer',
          breaking: true,
          children: [
            {type: 'trailerkey', children: [{type: 'text', value: 'BREAKING CHANGE'}]},
            {type: 'trailervalue', children: [
              {type: 'text', value: 'authentication is now required for all API endpoints'}
            ]}
          ]
        },
        {
          type: 'trailer',
          breaking: false,
          children: [
            {type: 'trailerkey', children: [{type: 'text', value: 'Resolves'}]},
            {type: 'trailervalue', children: [
              {type: 'issueReference', value: '#123', prefix: '#', id: 123}
            ]}
          ]
        },
        {
          type: 'trailer',
          breaking: false,
          children: [
            {type: 'trailerkey', children: [{type: 'text', value: 'Co-authored-by'}]},
            {type: 'trailervalue', children: [
              {type: 'text', value: 'Jane Doe <jane@example.com>'}
            ]}
          ]
        }
      ]
    }
  ]
}
