@44100hertz
Created February 11, 2024 19:24
Modifying Parsers to work with structure editors

General

Conventional parsers must be modified to create an effective structure editor.

Most parsers create an AST (abstract syntax tree), but a structure-editor parser creates a CST (concrete syntax tree). This is because the parser must record everything that affects how the code looks, not just what the code does.
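
A minimal sketch of the difference, using Python's standard `ast` module for the AST side. The `lossless_tokens` helper and its token names are hypothetical, and a flat token list is only the raw material for a CST, not a tree.

```python
import ast
import re

source = "(1 + 2)  # sum"

# An AST keeps only what affects behavior: the parens and the comment
# leave no trace in the dumped tree.
assert ast.dump(ast.parse(source)) == ast.dump(ast.parse("1 + 2"))

# Hypothetical lossless tokenizer: every character of the source,
# including whitespace and comments, lands in exactly one token.
TOKEN = re.compile(
    r"(?P<comment>#[^\n]*)|(?P<space>\s+)|(?P<number>\d+)|(?P<punct>.)"
)

def lossless_tokens(text):
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]

tokens = lossless_tokens(source)

# Concatenating the tokens gives back the exact source text.
assert "".join(text for _, text in tokens) == source
```

A CST built on top of tokens like these can still show the redundant parens and the comment to the user.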

The parser should report errors, but should NEVER give up on building structure. Instead, it should use sane fallbacks so that it always produces an editable CST, within reason. Every CST should correspond almost 1:1 with its source code, whether that code is valid or invalid.
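
One way to picture this is a parser that collects errors instead of raising them. The sketch below assumes a toy language of `name = number` lines; the `Assign` and `Unknown` node names are made up for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Assign:
    raw: str    # exact source text of the line
    name: str
    value: str

@dataclass
class Unknown:
    raw: str    # text the parser could not structure; editable as text

ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")

def parse(text):
    """Always returns nodes; problems are reported, never raised."""
    nodes, errors = [], []
    for lineno, line in enumerate(text.split("\n"), start=1):
        m = ASSIGN.match(line)
        if m:
            nodes.append(Assign(line, m.group(1), m.group(2)))
        else:
            # Fallback: keep the raw text (blank lines included, for
            # simplicity) so the CST still covers the whole file.
            nodes.append(Unknown(line))
            if line.strip():
                errors.append(f"line {lineno}: not an assignment")
    return nodes, errors

def unparse(nodes):
    return "\n".join(node.raw for node in nodes)

nodes, errors = parse("x = 1\ny = = 2\n\nz=3")
```

Here the invalid second line becomes an `Unknown` node with one reported error, and `unparse(nodes)` still reproduces the input exactly.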

The parser never evaluates the code in question. It could be integrated with "smart" IDE-like features, but this requires translating the CST into an AST or text.
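
As a rough sketch of that translation, a "lowering" pass can walk the CST and drop whatever has no meaning for evaluation. The node classes and the tuple-based AST shape below are assumptions made for this example only.

```python
from dataclasses import dataclass

# Hypothetical CST node types.
@dataclass
class Num:
    text: str

@dataclass
class BinOp:
    left: object
    op: str
    right: object

@dataclass
class Paren:
    inner: object

@dataclass
class Commented:
    node: object
    comment: str

def lower(node):
    """Translate a CST node into a bare AST of ints and tuples."""
    if isinstance(node, Num):
        return int(node.text)
    if isinstance(node, BinOp):
        return (node.op, lower(node.left), lower(node.right))
    if isinstance(node, Paren):
        return lower(node.inner)    # grouping is implicit in the tree
    if isinstance(node, Commented):
        return lower(node.node)     # comments do not affect behavior
    raise TypeError(f"cannot lower {type(node).__name__}")

cst = Commented(Paren(BinOp(Num("1"), "+", Num("2"))), "# sum")
ast_tree = lower(cst)
```

The lowered tree is what a "smart" feature would consume; the CST itself stays untouched for editing.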

Ideally, the parser should be highly flexible in order to work with a variety of languages, and it should be easily user-extensible.

Special CST Nodes/Properties

The CST has nodes and properties of nodes that wouldn't reasonably exist in an AST, for the reasons above. These include, but are not limited to:

  • Comments: All comments are editable text nodes.
  • Parens: Even when parens have no functional purpose, they should be preserved to some extent for clarity.
  • Whitespace: Line breaks should be recorded. Every expression/subexpression is laid out either vertically or horizontally.
  • Placeholder: A node for anything that should be there, but isn't.
  • Implied operator: When two values are next to each other, a placeholder "implied operator" sits between them.
  • Unmatched: An unmatched paren should be paired with the nearest valid implied paren; parsing should fail only if no pairing is possible at all. The CST must record that the paren is unmatched.
  • Unknown: Any region of code that can't be reasonably solved is "Unknown" and can be edited as text.
  • Extension nodes: Common code patterns can be captured in special structure-only nodes. These would likely be created by plugins, and could appear in many ways in the structure editing UI, marked either by special comments or by a specific structure match.
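
A few of these nodes can be sketched with a tolerant parser for a toy arithmetic language. All class names and the flat, precedence-free item lists are assumptions for illustration; comments, whitespace, and extension nodes are left out to keep it short.

```python
import re
from dataclasses import dataclass

@dataclass
class Num:
    text: str

@dataclass
class Op:
    text: str
    implied: bool = False   # True when no operator was written

@dataclass
class Placeholder:
    pass                    # something should be here, but isn't

@dataclass
class Unknown:
    text: str               # unrecognized text, editable as text

@dataclass
class Group:
    items: list
    closed: bool = True     # False records an unmatched "("

OPERATORS = {"+", "-", "*", "/"}

def parse(src):
    tokens = re.findall(r"\d+|[-+*/()]|[^\s\d+\-*/()]+", src)
    pos = 0

    def sequence(in_group):
        nonlocal pos
        items, want_operand = [], True
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == ")" and in_group:
                break                         # caller consumes the ")"
            pos += 1
            if tok in OPERATORS:
                if want_operand:
                    items.append(Placeholder())   # missing left operand
                items.append(Op(tok))
                want_operand = True
                continue
            if tok == "(":
                inner = sequence(True)
                closed = pos < len(tokens)    # stopped at ")" rather than EOF
                if closed:
                    pos += 1
                node = Group(inner, closed)
            elif re.fullmatch(r"\d+", tok):
                node = Num(tok)
            else:
                node = Unknown(tok)
            if not want_operand:
                items.append(Op("", implied=True))  # two values side by side
            items.append(node)
            want_operand = False
        if want_operand:
            items.append(Placeholder())       # missing trailing operand
        return items

    return sequence(False)
```

For example, `parse("1 2")` puts an implied operator between the two numbers, and `parse("(1 +")` returns a single `Group` with `closed=False` whose last item is a `Placeholder`.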

Note on speed

Since the structure editor is not designed for live textual editing (as an LSP server is), parsing text into structure is allowed to be slow: it is only performed on file read.

Unparser

The "unparser" or serializer is the means of displaying the code being edited. This can range from a translation back into source text (e.g. Prettier), all the way to any structure editing UI imaginable. In most cases, the parser should be designed so that the "unparsed" code is just a prettified version of the source text. The parser and unparser should be paired with tests so that repeated round trips don't destroy any significant part of the source text.
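
Such a test could look like the sketch below. The `parse` and `unparse` here are deliberately tiny stand-ins (a token list instead of a real CST, with fixed spacing rules); only the two assertions in `check_round_trip` are the point.

```python
import re

def parse(src):
    # Stand-in "CST": significant tokens only; layout is regenerated.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)

def unparse(tokens):
    out = ""
    for tok in tokens:
        # One space between tokens, except before ")" or "," and after "(".
        if out and tok not in (")", ",") and not out.endswith("("):
            out += " "
        out += tok
    return out

def check_round_trip(src):
    once = unparse(parse(src))
    twice = unparse(parse(once))
    # Prettifying must not lose or change any significant token...
    assert parse(once) == parse(src), "unparse lost significant tokens"
    # ...and a second pass must leave the output unchanged.
    assert twice == once, "prettified output is not stable"
    return once

print(check_round_trip("f(  a,b )+  1"))  # f (a, b) + 1
```

Running the same check over a corpus of real files, valid and invalid, is what catches a parser/unparser pair that slowly erodes the source.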
