Conventional parsers must be modified to create an effective structure editor.
Most parsers create an AST (abstract syntax tree), but structure parsers create a CST (concrete syntax tree). This is because the parser must record details that affect how the code looks, not just what the code does.
The parser should provide error feedback, but should NEVER give up when creating structure. Instead, it should use sane fallbacks so as to always create an editable CST, within reason. Every CST should correspond almost 1:1 with its source code, whether that code is valid or invalid.
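As a rough illustration of "report errors, but never give up", here is a minimal Python sketch for a toy language made of atoms and parens. The `Node` class, its field names, and the error message wording are all hypothetical and invented for this example; a real parser would need far richer fallbacks.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "group", "atom", or "unknown" (hypothetical kinds)
    text: str = ""                 # source text for atoms and unknown regions
    children: list = field(default_factory=list)
    unmatched: bool = False        # True when the closing paren was implied

def parse(src):
    """Parse atoms and parens into a tree; collects errors instead of raising."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    root = Node("group")
    stack = [root]
    errors = []
    for i, tok in enumerate(tokens):
        if tok == "(":
            group = Node("group")
            stack[-1].children.append(group)
            stack.append(group)
        elif tok == ")":
            if len(stack) > 1:
                stack.pop()
            else:
                # Stray ")": report it, keep it as editable text, carry on.
                errors.append(f"token {i}: unmatched ')'")
                stack[-1].children.append(Node("unknown", text=tok))
        else:
            stack[-1].children.append(Node("atom", text=tok))
    # Any group still open at end of input gets an implied closing paren.
    while len(stack) > 1:
        stack.pop().unmatched = True
        errors.append("end of input: implied ')'")
    return root, errors
```

Calling `parse("(a (b c")` still returns a complete tree. Both groups are flagged as `unmatched`, and the two problems are reported in the returned error list rather than aborting the parse.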
The parser will never evaluate the code in question. It can possibly be integrated with "smart" features in an IDE-like way, but this requires translation from CST into AST or text.
Ideally, the parser should be highly flexible in order to work with a variety of languages, and it should be easily user-extensible.
The CST has nodes and properties of nodes that wouldn't reasonably exist in an AST, for the reasons above. These include, but are not limited to:
- Comments: All comments are editable text nodes.
- Parens: Even when parens have no functional purpose, they should be preserved to some extent for clarity.
- Whitespace: Line breaks should be recorded. Every expression or subexpression is laid out either vertically or horizontally.
- Placeholder: A node for anything that should be there, but isn't.
- Implied operator: When two values are next to each other, a placeholder "implied operator" sits between them.
- Unmatched: An unmatched paren should be paired with the nearest valid implied paren, and parsing should fail only when no pairing is possible at all. The CST must record the fact that the paren is unmatched.
- Unknown: Any region of code that can't be reasonably solved is "Unknown" and can be edited as text.
- Extension nodes: Common code patterns can be defined as special structure-only nodes. These would likely be created using plugins, and could appear in many ways in the structure editing UI, marked either by special comments or by a specific structure match.
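A few of the node kinds listed above could be modeled roughly as follows. This is a sketch only: the class names, the `OPERATORS` set, the `#` comment marker, and the `fill_gaps` helper are assumptions made for illustration, and the function covers just the Comment, Placeholder, and Implied operator cases for a flat token list.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Value:
    text: str

@dataclass
class Operator:
    text: str

@dataclass
class Comment:
    text: str            # kept verbatim, editable as plain text

@dataclass
class Placeholder:
    expected: str        # what should be here but isn't, e.g. "value"

@dataclass
class ImpliedOperator:
    pass                 # sits between two adjacent values

CstNode = Union[Value, Operator, Comment, Placeholder, ImpliedOperator]

OPERATORS = {"+", "-", "*", "/"}   # assumed operator set for this sketch

def fill_gaps(tokens: List[str]) -> List[CstNode]:
    """Turn a flat token list into nodes, filling holes instead of failing."""
    out: List[CstNode] = []
    expect_value = True
    for tok in tokens:
        if tok.startswith("#"):          # assumed comment syntax
            out.append(Comment(tok))
        elif tok in OPERATORS:
            if expect_value:             # operator with no left operand
                out.append(Placeholder("value"))
            out.append(Operator(tok))
            expect_value = True
        else:
            if not expect_value:         # two values side by side
                out.append(ImpliedOperator())
            out.append(Value(tok))
            expect_value = False
    if expect_value:                     # trailing operator or empty input
        out.append(Placeholder("value"))
    return out
```

For the tokens `a b +`, this yields a value, an implied operator, a value, the `+` operator, and a trailing placeholder, so the incomplete expression is still fully editable as structure.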
Since the structure editor is not designed for live textual editing (as an LSP server is), parsing text into structure is allowed to be slow: it will only be performed on file read.
The "unparser" or serializer is the means of displaying the code being edited. This can range from a translation back into source text (e.g. Prettier), all the way to any structure editing UI imaginable. In most cases, the parser should be designed so that the "unparsed" code is just a prettified version of the source text. The parser and unparser should be paired with tests so that repeated round trips don't destroy any significant part of the source text.
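One way to phrase such a test is as a round-trip check: unparsing may normalize the text once, but parsing the result again must give back the same structure. The sketch below uses a deliberately tiny `parse`/`unparse` pair, both hypothetical stand-ins written for this example. It assumes `#` starts a comment and ignores string literals entirely.

```python
import re

def parse(src):
    """Toy parser: one (tokens, comment) pair per source line."""
    lines = []
    for raw in src.splitlines():
        code, sep, rest = raw.partition("#")
        tokens = re.findall(r"[()]|[^\s()]+", code)
        comment = (sep + rest.rstrip()) if sep else None
        lines.append((tokens, comment))
    return lines

def unparse(cst):
    """Toy unparser: single spaces between tokens, comments preserved."""
    out = []
    for tokens, comment in cst:
        parts = []
        if tokens:
            code = " ".join(tokens).replace("( ", "(").replace(" )", ")")
            parts.append(code)
        if comment is not None:
            parts.append(comment)
        out.append("  ".join(parts))
    return "\n".join(out)

def check_round_trip(src):
    """True if prettifying keeps the structure and is stable when repeated."""
    once = unparse(parse(src))
    same_structure = parse(once) == parse(src)
    stable = unparse(parse(once)) == once
    return same_structure and stable
```

With the input `"( a   b )   # keep me\n\n(c"`, the first pass tidies the spacing and every later pass leaves the text unchanged. The comment, the blank line, and the unmatched paren all survive.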