I'm thinking I might just write a typical EBNF generator, but then I'd be going from reinventing the wheel to reinventing the wheel, tire and hubcap.
The EBNF generators I've used are very hands-off, because providing for the most common paths might somehow exclude some use cases.
So I think I want something that is EBNF+: focused on describing C/JSON/Lua/Python-style DSLs in a top-down way that lets guidance accumulate through context, and that lets the user validate in situ rather than forcing them to write code to build the parse tree, code to walk and validate it, and then code to do anything with it.
Start from EBNF and extend sideways, if you will.
For users who are willing to keep their language simple, contextual and heavily top-down, provide additional tooling.
Stop for a moment and consider the simple variable declaration construct. Typically written:
Decl := DeclKeyword DeclName DeclOptions;
DeclKeyword := "var";
DeclName := identifier;
...
It's likely the sole reason the user introduced the "var" keyword is to help the parser. In return, the user frequently has to "avoid" the keyword:
VarDecl : "var" identifier { $$ = $1; };
I think we can return the favor by making it possible to elide literals. If you want to capture the literal, give it a name. After all, doing so will make your life easier.
"Yeah, but that means writing another rule".
Allow that to be inlined, then.
VarDecl : "var" VariableName( identifier ); /* equivalent to { $$ = $0; } */
CaptureDecl : Keyword("var") VariableName( identifier ) { $$ = $1; }; // because you captured the literal
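To make the capture numbering concrete, here is a rough Python sketch of how eliding might behave. None of this is an existing tool: `Lit`, `Ident` and `match` are names I made up for illustration, and the tokens are assumed to be pre-split strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lit:
    """A literal such as "var"; captured only when it is given a name."""
    text: str
    name: Optional[str] = None


@dataclass
class Ident:
    """An identifier match; always captured."""
    name: Optional[str] = None


def match(rule: list, tokens: List[str]) -> Optional[List[str]]:
    """Return the positional captures ($0, $1, ...) or None on a mismatch."""
    if len(rule) != len(tokens):
        return None
    captures = []
    for item, tok in zip(rule, tokens):
        if isinstance(item, Lit):
            if tok != item.text:
                return None
            if item.name is not None:  # unnamed literals are elided
                captures.append(tok)
        elif tok.isidentifier():
            captures.append(tok)
        else:
            return None
    return captures


# VarDecl : "var" VariableName( identifier );
var_decl = [Lit("var"), Ident("VariableName")]
# CaptureDecl : Keyword("var") VariableName( identifier );
capture_decl = [Lit("var", name="Keyword"), Ident("VariableName")]

print(match(var_decl, ["var", "x"]))      # ['x']        -> identifier is $0
print(match(capture_decl, ["var", "x"]))  # ['var', 'x'] -> identifier is $1
```

The only point here is that naming the literal is what shifts the identifier from $0 to $1; the user never has to count past tokens they didn't ask for.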
Better still, let's hoist the keyword entirely:
VarDecl("var") : // hi parser, this rule only takes place if we match the literal "var"
This way you don't have secret-sauce messing with an otherwise simple rule:
VarDecl: "var" identifier { token 1 becomes $0 }
vs
VarDecl("var") : identifier ; // just makes sense
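One way to picture the hoisted form, sketched in Python under my own assumptions: the literal becomes a guard the parser dispatches on, and the rule body never sees it. `RULES`, `parse_statement` and `var_decl_body` are hypothetical names, not part of any existing generator.

```python
from typing import Callable, Dict, List, Tuple


def var_decl_body(tokens: List[str]) -> str:
    """Body of VarDecl("var") : identifier ;  the keyword is already consumed."""
    if len(tokens) != 1 or not tokens[0].isidentifier():
        raise SyntaxError("VarDecl: expected a single identifier")
    return tokens[0]


# Hypothetical dispatch table: hoisted literal -> (rule name, body parser).
RULES: Dict[str, Tuple[str, Callable[[List[str]], str]]] = {
    "var": ("VarDecl", var_decl_body),
}


def parse_statement(tokens: List[str]) -> Tuple[str, str]:
    """Pick the rule guarded by the leading token and hand it the rest."""
    if not tokens or tokens[0] not in RULES:
        raise SyntaxError("no rule is guarded by this leading token")
    name, body = RULES[tokens[0]]
    return name, body(tokens[1:])


print(parse_statement(["var", "x"]))  # ('VarDecl', 'x')
```

Because the guard lives outside the body, the identifier is simply the first thing the rule sees, with no renumbering trick required.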
I routinely find that my AST wants intro and outro function calls to let me build and remove context. To give the programmer this sense of scope in their language definition, I think using a scoping syntax makes the most sense:
(note: this is in addition to supporting BNF)
VarDecl("var") { // invokes 'VarDecl_Open' on the object at the top of the context stack, and pushes its return value onto that stack.
} // will invoke your ast's VarDecl_Close function and pop the top.
VarDecl("var") {
// "dot" prefix denotes a match that calls an ast function of the same name
.VariableName( identifier ) // matches an identifier and then calls context.VariableName with it
.VariableType( identifier )
VariableDefault << $context.SetVariableDefault($0) >> // do it yourself version
}
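Here is a hand-written Python stand-in for what a scoped rule like this might expand to. It is a sketch only: `Module`, `VarDeclNode` and `parse_var_decl` are invented for illustration, the VariableDefault step is left out, and I'm assuming `VarDecl_Close` is called on the parent with the popped node, which the notes above leave open.

```python
class VarDeclNode:
    """Context object returned by VarDecl_Open (hypothetical AST node)."""

    def __init__(self):
        self.name = None
        self.type = None

    def VariableName(self, ident):
        self.name = ident

    def VariableType(self, ident):
        self.type = ident


class Module:
    """Root context; owns the declarations (hypothetical)."""

    def __init__(self):
        self.decls = []

    def VarDecl_Open(self):
        return VarDeclNode()

    def VarDecl_Close(self, node):
        # Assumption: close is invoked on the parent with the popped child.
        self.decls.append(node)


def parse_var_decl(tokens, stack):
    """Match `var <name> <type>` and drive the context stack."""
    if len(tokens) != 3 or tokens[0] != "var":
        return False
    if not (tokens[1].isidentifier() and tokens[2].isidentifier()):
        return False
    stack.append(stack[-1].VarDecl_Open())  # '{' : open on the top, push result
    stack[-1].VariableName(tokens[1])       # .VariableName( identifier )
    stack[-1].VariableType(tokens[2])       # .VariableType( identifier )
    node = stack.pop()                      # '}' : pop the top ...
    stack[-1].VarDecl_Close(node)           # ... and let the parent close it
    return True


root = Module()
stack = [root]
parse_var_decl(["var", "x", "int"], stack)
print(root.decls[0].name, root.decls[0].type)  # x int
```

The user only writes the `_Open`/`_Close` pair and the dot-named methods; the push and pop bookkeeping is what the scoping syntax would generate.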
Production : Thing1 [ Thing2 ] ...
Here you're being explicit. Yay. But what if Thing2 is always optional, and you'd rather say so in its own production rule? Take inheritance in the ": " form:
Inheritance(":")
- Rubyesque: Question-mark suffix, so that it remains explicitly stated:
Struct("struct") : Name Inheritance? Body;
Inheritance?(":")