Skip to content

Instantly share code, notes, and snippets.

@kfsone
Last active March 16, 2021 18:50
Show Gist options
  • Save kfsone/9a3c9737fc571bb603252816c330f0b3 to your computer and use it in GitHub Desktop.
Save kfsone/9a3c9737fc571bb603252816c330f0b3 to your computer and use it in GitHub Desktop.
structured rule example

Structured Rules

These are mostly syntactic sugar for deeper code generation and integration with the user's AST.

They are also tied to a top-down concept that helps guide parsing by supporting context.

E.g: How do you validate the type of a function parameter before the end of the body?

StructuredRules grammar is a superset of BNF.

StructuredRules cater to some of the most common paths and aim to make them easier to execute.

Note:

For the purposes of this document:

  • Specification language: BNF-based;
  • Actions: Yacc/Bison style ({ ... });
  • Language: Golang
  • Variables: Yacc/Bison style: $$ is the current context/return value, $0 is the first captured attribute, ...

Named matches

This is a core concept that is re-used in several contexts that ultimately make it intuitive.

ProductionName( [ non-terminal or production ] )

This has several purposes that will be expanded above, but is one of the fundamentals of StructuredRules.

Named rules elide non-terminals

We use them in production rules all the time, we almost never want to see them, so we end up counting them.

A structured rule expression discards unnamed matches automatically. So that BNF rules continue to behave as they would without StructuredRules, this requires the production uses the parentheses syntax, with or without an anchored match:

IfStmt() : "if" IfExpr Statement "else" Statement ";" ;

Keyword/Anchored rules

Common Idiom:

StructName
    : "struct" identifier { $$ = $1 };

Adaptation:

StructName("struct") : identifier ;   // no action required

Optional Productions

Common Idiom:

Inheritance
    : ":" identifier { $$ = $1 }
    | empty
    ;
Structure
    : StructName [ Inheritance ] StructBody
    ;

Adaptation

? suffix.

You can make any rule optional on a case-by-case basis by suffixing it with a question-mark:

Structure : StructName Inheritance? StructBody  { $$ = $1  /* nil if inheritance wasn't used */ }

Or you can make a rule always-optional by suffixing the declaration with a question-mark:

Inheritance?(":") : identifier;
Structure : StructName Inheritance  // invalid: must use 'Inheritance?'
Structure : StructName Inheritance? StructBody;

Note The question-mark should be syntactic and not part of the name so that error messages do not introduce the '?' for end-users, but when processing the grammar "Inheritance" should be rejected with a note that "Inheritance is optional, use Inheritance? instead".

Open/Close functions

Common Idiom

StructDecl
    : "struct" { $$ = new_struct(); }
        StructName ...
               { end_struct($$); }
    ;

gocc note: gocc doesn't appear to provide for this

Adaptation

Since we're introducing programmer-familiar concepts, lets use the brace-enclosure concept.

StructDecl("struct") {
    ...
}

This effectively behaves as though there were two rules injecting actions at the start and end of the rule.

StructDecl
    : "struct" { $$ = push_context(StructDecl_Open($$)) }  /* pass current context to function, return becomes context for this scope */
      ...
      { descoped = pop_context(); return descoped }
    ;

Approximate BNF

StructuredRuleDecl
    :   StructuredRuleDecl StructuredRuleModifiers "{" StructuredRuleBody "}" [ ";" ];

StructuredRule
    :   StructuredRuleIdentity "(" StructuredRuleAnchor ")" /* T0 = rule's name, T2 = rule's keyword */
    |   StructuredRuleIdentity "(" ")"
    ;
    
StructuredRuleName
    :   ProductionName "?"  { $$ = &StructuredRule{ Name: $0, Optional: true } }
    |   ProductionName      { $$ = &StructuredRule{ Name: $0, Optional: false } }
    ;
    
StructuredRuleAnchor
    :   Keyword
    |   Between
    |   empty
    ;

Keyword : StringLiteral ;

Between
    :   StringLiteral "..." StringLiteral   // 0+ case
    |   StringLiteral "(" Repetitions ")" StringLiteral
    ;

Repetitions
    :   integer "," integer
    |   integer "," "N"
    |   integer
    ;

StructuredRuleBody
    :   Expressions
    ;

Expressions
    :   Expressions Expression
    |   Expression
    ;

Expression:
    // discard non-terminals
    :   NonTerminal
    // Call a corresponding function on the context with the value of rule's match
    |   "." StructuredRule
    // Capture everything else
    |   ProductionName
    |   StructuredRule
    ;

Sample Grammar

// Regular EBNF here
Grammar : Definitions;
Definitions : Struct | Function;

Struct("struct") {
    .StructName(identifier) [ ":" .ParentName(identifier) ] StructBody
/*
    Because this is a brace-enclosed rule, we call Struct_Open and Struct_Close before and after the rule.

    . StructName( identifier )
    // Expands to: StructName: identifier { $context.StructName($0) } }

    [ ":" . ParentName( identifier )]
    // Expands to: ParentName? : ":" identifier { $context.ParentName($1) } }

    // Ordinary rule
    StructBody  { $context.StructBody($$) }
*/
}

// Declares a rule called "Function" anchored to the keyword 'fn'
Function("fn") {
    . FunctionName( identifier )
    . ParameterList
    . FunctionBody
}

StructBody("{" (1, N) "}") {
    /* But it also produces code equivalent to:
    newContext := context.StructBody_Open()
    err := ...match struct body...(newContext)
    context.StructBody_Close(newContext, err)
    */

    // The '$n: <name>' syntax adds a type-switch for you rather than you using
    // a naked cast or having to implement your own switch statements.
    // Also, note that in theory StructMember_Close could do this for you,
    . StructMember { context.AddMember($$: ast.StructMember) }
}

StructMember() {
    . MemberName( identifier )
    . TypeName
    . ArrayDim ?  // allow space between rule and ?
    fieldTerminator  // captured and named
}

fieldTerminator : ";" | "\n";

ParameterList("(" (0, N) ")") {
    ParameterName( identifier ) TypeName { context.AddParameter($0, $1); }
}

TypeName
    : "int"
    | "bool"
    | "string"
    ;

FunctionBody("{" (0, N) "}") {

}

Questions

  • Allow space between ProductionName and ??
    • In a longer list, "...?" is less visible than "[ ... ]", "... ?" might be easier to sight;
  • Missing an explanation of why you still need actions, and what happens to matches that you passed to functions:
// StructBody() here is silently "lost" because there's no action and you didn't pass it to a method
Struct() { .StructDecl() .Inheritance?() StructBody() }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment