Mini-challenge: parsing indented input into an AST

Yesterday I wanted to parse something like this:

alpha
  beta
  gamma
delta
  epsilon
    zeta

And end up with an AST nested like this: (alpha(beta, gamma), delta(epsilon(zeta))). The actual type of the AST could be just a list of lists of strings, or actual Node objects; it's all isomorphic. That is, the AST contains exactly the same hierarchy as the text. Assume for simplicity that we want to hardcode indentation as being two spaces.

I came up with two possible ways to solve this.

Recursively parse indentation. That is, enter a subrule whenever parsing things that are more indented. Maybe there will be a little backtracking of whitespace whenever subrules fail. This feels like the "right" approach.
Build AST after parsing all the lines. That is, register all the indentations, and then based on those create the whole AST in one go. Will make the parsing way easier, but feels like throwing away information only to rebuild it later.

I went with the first approach, but didn't get far because I wasn't fit to program yesterday, apparently. Going to try again today, and thought I might as well make it into a mini-challenge to let others have a go, too.

It's probably a good idea to write tests for this. 😁

use v6; sub parse ($src) { my $ast = []; my @stack = $ast; for $src.lines -> $line is copy { next if $line ~~ /^\s*$/; my $level = 1; while $line ~~ /^' '/ { $line .= subst(/^' '/, ''); ++$level; } while $level < @stack { @stack.pop; } die 'incorrect nesting' unless $level == @stack; given [] { @stack[*-1].push($line => $_); @stack.push($_); } } return $ast; } say parse($_).perl given ' alpha beta gamma delta epsilon zeta ';

grammar Ident { token TOP {^ [<line> \n]* <line> \n? $} token line { <ident> <word> } token word { \w+ } token ident { \s* } } class Ident::Actions { method TOP($/) { my $ast = ().list; for $<line>».ast -> $entry is copy { my $list = $ast; $list = $list[$list.end] while ($$entry<level>-- > 0 ); @$list.push( [ $entry<word>.Str ] ); } make $ast; } #Note: Assuming 2 spaces for an indent level method line($/) { make {word=>$<word>, level=>$<ident>.chars / 2 } } } my $req = 'alpha beta gamma delta epsilon zeta'; say Ident.parse($req, :actions(Ident::Actions.new)).ast.perl();

masak/description.md

gerdr commented Sep 13, 2012

Uh oh!

dginev commented Sep 13, 2012

Uh oh!