@dginev
Created July 13, 2018 03:06
Demo: Ambiguous math lexemes with Marpa

GRAMMAR INPUT:

SUMOP|FUNCTION:Σ:0 _:: UNKNOWN:x:1 _:: MULOP:*:2 _:: UNKNOWN:y:3 

Demo Parse

<ltx:XMApp>
  <ltx:XMTok meaning="set" omcd="cdlf"/>
  <ltx:XMApp category="factor">
    <ltx:XMTok xml:id="2">*</ltx:XMTok>
    <ltx:XMApp category="factor">
      <ltx:XMTok xml:id="0" role="FUNCTION">Σ</ltx:XMTok>
      <ltx:XMTok xml:id="1" role="term">x</ltx:XMTok>
    </ltx:XMApp>
    <ltx:XMTok xml:id="3" role="term">y</ltx:XMTok>
  </ltx:XMApp>
  <ltx:XMApp category="term">
    <ltx:XMTok xml:id="0" role="BIGOP">Σ</ltx:XMTok>
    <ltx:XMApp category="factor">
      <ltx:XMTok xml:id="2">*</ltx:XMTok>
      <ltx:XMTok xml:id="1" role="term">x</ltx:XMTok>
      <ltx:XMTok xml:id="3" role="term">y</ltx:XMTok>
    </ltx:XMApp>
  </ltx:XMApp>
</ltx:XMApp>

Produced by code at: https://github.com/dginev/LaTeXML-Plugin-MathSyntax/tree/ambiguous-lexemes

dginev commented Jul 13, 2018

Marpa's lexeme-alternative mechanism allows this to be done directly and elegantly, namely via:

    if ($category =~ /\|/) { # ambiguous lexeme!
      $category =~ s/\s+//g; # strip all whitespace, not just the first run
      # offer every candidate category at the same input position
      for my $symbol (split(/\|/, $category)) {
        $rec->lexeme_alternative($symbol, $value);
      }
      # close the set of alternatives and advance the recognizer
      $rec_events = $rec->lexeme_complete($pos, $length);
    } else { # simple lexeme
      $rec_events = $rec->lexeme_read($category, $pos, $length, $value);
    }

where a LaTeXML lexeme such as SUMOP|FUNCTION:Σ:0 is first mapped onto the variables as:

    $category = 'SUMOP|FUNCTION';
    $value = 'Σ:0';

and so on; $pos and $length are counters that are maintained across the per-lexeme parse loop.
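
The mapping from a lexeme string to these variables could be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the helper name tokenize_lexemes is hypothetical, and it assumes that the category is everything before the first colon, that lexemes are separated by whitespace, and that $pos and $length are character offsets into the raw input string.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: split a whitespace-separated lexeme string into
# records carrying the four variables used by the parse loop.
# ASSUMPTION: the category is the text before the first ':' and the value
# is the remainder; pos/length are character offsets into the input string.
sub tokenize_lexemes {
    my ($input) = @_;
    my @tokens;
    while ($input =~ /(\S+)/g) {
        my $lexeme = $1;
        my $length = length $lexeme;
        my $pos    = pos($input) - $length;    # start offset of this lexeme
        # limit of 2 keeps any further ':' inside the value
        my ($category, $value) = split /:/, $lexeme, 2;
        push @tokens, {
            category => $category,
            value    => $value,
            pos      => $pos,
            length   => $length,
        };
    }
    return @tokens;
}

my @tokens = tokenize_lexemes(
    'SUMOP|FUNCTION:Σ:0 _:: UNKNOWN:x:1 _:: MULOP:*:2 _:: UNKNOWN:y:3'
);
```

For the grammar input above, the first record carries the category SUMOP|FUNCTION and the value Σ:0, which is the pair that the ambiguous-lexeme branch then splits on the | character.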

The key idea is that you can read in an arbitrary number of "lexeme alternatives" at the same input position for the Marpa grammar, and hence provide an ambiguous input from the very outset. I believe this is a benefit of the Earley parsing algorithm that Marpa is built on.
