This Gist demonstrates the use of inheritance with PLY parsers. `BaseLexer` and `BaseParser` are defined in the module `base.py`. `ALexer` and `AParser` (in `a.py`) and `BLexer` and `BParser` (in `b.py`) inherit from `BaseLexer` and `BaseParser`, extending the list of tokens and the parser rules. As a toy example, these parsers evaluate a limited subset of arithmetic expressions: `base.py` contains rules for numbers and parentheses, `a.py` adds rules for addition, and `b.py` adds rules for subtraction.
There are several key ingredients needed to make this work well. First, as described in the PLY documentation, each lexer needs to be in a different Python module, i.e. in a different `.py` file, to avoid confusing PLY's error checking. Next, the lexers and parsers are all defined from class instances. As described in the documentation, this is accomplished by calling `ply.lex.lex()` and `ply.yacc.yacc()` with the `module` keyword argument set to the class instance. The `BaseLexer` class additionally implements `__iter__()`, `token()`, and `input()` methods, so that its instances can be used directly in the parser. Furthermore, a different `tabmodule` argument is passed to `ply.yacc.yacc()` in each file, so that the different parsers don't clobber each other's cached tables.
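The lexer half of this setup can be sketched as follows. This is a minimal illustration assuming PLY 3.x is installed and the code is run from a `.py` file. It is not the Gist's actual `base.py`: the token names and regular expressions are assumptions chosen to match the "numbers and parentheses" description.

```python
import ply.lex as lex


class BaseLexer:
    # Token names PLY should recognize (assumed names, for illustration).
    tokens = ['NUMBER', 'LPAREN', 'RPAREN']

    # Simple tokens are given as regular-expression strings.
    t_LPAREN = r'\('
    t_RPAREN = r'\)'
    t_ignore = ' \t'

    def __init__(self):
        # Passing the instance as `module` makes PLY collect the t_* rules
        # from this object rather than from the calling module's globals.
        self.lexer = lex.lex(module=self)

    def t_NUMBER(self, t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(self, t):
        raise SyntaxError(
            f"illegal character {t.value[0]!r} at position {t.lexpos}")

    # These three methods delegate to the underlying PLY lexer, so the
    # wrapper instance offers the same interface as the lexer it wraps.
    def input(self, data):
        self.lexer.input(data)

    def token(self):
        return self.lexer.token()

    def __iter__(self):
        return iter(self.lexer)


lexer = BaseLexer()
lexer.input('(12)')
kinds = [tok.type for tok in lexer]
print(kinds)
```

Because the wrapper exposes `input()` and `token()`, an instance like this can be handed to a PLY parser's `parse()` call through its `lexer` argument.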
With all this scaffolding in place, we create subclasses of `base.BaseLexer` and `base.BaseParser`. In `ALexer` and `BLexer`, we set the `tokens` class-level variable by adding `base.BaseLexer.tokens` and a list of new tokens. When PLY sets up the lexers and parsers for the subclasses, it does so using `dir()`, so it sees all tokens and parser rules from both the subclass and the parent class. From here on out, everything should work as expected.
In this example, `ply.yacc.yacc()` is also called with `start` set, to specify which grammar symbol is the top-level (start) symbol of the grammar. Additionally, `t_newline()`, `t_error()`, and `p_error()` are set up to track line numbers and provide verbose error messages, including locations in the file.
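One way such location reporting can work is to turn a token's character offset into a column number. The helper names below (`find_column`, `format_error`) and the message format are assumptions for illustration, not the Gist's code. A `SimpleNamespace` stands in for a PLY token, which carries `lineno` and `lexpos` attributes.

```python
from types import SimpleNamespace


def find_column(text, lexpos):
    """Return the 1-based column of character offset `lexpos` in `text`."""
    # Index just past the last newline before the token (0 on line one).
    line_start = text.rfind('\n', 0, lexpos) + 1
    return lexpos - line_start + 1


def format_error(text, tok):
    """Build a message of the kind a p_error() handler might report."""
    column = find_column(text, tok.lexpos)
    return (f"Syntax error at line {tok.lineno}, column {column}: "
            f"unexpected {tok.value!r}")


source = "1 + 2\n(3 + )"

# Stand-in for the offending token; a real one would come from the lexer,
# with lineno kept up to date by a t_newline() rule.
tok = SimpleNamespace(type='RPAREN', value=')', lineno=2,
                      lexpos=source.index(')'))

message = format_error(source, tok)
print(message)  # Syntax error at line 2, column 6: unexpected ')'
```

The column calculation needs the full input text, so a handler using this approach has to keep a reference to the string that was passed to the lexer.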