Blog 2019/9/7
<- previous | index | next ->
Let's learn how to implement a Lisp or Scheme-like language by writing a series of increasingly capable interpreters.
- previous: Step 2: Recursive descent parsing
The core of any Lisp-like interpreter is the eval function.
As a concept, eval is over 60 years old,
first seen in McCarthy's
original Lisp implementation.
Paul Graham
wrote a great article[ps][pdf][markdown]
about the significance of eval.
eval takes an AST and an environment and returns a value.
However, Python already has an eval function, so we will use the name eval2. Start with a stub:
eval.py
def eval2(ast, env):
return NoneLisp-like languages follow the convention that a list represents a function call, where the first item in the list is a function (the operator) and the rest of the list items are the arguments to that function. E.g. (sum 1 2 3) evaluates to 6.
More particularly, sum is a symbol. Symbols evaluate to their corresponding value from the environment. The environment is like a lookup table which maps symbols to values. In this case, the value of sum is a function pointer.
1, 2 and 3 are literals, which evaluate to themselves.
In a traditional Lisp-like language implementation, eval determines the value of the operand and arguments, then hands them to a function named apply, which executes the function.
(again, Python already has an apply, so we will use the name apply2.)
But we are getting ahead of ourselves. First, let's support evaluating literals:
def is_int(ast):
return ast[0] == "INT"
def eval2(ast, env):
if is_int(ast):
return int(ast[1])
else:
raise Exception("Don't know how to evaluate %s" % ast)Try it out:
$ python
Python 2.7.16 (default, Mar 4 2019, 09:02:22)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from eval import *
>>> env = {}
>>> ast = ("INT", "42")
>>> eval2(ast, env)
42
Cool, let's skip to the punchline and add a __main__ implementation so that we can run this bad boy:
if __name__ == "__main__":
import sys
from lex import load_tokendefs, tokenize
from parse import parse
if len(sys.argv) > 1:
text = open(sys.argv[1]).read()
else:
text = sys.stdin.read()
tdefs = load_tokendefs("tokendefs.txt")
env = {}
print eval2(parse(tokenize(tdefs, text)), env)$ echo -n 1 | python eval.py
1
Woot!
Next, let's create a function sum2 (Python already defines a sum function):
import operator
def sum2(*args):
return reduce(operator.add, args)Try it out:
$ python
Python 2.7.16 (default, Mar 4 2019, 09:02:22)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from eval import *
>>> sum2(1, 2, 3)
6
Next, crack open tokendefs.txt and add a SYMBOL token type. We'll start off with something very basic: a symbol is a group of lowercase letters:
SYMBOL
[a-z]+
Let's check that it works with our lexer:
$ echo 'sum' | python lex.py
[('SYMBOL', 'sum'), ('WSPACE', '\n')]
Next, we need to update our parser to handle symbols. We'll make a slight tweak to our grammar:
SEXPR = ATOM | LIST
ATOM = INT | SYMBOL
LIST = OPAREN CPAREN
| OPAREN SEXPR { WSPACE SEXPR } CPAREN
We'll need to add another terminal parser for SYMBOL:
def parse_symbol(tokens):
return parse_terminal(tokens, "SYMBOL")We need to create a parse_atom function for the rule ATOM = INT | SYMBOL. This uses alternation, so its implementation will look very similar to parse_sexpr:
def parse_atom(tokens):
(ast, remaining) = parse_int(tokens)
if ast is not None:
return (ast, remaining)
else:
return parse_symbol(tokens)And speaking of parse_sexpr, we need to update that as well:
def parse_sexpr(tokens):
(ast, remaining) = parse_atom(tokens)
if ast is not None:
return (ast, remaining)
else:
return parse_list(tokens)Now let's try it out:
$ echo -n '(sum 1 2 3)' | python parse.py
('LIST', [('SYMBOL', 'sum'), ('INT', '1'), ('INT', '2'), ('INT', '3')])
Fantastico!
Now, we'll need to update eval2 to handle symbol lookup:
def is_symbol(ast):
return ast[0] == "SYMBOL"
def lookup(ast, env):
symbol_name = ast[1]
return env[symbol_name]
def eval2(ast, env):
if is_int(ast):
return int(ast[1])
elif is_symbol(ast):
return lookup(ast, env)
else:
raise Exception("Don't know how to evaluate %s" % ast)In the eval.py __main__ implementation, replace env = {} with:
env = { "sum": sum2 }Now let's see if we can eval a symbol:
$ echo -n 'sum' | python eval.py
<function sum2 at 0x102b40398>
Rock and roll! We are really close now.
Now we just need to add an elif is_list() clause to eval2:
def is_list(ast):
return ast[0] == "LIST"
def eval2(ast, env):
if is_int(ast):
return int(ast[1])
elif is_symbol(ast):
return lookup(ast, env)
elif is_list(ast):
children = ast[1]
first, rest = children[0], children[1:]
operand = eval2(first, env)
args = [eval2(arg, env) for arg in rest]
return apply2(operand, args)
else:
raise Exception("Don't know how to evaluate %s" % ast)
def apply2(op, args):
return op(*args)Now let's eval (sum 1 2 3):
$ echo -n '(sum 1 2 3)' | python eval.py
6
KABLAMMO MOTHERFUCKER!!! You just wrote an interpreter!
Next time, we'll look at defining new entries in the environment.
