Skip to content

Instantly share code, notes, and snippets.

@darius
Last active January 17, 2021 23:00
Show Gist options
  • Save darius/bf3742c370873c50fc1ddac941f1507f to your computer and use it in GitHub Desktop.
Save darius/bf3742c370873c50fc1ddac941f1507f to your computer and use it in GitHub Desktop.
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""Caronte: uma pequena lua.
(from http://canonical.org/~kragen/sw/dev3/caronte.py by Kragen Sitaker, hacked a bit by Darius)
This is part of a series of simple exercises I’ve been doing in
hacking programming-language prototypes together from scratch in one
day. After 4½ hours this successfully generates a fractal and I am
going to bed happy. It compiles a tiny infix language into a
Forth-like sequence of stack operations (but with a JS-like memory
model), which it then interprets.
In this case the language is sort of a minimal version of JS or Lua.
Caronte is the Portuguese name for Charon, the 1205-km-diameter moon
of Pluto, about a fiftieth of the mass of Earth’s Moon (a lua).
I started by letting the design get fairly out of control with
Caronte-grande:
Caronte-grande
--------------
In addition to some basic arithmetic, the Caronte syntax supports
dicts, indexing, conditionals, assignments, sequencing, looping,
strings, and function definitions. It’s an expression-oriented
language like ML; ``;`` is an infix sequencing operator on expressions
which returns the rightmost result, like ``,`` in C or JS.
The function-definition syntax is a restricted form of the new JS
arrow syntax: (a, b, c) => expr, where the literal parens are required
but of course the argument list can be of any length. Because ``;``
is an expression, you can parenthesize a function body with a sequence
of statements. “=>” associates to the right so you can write “k = (a)
=> (b) => a”; since its left operand isn’t an expression it wouldn’t
make sense for it to associate to the left.
For conditionals I’m using the ``a ? b : c`` syntax from C, because I
liike punctuation. Loops use an infix ``|`` operator which has two
variants, a foreach variant and a while variant. The foreach variant
is ``c <- cs | e``, which evaluates ``e`` repeatedly with local
bindings for ``c`` taken from the sequence ``cs``. The while variant
is ``p | q``, which alternates between evaluating ``p`` and, if it was
true, re-evaluating ``q``. ``|`` has higher precedence than ``;`` and
the boolean and comparison operators, and it associates to the right
so that it’s flat to write ``a <- as | b <- bs | c``, and ``x | y |
z`` is ``while (x) while (y) z``, as you’d expect, rather than ``while
(while (x) y) z``.
Loops evaluate to sequences, which are materialized in memory and
indexed with numbers.
The dict literal syntax is more like JS’s than Lua’s: { name1: expr1,
name2: expr2, (expr3): expr4 }. Dict indexing can be done with names
using "." or with expressions using "[]".
Variable declaration is done with an "=" assignment; duplicate
declarations in the same scope are a compile-time error. Mutation is
achieved with the ":=" mutation operator. Mutating undeclared
variables is also an error, at compile-time; mutating nonexistent dict
entries is an error at run-time. Adding new dict entries is done with
the "=" operator, like variable declaration, and is a run-time error
if the entries already exist.
There’s a prefix “!!!” operator from Wheat to return an error value,
which is the only kind of value the “;” operator doesn’t ignore on its
left argument. “!!” is an error-handling operator; “a !! b” is a,
unless a evaluated to an error, in which case it’s b. There’s an
unary prefix "!" operator which returns nil when applied to a
non-error value, or the error argument "x" when applied to an error
value "!!!x". Error values cannot be stored in dictionaries; an
attempt to do so yields the error value from the assignment and leaves
the dictionary unchanged.
Comparisons and boolean operations have “syntax error” associativity.
Instead of parsing ``a > b < c`` as either "a > (b < c)", "(a > b) <
c", or "a > b and b < c", it’s just a parse error. Similarly "a and b
or c" is a syntax error, although you can chain "and" (and "abj",
which is "and not") and "or" as you like. I’m tempted to do the same
thing with "*" and "/".
Bitwise operations are provided by functions rather than operators.
This syntax is brutally infix; the precedence order is something like
;
:= =
!!
=>
or and abj
== < > <= >= !=
|
?:
+ - (binary)
* / %
- (unary) ! !!! not
**
. [] (function call)
parenthesization
Here’s a whack at some pseudo-BNF syntax:
program: _ expr
_: (" " | "\n" | "\t")*
expr: e10 (";"_ e10)*
e10: lvalue (":=" | "=")_ e10 | e20
e20: e30 ("!!"_ e30)*
e30: formals "=>"_ e30 | e33
e33: e36 (("and" | "abj")_ e36)* | e36 ("or"_ e36)+
e36: e40 ("==" | "<" | ">" | "<=" | ">=" | "!=")_ e40 | e40
e40: lvalue "<-"_ e60 "|"_ e40 | e60 "|"_ e40 | e60
e60: e70 "?"_ e70 ":" e60 | e70
e70: e80 (("+" | "-")_ e80)*
e80: e90 (("*" | "/" | "%")_ e90)*
e90: (("-" | "!" | "!!!" )_)* e100
e100: e110 ("**"_ e110)*
e110: e120 ("."_ id | "["_ expr "]"_ | actuals)*
e120: "("_ expr ")"_ | id | const | dict
lvalue: id ("."_ id | "["_ expr "]"_)*
formals: "("_ ")"_ | "("_ id (","_ id)* ")"_
actuals: "("_ ")"_ | "("_ e20 (","_ e20)* ")"_
const: string | integer | float | "nil"_
dict: "{"_ "}"_ | "{"_ pair (","_ pair)* "}"_
pair: id ":"_ e20 | "("_ e20 ")"_ ":"_ e20
Maybe loops and conditionals should be incomparable in the same way as
and and or?
I feel a little bit bad to require two separate tokens ")" "=>" in
between arguments and body, and leaving the end of the function body
undelimited seems potentially error-prone. I wonder if some kind of
bra-ket notation would be an improvement: ``{a b : 10 * a + b}`` or
something.
Caronte-pequeno
---------------
Okay, well, the above has clearly gone a bit out of control for a
one-night hack and is kind of overwhelming. Can I scale it down to
something tasteful and enjoyable? Because it took me an hour and a
half to write the *textual* description of Caronte-grande and I’m
getting sleepy!
How about revisiting my Sierpinski thing from the other week?
"""
from __future__ import division, print_function
import re
sier = """
c = {};
x <- iota(128) | (c[x] = 0);
c[0] := 1;
y <- iota(128) | (
x <- iota(127) | (c[x+1] := (c[x] + c[x+1] == 1) ? 1 : 0);
x <- iota(128) | say((c[x] == 0) ? " " : "#");
say("\n")
)
"""
"""That ought to need only this subset of the grammar:
program: _ expr
_: (" " | "\n" | "\t")*
expr: e10 (";"_ e10)*
e10: lvalue (":=" | "=")_ e10 | e36
e36: e40 "=="_ e40 | e40
e40: lvalue "<-"_ e60 "|"_ e40 | e60
e60: e70 "?"_ e70 ":" e60 | e70
e70: e110 ("+"_ e110)*
e110: e120 ("["_ expr "]"_ | actuals)*
e120: "("_ expr ")"_ | id | const | dict
lvalue: id ("["_ expr "]"_)*
actuals: "("_ e36 (","_ e36)* ")"_
const: string | integer
dict: "{"_ "}"_
I’m going to try a parsing approach inspired by Darius Bacon’s
min_parson.py:
https://gist.github.com/darius/7d5899523cc8fd4a64870dea601b77fb
Parsers are functions invoked with the remaining tail of the input,
which return an empty list on failure or a list containing one tuple
[(results, remaining)] on success.
"""
"""
One version of the example program was:
c = {};
x <- iota(128) | (c[x] = 0);
c[0] := 1;
y <- iota(128) | (
x <- iota(127) | (c[x+1] := c[x] + c[x+1] == 1 ? 1 : 0);
x <- iota(128) | say(c[x] == 0 ? " " : "#");
say("\n")
)
At present this gets 'compiled' to:
[(('c', {}, init, ;,
'x', 'iota', @, int, '128', arg, call, each, 'c', @, 'x', @, at, int, '0', init, done, ;,
'c', @, int, '0', at, int, '1', put, ;,
'y', 'iota', @, int, '128', arg, call, each,
'x', 'iota', @, int, '127', arg, call, each,
'c', @, 'x', @, int, '1', +, at,
'c', @, 'x', @, at, @, 'c', @, 'x', @, int, '1', +, at, @, +,
int, '1', if, int, '1', else, int, '0', then, ==, put, done, ;,
'x', 'iota', @, int, '128', arg, call, each,
'say', @,
'c', @, 'x', @, at, @, int, '0', if, string, ' ', else, string, '#', then, ==,
arg, call, done, ;,
'say', @, string, '\n', arg, call,
done), '')]
Aside from the undesirable parentheses in the above, this is wrong;
it’s inadvertently parsing as ...c[x+1] == (1 ? 1 : 0). To fix this I
would need to do some sort of ("?"_ e70 ":" e70)* sort of thing, I
think, which is pretty close to how I think about it when I’m writing
it.
For the time being, though, I can hack this with more parens, like the
undesirable parens needed for loop bodies with assignments, and this
gives me the correct result:
[(('c', {}, init, ;,
'x', 'iota', @, int, '128', arg, call, each, 'c', @, 'x', @, at, int, '0', init, done, ;,
'c', @, int, '0', at, int, '1', put, ;,
'y', 'iota', @, int, '128', arg, call, each,
'x', 'iota', @, int, '127', arg, call, each,
'c', @, 'x', @, int, '1', +, at,
'c', @, 'x', @, at, @, 'c', @, 'x', @, int, '1', +, at, @, +,
int, '1', ==,
if, int, '1', else, int, '0', then,
put, done, ;,
'x', 'iota', @, int, '128', arg, call, each,
'say', @,
'c', @, 'x', @, at, @, int, '0', ==, if, string, ' ', else, string, '#', then,
arg, call, done, ;,
'say', @, string, '\n', arg, call,
done), '')]
"""
def match(regexp):
"Parse with an RE."
regexp = re.compile(regexp)
def f_re(s):
mo = regexp.match(s)
if not mo:
return []
return [(mo.groups(), s[mo.end():])]
return f_re
def eta(thunk):
"""Permit circular and forward references by η-expansion.
What we want is for eta(lambda: x)(y) to return x(y), but wait to
evaluate x until necessary.
"""
return lambda *args: thunk()(*args)
def lit(t):
"Match a literal string."
return lambda s: [((), s[len(t):])] if s.startswith(t) else []
assert lit("foo")("foobar") == [((), "bar")]
assert lit("foo")("oobar") == []
def seq(p, q):
"Match p, then q; language concatenation; Kleene algebra product."
def f(s):
for pr, ps in p(s):
for qr, qs in q(ps):
return [(pr + qr, qs)]
return []
return f
assert seq(lit("fo"), lit("ob"))("foobar") == [((), "ar")]
assert seq(match("(f)"), match("o*(ba)"))("foobar") == [(("f", "ba"), "r")]
assert seq(match("(f)"), match("o*(ba)"))("oobar") == []
assert seq(match("(f)"), match("o*(ba)"))("fooar") == []
def seqs(*ps):
"Match a sequence of items."
rv = lit("")
for p in ps:
rv = seq(rv, p)
return rv
assert seqs(match(r"(\w+)\s*"),
match(r"(\d+)"),
match(r"(x*)"))("the 20xxy x") == [(('the', '20', 'xx'), 'y x')]
def alt(p, q):
"PEG ordered choice; language union; roughly Kleene algebra sum."
def f(s):
for item in p(s):
return [item]
return q(s)
return f
fail = lambda s: []
def alts(*ps):
"N-way ordered choice."
rv = fail
for p in ps:
rv = alt(rv, p)
return rv
assert alts(match("(x)"), match("(y)"))("xes") == [(("x",), "es")]
assert alts(match("(x)"), match("(y)"))("yes") == [(("y",), "es")]
assert alts(match("(x)"), match("(y)"))("zes") == []
def rep(element):
"Kleene closure. Zero or more. Repetition."
def f(s):
results = ()
while True:
r = element(s)
if not r:
return [(results, s)]
results += r[0][0]
s = r[0][1]
return f
def do(*k):
"A parser that consumes no text but succeeds with tuple k."
return lambda s: [(k, s)]
def feed(p, f):
return lambda s: [((f(*k),), s) for k,s in p(s)]
insn = do
__ = rep(alts(lit(" "), lit("\n"), lit("\t")))
lsp = lambda s: seq(lit(s), __)
dict_ = seqs(lsp("{"), lsp("}"), insn('newdict'))
string = seqs(insn('con'), match(r'"((?:[^\\"]|\\.)*)"\s*'))
integer = seq(insn('con'), feed(match(r'(\d+)\s*'), int))
const = alt(string, integer)
actuals = seqs(lsp("("),
eta(lambda: e36), insn('arg'),
rep(seqs(lsp(","), eta(lambda: e36), insn('arg'))),
lsp(")"),
insn('call'))
id_ = seq(insn('id'), match(r'([A-Za-z_]\w*)\s*'))
lvalue = seq(id_, rep(seqs(insn('@'), lsp("["), eta(lambda: expr), lsp("]"), insn('at'))))
e120 = alts(seqs(lsp("("), eta(lambda: expr), lsp(")")),
seq(id_, insn('@')), const, dict_)
e110 = seq(e120, rep(alt(seqs(lsp("["), eta(lambda: expr), lsp("]"), insn('at'), insn('@')),
actuals)))
e70 = seq(e110, rep(seqs(lsp("+"), e110, insn('+'))))
e60 = alt(seqs(e70, lsp("?"), insn('if'),
e70, lsp(":"), insn('else'),
eta(lambda: e60), insn('then')),
e70)
e40 = alts(seqs(lvalue, lsp("<-"), e60,
lsp("|"), insn('each'), eta(lambda: e40), insn('done')),
e60)
e36 = alts(seqs(e40, lsp("=="), e40, insn('==')), e40)
e10 = alts(seqs(lvalue, lsp(":="), eta(lambda: e10), insn('put')),
seqs(lvalue, lsp("="), eta(lambda: e10), insn('init')),
e36)
expr = seqs(e10, rep(seqs(lsp(";"), insn(";"), e10)))
program = seq(__, eta(lambda: expr))
parse = program
def run(prog):
# Scan program to set up jump targets
jump_targets = {}
cstack = []
each, done, if_, else_, then = 'each done if else then'.split()
i = 0
while i < len(prog):
c = prog[i]
if c in [each, if_]:
cstack.append((c, i))
elif c == done:
a, b = cstack.pop()
assert a == each
jump_targets[i] = b + 1
jump_targets[b] = i
elif c == else_:
a, b = cstack.pop()
assert a == if_
jump_targets[b] = i + 1
cstack.append((c, i))
elif c == then:
a, b = cstack.pop()
assert a == else_
jump_targets[b] = i
elif c in ('id', 'con'):
i += 2
continue
i += 1
assert not cstack
pc = 0
ram = {'iota': lambda n: list(range(n)), 'say': lambda v: print(*v, end='')}
stack = []
args = []
loops = []
lenprog = len(prog)
while pc < lenprog:
c = prog[pc]
if c == 'newdict':
stack.append({})
elif c in ('init', 'put'):
val = stack.pop()
loc = stack.pop()
loc[0][loc[1]] = val
stack.append(val) # though it'll typically just get popped by the ';' instruction following
elif c == '@':
loc = stack.pop()
stack.append(loc[0][loc[1]])
elif c == ';':
stack.pop()
elif c == 'id':
pc += 1
stack.append((ram, prog[pc])) # locations are tuples
elif c == 'con':
pc += 1
stack.append(prog[pc])
elif c == 'arg':
args.append(stack.pop())
elif c == 'call':
f = stack.pop()
stack.append(f(*args))
args = []
elif c == 'each':
seq = stack.pop()
loc = stack.pop()
loops.append([loc, seq, 0])
pc = jump_targets[pc]
continue
elif c == 'done':
loc, seq, idx = loops[-1]
if idx >= len(seq):
loops.pop()
pc += 1
continue
loc[0][loc[1]] = seq[idx]
loops[-1][-1] += 1
pc = jump_targets[pc]
continue
elif c == 'at':
idx = stack.pop()
loc = stack.pop()
stack.append((loc, idx))
elif c == '+':
x = stack.pop()
stack.append(stack.pop() + x)
elif c == '==':
x = stack.pop()
stack.append('true' if stack.pop() == x else 'false')
elif c == 'if':
x = stack.pop()
assert x in ('true', 'false')
if x == 'false':
pc = jump_targets[pc]
continue
elif c == 'else':
pc = jump_targets[pc]
continue
elif c == 'then':
pass
else:
assert 0, c
pc += 1
if 1:
prog = parse(sier)
print(prog)
run(prog[0][0])
#. [(('id', 'c', 'newdict', 'init', ';', 'id', 'x', 'id', 'iota', '@', 'con', 128, 'arg', 'call', 'each', 'id', 'c', '@', 'id', 'x', '@', 'at', 'con', 0, 'init', 'done', ';', 'id', 'c', '@', 'con', 0, 'at', 'con', 1, 'put', ';', 'id', 'y', 'id', 'iota', '@', 'con', 128, 'arg', 'call', 'each', 'id', 'x', 'id', 'iota', '@', 'con', 127, 'arg', 'call', 'each', 'id', 'c', '@', 'id', 'x', '@', 'con', 1, '+', 'at', 'id', 'c', '@', 'id', 'x', '@', 'at', '@', 'id', 'c', '@', 'id', 'x', '@', 'con', 1, '+', 'at', '@', '+', 'con', 1, '==', 'if', 'con', 1, 'else', 'con', 0, 'then', 'put', 'done', ';', 'id', 'x', 'id', 'iota', '@', 'con', 128, 'arg', 'call', 'each', 'id', 'say', '@', 'id', 'c', '@', 'id', 'x', '@', 'at', '@', 'con', 0, '==', 'if', 'con', ' ', 'else', 'con', '#', 'then', 'arg', 'call', 'done', ';', 'id', 'say', '@', 'con', '\n', 'arg', 'call', 'done'), '')]
#. ################################################################################################################################
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### ####
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
#. # # # # # # # # # # # # # # # #
#. ######## ######## ######## ######## ######## ######## ######## ########
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
#. # # # # # # # # # # # # # # # #
#. #### #### #### #### #### #### #### ####
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. ################ ################ ################ ################
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
#. # # # # # # # # # # # # # # # #
#. #### #### #### #### #### #### #### ####
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. ######## ######## ######## ########
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. #### #### #### ####
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. ################################ ################################
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
#. # # # # # # # # # # # # # # # #
#. #### #### #### #### #### #### #### ####
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. ######## ######## ######## ########
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. #### #### #### ####
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. ################ ################
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. #### #### #### ####
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. ######## ########
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. #### ####
#. # # # #
#. ## ##
#. # #
#. ################################################################
#. # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
#. # # # # # # # # # # # # # # # #
#. #### #### #### #### #### #### #### ####
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. ######## ######## ######## ########
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. #### #### #### ####
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. ################ ################
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. #### #### #### ####
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. ######## ########
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. #### ####
#. # # # #
#. ## ##
#. # #
#. ################################
#. # # # # # # # # # # # # # # # #
#. ## ## ## ## ## ## ## ##
#. # # # # # # # #
#. #### #### #### ####
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. ######## ########
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. #### ####
#. # # # #
#. ## ##
#. # #
#. ################
#. # # # # # # # #
#. ## ## ## ##
#. # # # #
#. #### ####
#. # # # #
#. ## ##
#. # #
#. ########
#. # # # #
#. ## ##
#. # #
#. ####
#. # #
#. ##
#. #
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment