I feel somewhat pressed to give a negative review of this book. This comes from someone who has worked on various Lisp implementations, and written some amount of C, but that isn’t an excuse to be overly harsh. This book, however, does not leave me with many nice things to say, and leaves plenty of un-nice things. My apologies in advance.
First off: God help you if you are going to write your first interpreter in C of all things. No one I know thinks it’s a good idea to start inventing a programming language, which perhaps is one of the more ill-defined and bug-prone things you can do, with an implementation in C. So the premise is already a bad idea. Maybe you can work your way down if you so want, but it’s an awkward starting point.
Second off: God can’t help you if you read this book. The interpreter makes very dubious implementation decisions, and the language that is created as a side-effect makes reasoning about anything near impossible. Someone once said “you can’t be doing it wrong if nobody knows what you’re doing.” And when you have a book which just introduces a language lazily, there is no way to check if you’ve implemented anything correctly. I suppose that is not a problem if you have decided it can be “your” language, and the only notion of ownership is that it contains your blunders (as well as those of the author); but it makes coming up with a mental model for the language absolutely impossible, and so the language is useless because programs cannot be written in it.
The book misses the “point” of Lisp frequently. The first occurrence is in making parsing more difficult than it should be.
It never struck me as a good idea to start off with an evaluator for a toy calculator language, and then only slowly build up to an actual programming language. This is the beginning of the “nobody knows what you’re doing” line, really. The language is never presented in the book - you are just expected to know how it works from the C code. And you end up rewriting a lot of code as the author drip-feeds you marginally better approximations of the programming language.
And a parser generator is very overkill for a Lisp parser - just use recursive descent! If it weren’t for how C makes memory management a pain, it would be pretty easy to write a recursive descent parser, and you would have a program with no external libraries.
As for why a parser generator is a bad idea for Lisp: the single syntax rule
E ::= atom | '(' E* ')'
is pretty simple to translate into an algorithm for a reader. Such an algorithm could be written in English as “if the next character is an open paren, keep reading expressions until you see an otherwise unmatched close paren; else if the next character is whitespace, keep looking; else read an atom until you reach either kind of paren or whitespace.” Explaining how parser combinators and EBNF work is unnecessary with this operational description.
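To make this concrete, here is a rough sketch of such a reader in C. The names (node, read_expr and so on) are mine, not the book’s; error handling is omitted and nothing is freed:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
  char *atom;          /* non-NULL when this node is an atom */
  struct node **kids;  /* children when this node is a list */
  int count;
} node;

static node *read_expr(const char **s);

static void skip_space(const char **s) {
  while (isspace((unsigned char)**s)) (*s)++;
}

static node *read_list(const char **s) {
  node *n = calloc(1, sizeof *n);
  (*s)++;                                   /* consume '(' */
  skip_space(s);
  while (**s && **s != ')') {               /* keep reading expressions... */
    n->kids = realloc(n->kids, sizeof *n->kids * (n->count + 1));
    n->kids[n->count++] = read_expr(s);
    skip_space(s);
  }
  if (**s == ')') (*s)++;                   /* ...until the matching ')' */
  return n;
}

static node *read_atom(const char **s) {
  const char *start = *s;
  while (**s && !isspace((unsigned char)**s) && **s != '(' && **s != ')') (*s)++;
  size_t len = (size_t)(*s - start);
  node *n = calloc(1, sizeof *n);
  n->atom = malloc(len + 1);
  memcpy(n->atom, start, len);
  n->atom[len] = '\0';
  return n;
}

static node *read_expr(const char **s) {
  skip_space(s);
  return **s == '(' ? read_list(s) : read_atom(s);
}

int main(void) {
  const char *src = "(+ 1 (* 2 3))";
  node *n = read_expr(&src);
  printf("read a list of %d expressions\n", n->count);   /* prints 3 */
  return 0;
}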
The usual algorithm is presented in Appendix A, but it is gratuitously O(length^2) while reading strings and symbols, as calling realloc(part, strlen(part)+2) for each character walks an O(length) string O(length) times. O(dear) – but maybe it’s not so bad, as most computation isn’t reading. Well, unless it is, but here it isn’t.
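For comparison, here is a sketch of reading into a buffer that tracks its own length and grows geometrically, which makes the whole read O(length) amortized; read_token and its interface are mine, not the book’s:

#include <stdio.h>
#include <stdlib.h>

static char *read_token(const char *input) {
  size_t len = 0, cap = 16;
  char *buf = malloc(cap);
  while (*input && *input != ' ') {
    if (len + 2 > cap) {        /* grow geometrically instead of per character */
      cap *= 2;
      buf = realloc(buf, cap);
    }
    buf[len++] = *input++;      /* no strlen: we already know the length */
  }
  buf[len] = '\0';
  return buf;
}

int main(void) {
  char *tok = read_token("some-rather-long-symbol and the rest");
  printf("%s\n", tok);          /* prints "some-rather-long-symbol" */
  free(tok);
  return 0;
}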
It’s okay if a program crashes during development, but our final program would hopefully never crash [braces for disaster], and should always explain to the user what went wrong.

Okay, guess we avoided that one.

C programs crashing is a fact of life. If anything goes wrong the operating system kicks them out.

Nope, we’re fucked. Without manual intervention (cf. -fsanitize=memory), C programs do pretty much no error detection, and a lot of things can go wrong without the errors being the kind that the operating system has to handle.
And, while a crash is preferable to the program doing something bad, it is not useful on its own for finding programming errors. We built a few of the programs, gave them ^D, and they all gave a segmentation fault, which is as much of a useful error as “you’re sick” is a useful diagnosis from a doctor. At this point, we really would want to pull up gdb and try to work our way backwards to the cause of the crash. (gdb fortunately is mentioned by the book, but the reader isn’t given much of a clue other than it “can be difficult and complicated to use”, which doesn’t inspire much confidence; the only instruction is to figure out how to run your program under gdb.)
Now, I will be honest, I hate C with a passion. You shouldn’t learn it in order to learn to write an interpreter, you pretty much should never even have a reason to use it, and I’d hate to write “reliable” software in it. But credit where it’s due: if “C programs crashing is a fact of life” were true, then most computers would be very broken. People who write real C programs manage to not mess up that badly…most of the time (the other times usually produce a CVE report and everyone has to update their systems). It’s not great, but it’s also not that bad.
We start to see an object representation emerge in this chapter.
I suppose we could say that it is silly to have the value and error types all in one struct instead of using a union. Any Lisp value, moreso when other data types are introduced, is going to be huge. Ideally, in an actual programming language, we would also implement errors using exceptions, so that we don’t have to check that every Lisp value isn’t an error value, but that isn’t going to happen in this book. Also, shouldn’t we define an enum for the type tag and use that instead of an int?
The author states, on how to engineer software:
It requires faith and intuition. Faith is required to believe that if the stars align, and every incantation is correctly performed for this magical machine, the right thing will really happen. And intuition is required to work out what has gone wrong, and how to fix things when they don’t go as planned.
Unfortunately these can’t be taught directly…
Testing is generally preferable to faith, and testing needn’t be difficult with computers. Being able to unit test individual parts both lessens the need for intuition and lets you build an intuition for how parts fail. A good mental model and tests greatly reduce the magic in a system; if the system still seems magical or otherwise unexplainable, those are what is lacking. (Your experience challenging the powers that control magic incantations outside of computers may vary; by reading you agree I am not responsible for any harm which may be caused by you trying to apply these principles outside of computers.)
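A trivial example of what I mean, testing a made-up helper (skip_spaces is mine, not from the book) with plain assert:

#include <assert.h>
#include <string.h>

/* the unit under test: a small helper a reader might use */
static const char *skip_spaces(const char *s) {
  while (*s == ' ') s++;
  return s;
}

int main(void) {
  assert(strcmp(skip_spaces("   foo"), "foo") == 0);
  assert(strcmp(skip_spaces("bar"), "bar") == 0);
  assert(*skip_spaces("   ") == '\0');
  return 0;   /* getting here means every case passed */
}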
We are also taught something that contradicts what might be taken for granted in many programming languages: that new features can be introduced with only data and procedural abstractions (and some languages, like Lisp, also provide syntactic abstractions). Instead we are told that one needs to go through “the four typical steps”:
- Syntax: Add new rule to the language grammar for this feature.
- Representation: Add new data type variation to represent this feature.
- Parsing: Add new functions for reading this feature from the abstract syntax tree.
- Semantics: Add new functions for evaluating and manipulating this feature.
Introducing a new data type only requires defining
the data type and relevant operations on the data type. The “semantics” of
those functions is very different to, say, the semantics of the if
special form,
as the user does not need to know any special evaluation rules or anything
of the sort. In a Lisp system, many other language constructs can be
implemented using a macro system, and they do not need new data types,
parsing or any new syntax (at least from the point of view of the parser).
The book introduces “q-expressions” for syntactic abstraction, which are claimed to be simpler than macros. A macro does have grammar, I suppose, but it does not affect the Lisp reader at all. It does not require a new data type, as a macro call looks quite like a normal function call. The semantics of the language must be modified to handle macros, which is the only relevant feature on the list.
The q-expression stuff seems similar to a fexpr, which is a function that is called without evaluating its arguments. But BYOL always evaluates the arguments to a call, so the caller has to quote them instead. And it quotes them in a weird way: instead of writing '(a b c) we write {a b c}. To quote a symbol (i.e. 'a) one is supposed to write… {a}? This makes it impossible to quote a call to a function with zero arguments, since that would also be written {a}; it is admittedly already impossible to call a function with zero arguments in this language, but one inconsistency does not justify another.
This list also conflates the role of an interpreter and the semantics of the language. Theoretically, the interpreter can do anything, as long as what it does happens to coincide with the semantics of the language. This leaves open the possibility of adding a compiler for the language, even though the specification gives you a sort of algorithm for interpreting a program. So the semantics of a language are not how your implementation interprets them (unless your language is Python and you wrote CPython, I suppose), and the parser algorithm is similarly irrelevant.
Parsing is similarly just the implementation of syntax, though usually “parsing” means the act of producing an abstract syntax tree. The act of “reading [a] feature from the abstract syntax tree” is only incidental to this book, where converting from the representation output by the parser generator to the representation of Lisp objects is called “reading” in Chapter 9. This is also an abuse of terminology, as reading in Lisp is defined as the act of turning text into objects (not unlike “parsing”).
Look at the first yellow box on the page:
In our Lisp, when we re-assign a name we’re going to delete the old association and create a new one. This gives the illusion that the thing assigned to that name has changed, and is mutable, but in fact we have deleted the old thing and assigned it a new thing.
Please don’t do that. What would be the result of evaluating:
(let ((a 1)) (+ (let ((a 2)) a) a))
It should evaluate to 3, but this description of how to modify the environment completely breaks scoping. The implementation nonetheless behaves correctly here, due to pervasive copying; don’t drink and alias. Such copying also makes it hard to reason about what the effects of mutating/re-assigning a variable actually are. For example, the program
(= {a} 2)
(+ (let {do (= {a} 3) a}) a)
evaluates to 5; the global binding for a is copied inside let and only the copy is updated by (= {a} 3).
Symbols aren’t interned - they are kept as strings forever and compared with strcmp all the time. I suppose that works, but it is slow, and I think it looks a bit stupid to be writing if (!strcmp(a, b)) instead of if (a == b). And then the builtin_op dispatch also uses strings when an enum would work.
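Interning is not hard either; here is a sketch with my own names, using a flat array where a real interpreter would want a hash table:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char **interned;
static int interned_count;

/* return a canonical pointer for the symbol name */
static const char *intern(const char *name) {
  for (int i = 0; i < interned_count; i++) {
    if (strcmp(interned[i], name) == 0) return interned[i];
  }
  interned = realloc(interned, sizeof *interned * (interned_count + 1));
  return interned[interned_count++] = strdup(name);
}

int main(void) {
  assert(intern("foo") == intern("foo"));   /* same spelling, same pointer */
  assert(intern("foo") != intern("bar"));
  return 0;
}

After that, symbols with the same spelling share a pointer, so == works, and dispatch can switch on something cheaper than strings.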
Functions use dynamic scoping. This is categorically a mistake, and one which ironically has been suggested to seriously mess up fexprs. Were Common Lisp to only use dynamic scoping, the following program would work, with f gleaning the value of x from its caller:
(defun f () x)
(let ((x 42)) (f))
And this one would not, as there is no way for the function returned by constantly to remember its enclosing state:
(defun constantly (x)
(lambda () x))
(funcall (constantly 42))
The latter would work using lexical scoping, where evaluating a lambda-term produces a closure retaining its enclosing environment. In a way, this is a rather simple form of encapsulation that has been forfeited by using dynamic scoping. Implementing lexical scoping is suggested as a “bonus project”; notwithstanding that this is like trying to sell a steering wheel for one’s car as an add-on, the author believes that it is a kind of static analysis that works without affecting semantics. Why this is wrong is explained in my section on Chapter 16.
Dynamic scoping perhaps makes q-expressions simpler, but only in a superficial and abstraction-breaking manner; Shutt’s fexprs require passing the environment around between fexprs and calls to eval, whereas q-exprs do not. But dynamic scoping conflates the environments used at different meta-levels, of sorts: for a form (def {if test then else} (\ ...)), should test, then and else be bound while evaluating either the then or else forms? With an explicit environment, the implementor would need to explicitly introduce such bindings into the environment, and they would not do that, because that is a silly thing to do. With an implicit dynamic environment, this happens implicitly, and silly consequences occur.
Quick update on the object layout:
struct lval {
int type;
/* Basic */
long num;
char* err;
char* sym;
char* str;
/* Function */
lbuiltin builtin;
lenv* env;
lval* formals;
lval* body;
/* Expression */
int count;
lval** cell;
};
A Lisp object takes at least 88 bytes of memory on x86-64 Linux. Congratulations!
This is not a superficial issue, as larger objects reduce the effectiveness of
cache memory, slowing down all use of such objects. A simple Lisp interpreter
written by a friend uses 24 byte objects; (fib 30)
is slowed down by about
40% when the objects are padded out to 88 bytes.
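A sketch of what a tagged union would look like - the field names roughly follow the book’s, but the layout is mine, not the author’s. sizeof drops to 40 bytes on x86-64, and boxing the function payload behind a pointer would shrink it further:

#include <stdio.h>

typedef struct lval lval;
typedef struct lenv lenv;
typedef lval *(*lbuiltin)(lenv *, lval *);

struct lval {
  int type;                 /* better yet, an enum */
  union {
    long num;
    char *err;
    char *sym;
    char *str;
    struct {                /* function */
      lbuiltin builtin;
      lenv *env;
      lval *formals;
      lval *body;
    } fn;
    struct {                /* s-expression / q-expression */
      int count;
      lval **cell;
    } list;
  } as;
};

int main(void) {
  printf("%zu\n", sizeof(lval));   /* 40 on x86-64, down from 88 */
  return 0;
}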
The main reason I wanted to write this “review” was about this chapter. That reason was this code block:
; First, Second, or Third Item in List
(fun {fst l} { eval (head l) })
(fun {snd l} { eval (head (tail l)) })
(fun {trd l} { eval (head (tail (tail l))) })
Spaces flailing everywhere, and evaluating elements of a list for no good reason. A beautiful four-line summary of the sheer stupidity this book is built upon. For all the bold title’s promise - hey, anyone can build their own language and add whatever features they want - no one stopped to ask why, or how the hell those features would work. But why this works in this interpreter is just as funny. Back up a few lines, and we see the reason:
The head function is used to get the first element of a list, but what it returns is still wrapped in the list. If we want to actually get the element out of this list we need to extract it somehow.
Single element lists evaluate to just that element, so we can use the eval function to do this extraction. But a single-element list containing a “normal” list (or “S-expression”, in the silly terminology of the author) gets evaluated - so if that element performs side effects for some reason, it could be evaluated multiple times!
lispy> ((\ {x} {list (fst x) (fst x)}) {(print "hi")})
"hi"
"hi"
{() ()}
Making evaluation implicit does not help the meta-programmer. The distinction between “S-expressions” and “Q-expressions” suggests that code is in fact not (easily manipulated as) data in this language, as data must always be written with {} to avoid evaluation.
; List Length
(fun {len l} {
if (== l nil)
{0}
{+ 1 (len (tail l))}
})
Beautiful placement of {} by the way - clearly this guy is well experienced in Lisp. So, we aren’t calling the 0, but we are still calling the +, and we can only tell because there are more arguments to +. Riiiiight.
When dealing with conditionals we added no new boolean type to our language. Because of this we didn’t add true or false either. Instead we just used numbers. Readability is still important though, so we can define some constants to represent these values.
The user is not forced to use these named constants, and can use numbers and empty lists instead as they like. This choice empowers users, something we believe in.
The first paragraph conflicts with the “information” on how to add features in Chapter 10, as discussed. The second leaves open the question of how having multiple ways to spell true, false and the empty list is “empowering” in any way.
Sometimes we want to save results to local variables using the = operator. When we’re inside a function this will implicitly only save results locally, but sometimes we want to open up an even more local scope. For this we can create a function let which creates an empty function for code to take place in, and evaluates it.

; Open new scope
(fun {let b} {((\ {_} b) ())})
This definition of let is rather unfortunate, along with =, as there is a neat equivalence between let and lambda forms that Lisp (and ML) relies on. While it is somewhat superficial to complain about = being used for assignment rather than for testing equality, the following tale is worth re-telling:
There is a story that Ken Iverson, the inventor of APL, was passing a terminal at which a Fortran programmer had just typed:

I = I+1

Ken paused for a moment, muttered “no it doesn’t”, and passed on.
Despite the previously mentioned flaws, the author still insists that you just need some more features in order to have a “production-strength” programming language. Which, granted, if you are a game developer, or have a startup, or even both, is probably not far off. For the rest of us, well…
The idea of using a “variable hash table” per environment is plain silly. If you can afford some “compile-time” preprocessing, you could rewrite local variables to use fixed indices, then use real arrays for each environment frame, which would avoid performing hashing and probing and all that noise.
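A sketch of what I mean, with entirely made-up names: once a pre-pass numbers each variable with a (depth, slot) pair, lookup is a couple of pointer hops and an array index, no hashing anywhere.

#include <stdio.h>

struct lval { long num; };            /* stand-in for the interpreter's lval */

typedef struct frame {
  struct frame *parent;
  struct lval **slots;                /* one entry per variable bound here */
} frame;

static struct lval *lookup(frame *env, int depth, int slot) {
  while (depth-- > 0) env = env->parent;   /* hop out to the defining frame */
  return env->slots[slot];
}

int main(void) {
  struct lval forty_two = { 42 };
  struct lval *global_slots[] = { &forty_two };
  frame global = { NULL, global_slots };
  frame local  = { &global, NULL };
  /* the hypothetical pre-pass numbered this variable (depth 1, slot 0) */
  printf("%ld\n", lookup(&local, 1, 0)->num);   /* prints 42 */
  return 0;
}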
And, again, how environments are implemented is not a property of a language, rather a property of an implementation. The distinction is pretty important when you wish to specify how the language works. Consider that, if one wants to evaluate an expression on paper, they might write down an environment, but they don’t draw out a linked list or hash table or whatever else - they just draw the environment as an opaque object, without drawing its internals.
Another bonus project is to implement garbage collection, to avoid performing
so much copying in the interpreter. But much of the interpreter is written
around copying data structures and then mutating them, making it difficult
to rewrite to avoid copying, and what the semantics are meant to be is hardly
documented. For example, we might find that the program which
tests the extents of bindings in Chapter 11 now evaluates to 6
as we update
the same binding. Careless language design does restrain the language
implementor - for example,
pass-by-copy semantics in PHP relies on immediate reference counting to avoid copies where skipping the copy cannot be observed (i.e. when the reference count is 1), and much PHP code also expects destructors to run immediately after an object becomes unreachable. Thus it is difficult to implement any deferred reference counting or tracing garbage collection for PHP; we similarly cannot easily avoid copying for this language, because copying interacts with mutability.
The author is dead wrong on what lexical scoping is, claiming that it is
a static analysis pass which detects unbound variables. The language in the
book uses dynamic scoping, so finding unbound variables using “the rules for
variable definition” requires global program analysis, if it is even decidable
with fexprs! (Also note that using an environment based on vectors, even for nested environments, requires lexical scoping, so that variable numbering can be done per function.) Static typing may also be undecidable with fexprs, or at least incredibly difficult, as we would have to type check eval.
I’ve also written a parallel garbage collector, which has its own allocator and data structures. From that experience I think the description of the “Pool Allocation” project is especially inadequate and is justified by misinformation. The author suggests to implement a pool allocator to speed up allocation. The term is not clearly defined in the book:
One method of doing this is to call malloc once at the beginning of the program, allocating a large pool of memory. We should then replace all our malloc calls with calls to some function that slices and dices up this memory for use in the program.
This description fits many kinds of allocators, which by definition all partition a larger range of memory into smaller allocations; the only distinctive part is that one large allocation is made using the system allocator. Only allocating once from the system allocator may not provide enough memory for large applications, and may waste space for small applications; we may still get much the same effect by allocating more large chunks whenever they are needed.
“Pool allocation” has at least two distinct meanings. One definition is an allocator which only hands out allocations of one fixed size, but that alone is inadequate, as the interpreter creates objects of different sizes. One may benefit from using a pool allocator for common sizes (like that of struct lval) combined with a general-purpose memory allocator for other sizes. Another definition is closer to region-based memory management, which doesn’t seem easily applicable to the interpreter, as we cannot easily predict the lifetimes of objects.
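For the first meaning, here is a sketch of a fixed-size pool for objects the size of struct lval. Everything here is mine, not the book’s: the struct is a stand-in, the chunks are deliberately never returned to the system, and error checking is omitted.

#include <stdio.h>
#include <stdlib.h>

struct lval { int type; long num; };   /* stand-in for the interpreter's lval */

#define SLOTS_PER_CHUNK 1024

typedef union slot {
  union slot *next;                    /* used while the slot is on the free list */
  struct lval val;                     /* used while the slot is live */
} slot;

static slot *free_list;

static void grow_pool(void) {
  /* one big allocation, carved into SLOTS_PER_CHUNK fixed-size slots */
  slot *chunk = malloc(sizeof(slot) * SLOTS_PER_CHUNK);
  for (size_t i = 0; i < SLOTS_PER_CHUNK; i++) {
    chunk[i].next = free_list;
    free_list = &chunk[i];
  }
}

static struct lval *pool_alloc(void) {
  if (!free_list) grow_pool();
  slot *s = free_list;
  free_list = s->next;
  return &s->val;
}

static void pool_free(struct lval *v) {
  slot *s = (slot *)v;                 /* a slot and its lval share an address */
  s->next = free_list;
  free_list = s;
}

int main(void) {
  struct lval *a = pool_alloc();
  a->num = 42;
  printf("%ld\n", a->num);             /* prints 42 */
  pool_free(a);
  return 0;
}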
The description of why the default allocator is slow isn’t helpful; the author only suggests that the standard memory manager “requires the operating system to do some management for us”, which is too vague to be useful! There is only inherently a performance issue with the “operating system” when an application in userspace makes many system calls to the kernel. Any allocator under the sun will attempt to service allocations by reusing memory that has already been allocated, which requires no involvement from the kernel. The performance of an allocator depends on its design, not on whether it was provided with the operating system.
This is not to say that all allocators are created equal - the budding
allocator-problem-haver might benefit from testing if tcmalloc
or mimalloc is faster than
the default allocator. On my computer I find that computing (fib 20)
drops from 1.24 seconds to 0.95 when I use tcmalloc. (In comparison,
CPython 3.11 and Ruby 3.0.6 take about 700 microseconds to compute the same, so this interpreter’s performance is much worse than even the slow implementations of Python and Ruby, contrary to the claims of the author.) But
the problem is really that the interpreter copies and thus allocates so
frequently to start with; using garbage collection rather than using pervasive
copying would require fewer calls to malloc
. That each lval
object
is bloated with data for every type it could possibly be despite only
being of one type also does not help.
Okay, back to the start. Where we’re being told about the lovely things we’ll make in this book. Apparently I am Ada Lovelace…or a fridge? No, wait, Mike Tyson. One of them. But the introduction sheds light on what this writeup is really about. (Emphasis mine.)
The type of Lisp we’ll be building is one I’ve invented for the purposes of this book. I’ve designed it for minimalism, simplicity and clarity, and I’ve become quite fond of it along the way. I hope you come to like it too. Conceptually, syntactically, and in implementation, this Lisp has a number of differences to other major brands of Lisp. So much so that I’m sure I will be getting e-mails from Lisp programmers telling me it isn’t a Lisp because it doesn’t do/have/look-like this or that.
I’ve not made this Lisp different to confuse beginners. I’ve made it different because different is good.
My problem is that I am just complaining about difference, and difference is good! Instead of fanboying over computer brands like Apple and Microsoft, I have been fanboying over language brands like Common Lisp or Scheme or Smalltalk the whole time. That all makes sense.
Suppose I’ll hold off on the e-mail then.
Another project of the author, the Cello C library, has a list of benchmarks. The garbage collection benchmark is horribly broken. The C “baseline” looks pretty straightforward; one function mallocs a few dozen objects, then frees them. But we note three immediate bugs:
- There are 35 variables i00 through i34 which hold allocated pointers, but i33 is never freed.
- The author used a volatile int to fool the C optimizer into not removing the malloc/free pairs, but it is trivial to see that there can only be one assignment to the variable, and so the compiler optimizes it out anyway.
- There is a memory leak when depth = 2.
The end result is that the entire C program could be folded out, and so the benchmark measures startup time. The same thing happens with the JS and Java benchmarks, as they can also identify that no allocated objects are actually used.
Trying to get the C compiler to not fold out malloc/free was tricky, but here
is one possible approach. We declare extern void use(void* p);
in the benchmark
file, and then create a separate file with a definition of use
like
void use(void* p) { (void)p; }
. Then we compile the benchmark and use files
separately, and link (without link-time optimization, not that I know if it would
affect this test). The end result is that when compiling the benchmark, the compiler
is forced to actually allocate. On the other hand, one could well point out that we incur the overhead of one extra function call per allocation.
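Concretely, the arrangement looks something like this; the file names and the loop body are mine, boiled down from the benchmark:

/* use.c - compiled separately, without link-time optimization */
void use(void *p) { (void)p; }

/* bench.c */
#include <stdlib.h>
extern void use(void *p);

int main(void) {
  for (int i = 0; i < 10000; i++) {
    void *p = malloc(64);
    use(p);               /* opaque call: the compiler must keep the malloc */
    free(p);
  }
  return 0;
}

/* build: cc -O2 -c use.c bench.c && cc use.o bench.o -o bench */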
When I run the new test, the program takes somewhere between 14 and 20 milliseconds to
run, rather than total noise between 0 and 1 milliseconds as before. So the polite thing
to do is to increase the work size. I increased the loop bound in main
from 100 to
10,000, and found it took 674 milliseconds to run. (Note that the provided Java benchmark also runs 10,000 iterations rather than 100, and presumably no scaling of the result times is done, which is quite concerning.)
We could use a JVM option to avoid
inlining the use
function with Java. However, the execution time does not seem to be affected
by it. (I’d love to hear if someone figures out a more rigorous way to test this stuff.)
In either case, the execution time lies somewhere between 90 and 125 milliseconds, which is faster than the fixed C code. With this in mind, it is easy to believe the statement “that by manually managing memory Cello achieves performance much closer to C/C++”, if we note that C is pretty damn slow in this case.
I would suggest flicking through the slides for “How NOT to write a benchmark” by Dr Cliff Click on the matter of writing a micro-benchmark. One of the grave sins he mentions is just what we have been observing: compilers will remove crucial parts of benchmarks if their results are never used.
🥺🥺🥺