Blog 2020/1/9
<- previous | index | next ->
Let's learn how to write a Lisp interpreter in C!
In part 1, we implement a basic echoing REPL (read-eval-print-loop) which thinks everything is a symbol.
This first post is a bit heavy, because there's a lot of basic guts we have to set up just to get a working REPL.
repl()
reads one Lisp "form", evaluates it, prints the result, then loops.
A Lisp form is the smallest unit of syntax which results in a value. A Lisp form is an atom (a number, symbol, string, etc.) or a list. Of course, a list can contain other lists, so a form can be quite large!
We want our repl to be a bit smart about when it should display a prompt (>
).
$ ./lisp
>
It should prompt the user when run interactively,
$ ./lisp
> (+ 1 1)
2
> (+ 2 3)
5
>
but it shouldn't print any prompts if input is being piped in.
$ cat input.txt
(+ 1 1)
(+ 2 3)
$ cat input | ./lisp
2
5
$
Additionally, if multiple Lisp forms are entered on a single line, it shouldn't print another prompt until it runs out of forms to process.
$ ./lisp
> 1 2 3
1
2
3
>
Here is repl()
:
/* Starts a read-eval-print loop.
Loops until EOF or I/O error.
Returns 0 or error. */
int repl() {
FBuff* fbp;
int err = new_fbuff(&fbp, stdin);
if (err) {
return err;
}
while (true) {
fbuff_skip_buffered_ws(fbp);
if (should_display_prompt(fbp)) {
/* display the prompt. */
int err = fputs("> ", stdout);
if (err == EOF) {
free_fbuff(fbp);
err = errno;
errno = 0;
return err;
}
}
/* read the next Lisp form. */
Form* formp;
err = read_form(fbp, &formp);
if (err) {
if (err == EOF) {
break;
} else {
int err2 = fprintf(stderr, "Error %d.\n", err);
if (err2 < 0) {
free_fbuff(fbp);
return err2;
}
continue;
}
}
/* evaluate the form. */
Form* resultp;
err = eval_form(formp, &resultp);
if (err) {
int err2 = fprintf(stderr, "Error %d.\n", err);
if (err2 < 0) {
free_fbuff(fbp);
return err2;
}
continue;
}
/* print the result. */
err = print_form(resultp, stdout);
if (err) {
free_fbuff(fbp);
return err;
}
err = fputs("\n", stdout);
if (err == EOF) {
free_fbuff(fbp);
err = errno;
errno = 0;
return err;
}
/* loop. */
continue;
}
free_fbuff(fbp);
return 0;
}
Yuck, so verbose!
Because C doesn't support exceptions, the pedantic error handling ends up making the code very verbose.
Here's the same code but with error handling elided:
repl.c (error handling elided):
int repl() {
FBuff* fbp;
new_fbuff(&fbp, stdin);
while (true) {
fbuff_skip_buffered_ws(fbp);
if (should_display_prompt(fbp)) {
/* display the prompt. */
fputs("> ", stdout);
}
/* read the next Lisp form. */
Form* formp;
read_form(fbp, &formp);
/* evaluate the form. */
Form* resultp;
eval_form(formp, &resultp);
/* print the result. */
print_form(resultp, stdout);
fputs("\n", stdout);
/* loop. */
continue;
}
free_fbuff(fbp);
return 0;
}
Note: I'm using a bit of Hungarian notation: pointers have names which end in 'p'.
repl()
calls the core REPL functions:
read_form
eval_form
print_form
and some FBuff
functions:
new_fbuff
free_fbuff
and some minutiae related to getting the prompt behavior we described above:
should_display_prompt
fbuff_skip_buffered_ws
A Lisp "reader" reads enough input to parse the next Lisp "form".
This reader is very basic: it turns every token into a Lisp symbol.
reader.c (error handling elided):
/* Read one Lisp form from fp intp formpp.
Returns 0 or EOF or errno or error code. */
int read_form(FBuff* fbp, Form** formpp) {
int buffsize = 100;
char buff[buffsize];
char* buffp = buff;
char ch1;
/* read a token. */
fbuff_get_token(fbp, &buffp, buffsize);
ch1 = *buffp;
/* we've reached the end of input. */
if (ch1 == '\0') {
return EOF;
} else {
/* assume every token is a symbol. */
Symbol* symp;
new_symbol(&symp, buffp);
*formpp = (Form*)symp;
return 0;
}
}
The reader is built upon the tokenizer:
fbuff_get_token
and Symbol
creation:
new_symbol
_reader.c (error handling elided):
/* Advances fbp far enough to read one token of input.
Writes the token contents to *buffpp.
Returns 0, EOF, errno, or an error code. */
static int fbuff_get_token(FBuff* fbp, char** buffpp, size_t buffsize) {
char ch;
size_t bufflen = buffsize - 1;
char* cursor = *buffpp;
/* discard any leading whitespace. */
fbuff_discard_ws(fbp);
/* a token must be at least one char in length. */
fbuff_getch(fbp, &ch);
*cursor = ch;
cursor++;
/* the rest of the chars. */
while (true) {
fbuff_getch(fbp, &ch);
/* we've reached the end of this token. */
if (is_ch_delim(ch)) {
fbuff_ungetch(fbp, ch);
*cursor = '\0';
break;
/* this char is part of the token. */
} else {
*cursor = ch;
cursor++;
continue;
}
}
return 0;
}
The tokenizer is built upon character-oriented functions:
fbuff_getch
fbuff_ungetch
is_ch_delim
fbuff_discard_ws
FBuff
is a struct
and a set of functions
which read input in a line-oriented fashion,
but presents a character-oriented interface to the tokenizer.
/* A line-oriented FILE* buffer. */
struct FBuff_ {
FILE* fp;
char* buffp;
size_t size;
size_t len;
char* nextp;
};
typedef struct FBuff_ FBuff;
int new_fbuff(FBuff** fbpp, FILE* fp);
void free_fbuff(FBuff* fbp);
The line-oriented interface consists of fbuff_getline
:
reader.c (error handling elided):
/* Reads the next line into fbp.
Returns 0, EOF, or errno. */
static int fbuff_getline(FBuff* fbp) {
ssize_t result = getline(&(fbp->buffp), &(fbp->size), fbp->fp);
if (feof(fbp->fp)) {
return EOF;
} else {
fbp->len = result;
fbp->nextp = fbp->buffp;
return 0;
}
}
The character-oriented interface consists of fbuff_getch
and fbuff_ungetch
:
reader.c (error handling elided):
/* Reads and consumes the next character into chp from fbp.
Returns 0, EOF, or errno. */
static int fbuff_getch(FBuff* fbp, char* chp) {
if (is_fbuff_eol(fbp)) {
fbuff_getline(fbp);
}
char ch = *(fbp->nextp);
(fbp->nextp)++;
*chp = ch;
return 0;
}
/* Pushes ch back into fbp.
Asserts if used incorrectly. */
static void fbuff_ungetch(FBuff* fbp, char ch) {
assert(fbp->nextp > fbp->buffp);
fbp->nextp--;
*(fbp->nextp) = ch;
}
/* Is fbp at the end of the current line? */
bool is_fbuff_eol(FBuff* fbp) {
return fbp->len == 0 || fbp->nextp == fbp->buffp + fbp->len;
}
A Lisp evaluator determines the value of a Lisp form.
This evaluator is a simple no-op:
/* Evaluates formp into resultpp.
Returns 0. */
int eval_form(Form* formp, Form** resultpp) {
/* for now, all forms evaluate to themselves. */
if (is_symbol(formp)) {
*resultpp = formp;
return 0;
/* unsupported form. */
} else {
assert(false);
}
}
A Lisp printer prints a canonical representation of a Lisp form.
This printer only knows how to print Lisp symbols.
For now, the printer will also indicate the type of the form (e.g. Symbol: foo
), which will allow us to peek at the internal state of our interpreter and verify its behavior.
printer.c (error-handling elided):
/* Prints the Form in formp into fp.
Returns 0 or errno. */
int print_form(Form* formp, FILE* fp) {
if (is_symbol(formp)) {
Symbol* symp = (Symbol*)formp;
return print_symbol(symp, fp);
} else {
assert(false);
}
}
/* Prints the Symbol in symp into fp.
Returns 0 or errno. */
static int print_symbol(Symbol* symp, FILE* fp) {
fprintf(fp, "Symbol: %s", symp->valuep);
return 0;
}
A Lisp symbol is represented as struct
:
struct Symbol_ {
FormType type;
char* valuep;
};
typedef struct Symbol_ Symbol;
int new_symbol(Symbol** sympp, const char* sp);
bool is_symbol(Form* formp);
Additionally, we can cast Lisp forms to a type-erased representation:
/* A type-erased Lisp form. */
struct Form_ {
FormType type;
};
typedef struct Form_ Form;
We use an enum
to keep track of the Lisp for types:
/* The list of Lisp form types. */
enum FormType_ {
_Type_UNINITIALIZED = 0,
TypeSymbol = 10,
};
typedef enum FormType_ FormType;
Give it a try! Click the "Download Zip" button at the top-right of this page.
Build the interpreter using make
:
$ make lisp
Fire it up:
$ ./lisp
> hello world
Symbol: hello
Symbol: world
> (+ 1 1)
Symbol: (+
Symbol: 1
Symbol: 1)
>
As you can see, literally everything is considered a symbol.
You can also pipe into it:
$ echo 1 2 3 | ./lisp
Symbol: 1
Symbol: 2
Symbol: 3
There are a few other commands implemented in the Makefile
:
$ make test
$ make clean
I've run the tests on macOS, linux, and win7/cygwin.
In part 2 we will add support for C long
integers!
Thanks for taking a look! My
getline
man page saysssize_t
, maybe on your platform it issize_t
?