Created
July 6, 2011 19:55
-
-
Save danking/1068185 to your computer and use it in GitHub Desktop.
A very simple example showing how to use Racket's lexing and parsing utilities
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#lang racket | |
(require parser-tools/lex | |
(prefix-in re- parser-tools/lex-sre) | |
parser-tools/yacc) | |
(provide (all-defined-out)) | |
(define-tokens a (NUM VAR)) | |
(define-empty-tokens b (+ - EOF LET IN)) | |
(define-lex-trans number | |
(syntax-rules () | |
((_ digit) | |
(re-: (re-? (re-or "-" "+")) (uinteger digit) | |
(re-? (re-: "." (re-? (uinteger digit)))))))) | |
(define-lex-trans uinteger | |
(syntax-rules () | |
((_ digit) (re-+ digit)))) | |
(define-lex-abbrevs | |
(digit10 (char-range "0" "9")) | |
(number10 (number digit10)) | |
(identifier-characters (re-or (char-range "A" "z") | |
"?" "!" ":" "$" "%" "^" "&")) | |
(identifier (re-+ identifier-characters))) | |
(define simple-math-lexer | |
(lexer | |
("-" (token--)) | |
("+" (token-+)) | |
("let" (token-LET)) | |
("in" (token-IN)) | |
((re-+ number10) (token-NUM (string->number lexeme))) | |
(identifier (token-VAR lexeme)) | |
;; recursively calls the lexer which effectively skips whitespace | |
(whitespace (simple-math-lexer input-port)) | |
((eof) (token-EOF)))) | |
(define-struct let-exp (var num exp)) | |
(define-struct arith-exp (op e1 e2)) | |
(define-struct num-exp (n)) | |
(define-struct var-exp (i)) | |
(define simple-math-parser | |
(parser | |
(start exp) | |
(end EOF) | |
(error void) | |
(tokens a b) | |
(precs (left - +)) | |
(grammar | |
(exp ((LET VAR NUM IN exp) | |
(make-let-exp $2 (num-exp $3) $5)) | |
((NUM) (num-exp $1)) | |
((VAR) (var-exp $1)) | |
((exp + exp) (make-arith-exp + $1 $3)) | |
((exp - exp) (make-arith-exp - $1 $3)))))) | |
(define (eval parsed-exp) | |
(match parsed-exp | |
((let-exp var num exp) | |
(eval (subst var num exp))) | |
((arith-exp op e1 e2) | |
(op (eval e1) | |
(eval e2))) | |
((num-exp n) n) | |
((var-exp i) (error 'eval "undefined identifier ~a" i)))) | |
(define (subst var num exp) | |
(match exp | |
((let-exp var2 num2 exp2) | |
(if (eq? var var2) | |
exp | |
(let-exp var2 num2 | |
(subst var num exp2)))) | |
((arith-exp op e1 e2) | |
(arith-exp op | |
(subst var num e1) | |
(subst var num e2))) | |
((var-exp id) | |
(if (equal? id var) | |
num | |
exp)) | |
((num-exp n) exp))) | |
(define (lex-this lexer input) (lambda () (lexer input))) | |
(let ((input (open-input-string "3 - 3.3 + 6"))) | |
(eval (simple-math-parser (lex-this simple-math-lexer input)))) | |
(let ((input (open-input-string "let foo 6 in 3 - 3.3 + foo"))) | |
(eval (simple-math-parser (lex-this simple-math-lexer input)))) |
Author
danking
commented
May 15, 2012
via email
On Tue, May 15, 2012 at 1:27 AM, legmar ***@***.*** wrote:
Awesome example! I have extended this grammar quite a bit on my own
project. One question though... any pointers on how to include a new token
such as WOR that could recognize strings for names of variables? (i.e. Just
like NUM matches numbers, I'm trying to find a way to get WOR to match words
such as "x" or "y".)
So far, I tried created a new token WOR and a char-range from "a" "z", but I
couldn't quite get it to work. I keep getting an error that I have an unbound
identifier when I try to do ((re-+ word10) (token-WOR (string->word
lexeme))). (Note, I am using word10 just to show consistency in my attempt to
copy the format used for decimal numbers on the alphabet token.)
Creating a new token is the correct first step. However, I would suggest naming
the new token VAR, VARIABLE, ID, or IDENTIFIER.
(define-token a (NUM VAR))
This new token _must_ _not_ be declared in the `define-empty-tokens' form
because you want to remember which _particular_ string was "lexed" into this
token (such as "x" or "y").
The unbound identifier error is probably the result of `string->word'. Racket
contains no procedure called`string->word'. The procedure I used,
`string->number', is contained in Racket, because "number" is a basic piece of
data which all Racket programs use. Racket does not know what a "word"
is. Therefore, it does not have any procedures which convert from strings to
words (i.e. it has no procedure called`string->word).
For the arithmetic evaluator that I wrote, I needed to convert from strings to
words because I wanted to add and subtract the numbers that were _represented_
in my input by strings such as "3", "3.3", and "6".
Strings _are_ an adequate representation of variables. The name of a variable
isn't added to the name of another variable. There is no need to use a
conversion procedure such as `string->word' or`string->var'.
We still need to declare what sorts of strings "lexemes" should be "lexed"
into a VAR token. The `define-lex-abbrevs' form is used to define shorter names
for more complicated regular expressions. Here, I've defined what characters are
valid parts of an identifier. I've also defined an identifier as one or more
valid identifier characters.
(define-lex-abbrevs
;; ...
(identifier-characters (re-or (char-range "A" "z")
"?" "!" ":" "$" "%" "^" "&"))
(identifier (re-+ identifier-characters)))
This definition of identifiers includes identifiers such as:
foo foo? foo! foo:bar apples&oranges ONEHUNDRED% ONEtwoThree ^^HEAD^^ $$$
Note that `define-lex-abbrevs' is how we define "lexer variables" and
`define-lex-trans' is how we define "lexer functions". Both `number' and
`uinteger' take arguments (in both cases this argument is called `digit'). On
the other hand, all of the`define-lex-abbrevs' declarations (i.e. digit10,
number10, identifier-characters, identifier) do not accept arguments, they're
just variables. I only suggest using `define-lex-trans' if you write many
similar looking`define-lex-abbrevs'.
Finally, we are now prepared to inform the lexer of our new tokens:
(lexer
;; ...
(identifier (token-VAR lexeme))
;; ...
)
In English, this says "When you (the Lexer) see something that matches what I've
previously declared to be an identifier (i.e a series of characters), I'd like
you to create a new token-VAR and set the contents of that token-VAR to the
series of characters (the lexeme) which you've read."
We must also inform the parser of VAR tokens:
(parser
(exp ((VAR) ...)
;; ...
))
What should we put on the right-hand side of the VAR production rule? Well. I'm
not sure because I'm not certain what kind of program you're trying to
write.
You'll notice that the gist contains lots of extra code and an `eval'
procedure. I believe this is the most straightforward way to implement
arithmetic expressions with variables. Which is what I think you're trying to
write.
However, arithmetic expressions with variables turn out to be a significant
increase in complexity over variable-less arithmetic expressions. I recommend
checking out Essentials of Programming Languages by Friedman, Wand, and
Haynes. Chapter three will guide you through a beefed-up version of what I've
written here. You might find you want to study chapters one and two first for
the background material.
Alternatively, I can try to explain this in more detail. In fact, I'm surprised
that I couldn't find any good blog posts about this topic. However, I need a
couple days to craft a really _good_ explanation rather than a quick and shallow
explanation.
##
Dan King
College of Computer and Information Science
Northeastern University
Thanks! That helps a lot! That was a very clear, helpful, concise, and awesome explanation! I'm travelling today, but I will post my project soon just for reference.
Hi! Thanks for your work. Very concise example :)
Thanks for the example here, Dan!
I just rediscovered this and realized it has thirty stars! This might be my highest impact piece of code 😉
Hi,
Thank you for your great and informative code. However, there's a simple mistake in line 70 of the code, you should change eq
to equal
in order to make string comparison work. (Example showing this mistake: let foo 5 in let foo 7 in 3 - 3.3 + foo
)
Still helping out newbies after so many years. Thanks for submitting this!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment