REBOL [
    Title: "Red/System lexical analysis"
    Date: 1-Jul-2011
    Name: "Reds lexer"
    Type: none
    Version: 1.0.0
    File: %/G/Projects/Common/RED/red-system/sources/reds-lexer/reds-lexer.r
    Home: http://users.telenet.be/rwmeijer
    Author: "Rudolf W. Meijer"
    Rights: "Copyright (C) 2011 Rudolf W. Meijer. All Rights Reserved"
    History: [
        0.0.0 [19-Jun-2011 {Start of project} "RM"]
        0.5.0 [24-Jun-2011 {First working version} "RM"]
        0.7.0 [27-Jun-2011 {Added file! and tuple! literals} "RM"]
        0.8.0 [27-Jun-2011 {Simplified the separator} "RM"]
        0.9.0 [29-Jun-2011 {
            Separator reduced to stripping comments only,
            Grammar takes care of whitespace
        } "RM" ]
        1.0.0 [1-Jul-2011 {Grammar takes care of comments also} "RM"]
    ]
]

;---|----1----|----2----|----3----|----4----|----5----|----6----|----7----|-

; load the grammar; it defines the lex-grammar object used below
do %reds-lex-grammar.r

reds-lexer: func [
    inp [file! url! binary! string!]
][
    ; normalise any accepted input type to a string
    unless string? inp [
        unless binary? inp [inp: read/binary inp]
        inp: to-string inp
    ]
    either empty? inp
    [
        print "nothing to analyse: empty input"
    ][
        print "start"
        ; diagnostic: print the elapsed parse time
        ; (a bare block would not be evaluated, so PRINT is needed here)
        print ["parse" dt [
            parse/all inp lex-grammar/program
        ]]
        ; parsed source is in lex-grammar/source
    ]
]

test-text:
    %../../tests/source/units/exit-test.reds

print "call"
reds-lexer
    ;copy
    test-text

; diagnostic
print mold/all head lex-grammar/source
ask ""
halt
Lexer and grammar files downloaded, having just a quick look now, will do all the testing tomorrow. Hope it could be used as a drop-in replacement to LOAD (or at least would not require too much work to do so). It looks very exciting anyway. :-)
I realize it is not geared to #include and #define, so that part will have to be (re-)done anyway.
A few comments from my first review session:
- It is sometimes up to 50x slower than LOAD (using %tests/source/units/auto-tests/byte-auto-test.reds, for example, I get 11ms with LOAD and 504ms with reds-lexer). It seems that a lot of time is spent in decode-string; could it be rewritten using parse rules instead? This is not a big issue at this point, it just shows how slow REBOL is...
- Using %tests/source/units/auto-tests/integer-auto-test.reds as input, I get the following error:
    ** Math Error: Math or number overflow
    ** Where: store-integer
    ** Near: pow: pow * 10
- The previous point makes me realize that reds-lexer currently lacks error catching support, including proper reporting of the input position where the scanning failed (required to generate accurate syntax error messages in Red/System). This point is really important for using it as a LOAD replacement.
- It will require significant work (maybe a day or two) to integrate reds-lexer into the compiler: mainly re-wiring it properly, removing or replacing deferred syntax checking in the compiler, and extensive testing/fixing for regressions.
- Related to this work, I would be very interested in a formal Red/System syntax grammar specification (ideally using BNF format). Let me know if you are interested.
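To make the error-reporting point concrete, here is a minimal REBOL 2 sketch of how a parse-based scanner could record where it stopped and turn that into a line/column pair. Everything in it is an assumption for illustration: `line-col`, `scan`, and the toy `digit`/`blank` rules are hypothetical names and are not part of reds-lexer or its grammar.

```rebol
REBOL [Title: "Error position sketch (hypothetical, not part of reds-lexer)"]

; Toy character classes standing in for the real lex-grammar rules.
digit: charset "0123456789"
blank: charset " ^-^/"          ; space, tab, newline

; Hypothetical helper: turn a position in the input into [line column].
line-col: func [pos [string!] /local line col p][
    line: col: 1
    p: head pos
    while [(index? p) < (index? pos)][
        either p/1 = newline [line: line + 1  col: 1][col: col + 1]
        p: next p
    ]
    reduce [line col]
]

; Scan the input; on failure, report where scanning stopped.
scan: func [inp [string!] /local fail-pos][
    fail-pos: none
    parse/all inp [
        any [some digit | some blank]
        [end | fail-pos: to end]    ; remember the first unmatched position
    ]
    either fail-pos [
        print ["*** Syntax error at line/column" line-col fail-pos]
        false
    ][true]
]

scan "12 34^/56 x7"    ; scanning stops at the "x" on the second line
```

The idea is only that a set-word inside the parse rules captures the input position at the point of failure; a real integration would presumably place such markers in the actual grammar and pass the position to the compiler's error messages.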
I am still interested in replacing LOAD with reds-lexer, but I guess it is not doable before going beta (announcement planned for tomorrow). The reds-lexer integration could probably be done later this week or next week.
Thank you for your nice work!
I do have a Red/System syntax grammar ready, as a Word file (BNF productions, albeit with ambiguity, and semantic comments). I will send it to your email address shortly.
This grammar does not consider the shift operators that were added just today.
I have now managed to eliminate even the comment-stripping: REBOL's parse is just more powerful than I suspected. So the grammar takes care of everything. See version 1.0.0 of 1-Jul.
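As a rough illustration of what "the grammar takes care of everything" can look like, the following REBOL 2 sketch lets the parse rules themselves skip whitespace and `;` line comments instead of stripping them in a separate pass. It is a toy under stated assumptions, not the actual reds-lex-grammar.r: the rule names (`blank`, `line-comment`, `toy-program`) and the integer-only token set are hypothetical.

```rebol
REBOL [Title: "Whitespace and comments handled inside the grammar (sketch)"]

blank:   charset " ^-^/"              ; space, tab, newline
digit:   charset "0123456789"
not-eol: complement charset "^/"      ; anything except newline

; A line comment runs from ";" to the end of the line.
line-comment: [#";" any not-eol]

tokens: copy []
tok: none

; Hypothetical top-level rule: whitespace and comments are just
; alternatives that consume input without producing a token.
toy-program: [
    any [
        some blank
        | line-comment
        | copy tok some digit (append tokens to-integer tok)
    ]
]

parse/all "12 ; first^/34 ; second" toy-program
probe tokens    ; shows the integers collected from the input
```

Because `parse/all` does not skip whitespace on its own, every character has to be accounted for by some rule, which is what makes a separate comment-stripping step unnecessary.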