This is a small experiment to see whether one can:
- Lex a file efficiently, retaining line/column and indentation information.
- Consume little or no memory (aside from the input itself), and
- Still have the flexibility to perform zero-cost, SAX-style operations such as folds (counting tokens), doing nothing (a simple pass), or printing (see the sketch after this list).
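As a rough illustration of the SAX-style idea, here is a minimal sketch (all names are hypothetical, not the experiment's actual API) of a fold over non-space "words" that tracks line and column without ever building a token list:

```haskell
{-# LANGUAGE BangPatterns #-}
module LexSketch where

import           Data.Char (isSpace)
import qualified Data.ByteString.Char8 as S8

-- A token's start position; a real lexer would also carry indentation
-- and the token's bytes.
data Pos = Pos { posLine :: !Int, posCol :: !Int }
  deriving Show

-- Fold a step function over every non-space "word" in the input,
-- passing each word's start position. No intermediate token list is
-- allocated; counting, ignoring, or printing tokens are all just
-- different step functions over the same traversal.
foldWords :: (a -> Pos -> S8.ByteString -> a) -> a -> S8.ByteString -> a
foldWords step z0 = go z0 1 1
  where
    go !acc !line !col bs =
      case S8.uncons bs of
        Nothing -> acc
        Just (c, rest)
          | c == '\n' -> go acc (line + 1) 1 rest
          | isSpace c -> go acc line (col + 1) rest
          | otherwise ->
              let (word, rest') = S8.break isSpace bs
              in go (step acc (Pos line col) word)
                    line (col + S8.length word) rest'

-- Counting tokens is the "do almost nothing" fold.
countWords :: S8.ByteString -> Int
countWords = foldWords (\n _ _ -> n + 1) 0
```

For example, `countWords (S8.pack "hello  world\nfoo")` evaluates to 3; a printing pass would simply be a step function that formats each `Pos` and word.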
This shows that one could, e.g., run in ST and write the tokens to a mutable Storable vector, allowing the caller to process the set of tokens later, with the cost of computing each token's line/column/indentation already paid.
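A sketch of that idea (still hypothetical, building on the `foldWords` sketch above and assuming the caller supplies an upper bound on the token count) could thread an ST action through the fold and record each token's line number in a pre-allocated mutable Storable vector:

```haskell
import           Control.Monad.ST (runST)
import qualified Data.Vector.Storable as V
import qualified Data.Vector.Storable.Mutable as MV

-- Collect each token's line number into a Storable vector, so the
-- caller can process the tokens later with positions already computed.
-- Here the fold's accumulator is an ST action returning the next write
-- index; a production version would use a monadic fold instead, to
-- avoid building a chain of ST actions before running them.
tokenLines :: Int -> S8.ByteString -> V.Vector Int
tokenLines maxTokens input = runST $ do
  mv <- MV.new maxTokens  -- assumes maxTokens >= actual token count
  n  <- foldWords
          (\acc pos _word -> do
              i <- acc
              MV.write mv i (posLine pos)
              pure (i + 1))
          (pure 0)
          input
  V.freeze (MV.take n mv)
```

For the war-and-peace input, any `maxTokens` above the 1,132,619 token count would do; a real implementation might instead grow the vector on demand.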
The input file is war-and-peace.txt, which is 6MB. Simply reading the file takes 27ms; counting all words (non-space runs) takes 36ms, so the counting itself costs roughly 9ms, in the absence of more rigorous gauge-based benchmarking. There are 1,132,619 "words" in the file.
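For context, a wall-clock measurement of that kind (a sketch only, using the time library and the standard `S8.words` rather than the experiment's lexer; gauge would give more trustworthy numbers) might look like:

```haskell
import qualified Data.ByteString.Char8 as S8
import           Data.Time.Clock (diffUTCTime, getCurrentTime)

main :: IO ()
main = do
  t0    <- getCurrentTime
  bytes <- S8.readFile "war-and-peace.txt"   -- strict read of the whole file
  t1    <- S8.length bytes `seq` getCurrentTime
  let n = length (S8.words bytes)            -- count non-space runs
  t2    <- n `seq` getCurrentTime
  putStrLn ("read:       " ++ show (diffUTCTime t1 t0))
  putStrLn ("read+count: " ++ show (diffUTCTime t2 t0)
            ++ " (" ++ show n ++ " words)")
```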