@cameronp98
Last active January 3, 2016 18:49
Simple lexer in Python
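from collections import namedtuple
import re

# basic token container: tag, matched text, start and end offsets
Token = namedtuple("Token", ["tag", "val", "pos", "end"])


def t(tag):
    """Build a re.Scanner action that wraps each match in a Token."""
    def handler(sc, val):
        return Token(tag, val, sc.match.start(), sc.match.end())
    return handler


def lex(text, rules, ignore_whitespace=True):
    """Return (tokens, remainder); remainder is the unscanned tail of text."""
    handlers = [(reg, t(tag)) for (reg, tag) in rules.items()]
    if ignore_whitespace:
        # a None action makes the scanner discard the match
        handlers.append((r"\s+", None))
    toks, rem = re.Scanner(handlers).scan(text)
    return toks, rem


if __name__ == '__main__':
    string = "these are some words"
    rules = {r"[a-z]+": "WORD"}
    # lex() returns a (tokens, remainder) pair, so unpack it before iterating
    toks, rem = lex(string, rules)
    for token in toks:
        print(token)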
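A slightly larger usage sketch with several rules and unscannable input. It assumes Python 3.7+, where a dict keeps insertion order, since `re.Scanner` tries the patterns in the order they are given. The helper names `make_handler` and `tokenize` and the tags `NUMBER`, `NAME` and `OP` are illustrative and not part of the gist. Note that `re.Scanner` is an undocumented class in the standard library, and that patterns should avoid capturing groups because the scanner uses group indices to pick the action.

```python
import re
from collections import namedtuple

Token = namedtuple("Token", ["tag", "val", "pos", "end"])


def make_handler(tag):
    # re.Scanner calls each action with (scanner, matched_text);
    # scanner.match is the match object for the current token.
    def handler(scanner, val):
        return Token(tag, val, scanner.match.start(), scanner.match.end())
    return handler


def tokenize(text, rules):
    # hypothetical helper mirroring lex() above
    handlers = [(pattern, make_handler(tag)) for pattern, tag in rules.items()]
    handlers.append((r"\s+", None))  # None action: whitespace is dropped
    return re.Scanner(handlers).scan(text)


rules = {
    r"\d+": "NUMBER",
    r"[A-Za-z_]\w*": "NAME",
    r"[-+*/=()]": "OP",
}

# "$" matches no rule, so scanning stops there
toks, rem = tokenize("x = 42 + y $ z", rules)
for tok in toks:
    print(tok)
print("unscanned:", repr(rem))
```

Scanning stops at the first character no rule matches, so `rem` here is `"$ z"`. Checking that the remainder is empty is a simple way to detect a lexing error.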