Created
May 25, 2012 19:03
-
-
Save alisey/2789897 to your computer and use it in GitHub Desktop.
Python port of regexp matcher from 'The Practice of Programming'
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Python port of Rob Pike's regexp matcher from 'The Practice of Programming' | |
# Idea for match() stolen from https://github.com/darius/sketchbook/blob/master/misc/pike.py | |
def match(re, text): | |
""" | |
Search for regexp anywhere in text | |
c - any literal character | |
. - any single character | |
^ - the beginning of the input string | |
$ - the end of the input string | |
* - zero or more occurrences of the previous character | |
""" | |
return (matchhere(re[1:], text) if re[0] == '^' else | |
matchstar('.', re, text)) | |
def matchhere(re, text): | |
"""Search for regexp at beginning of text""" | |
if re == '': | |
return True | |
if re == '$': | |
return text == '' | |
if len(re) > 1 and re[1] == '*': | |
return matchstar(re[0], re[2:], text) | |
if text and (re[0] == '.' or re[0] == text[0]): | |
return matchhere(re[1:], text[1:]) | |
return False | |
def matchstar(c, re, text): | |
"""Search for c*regexp at beginning of text""" | |
for i in range(len(text)) or [0]: | |
if matchhere(re, text[i:]): | |
return True | |
if text[i] != c and c != '.': | |
return False | |
return False |
This is only very slightly related, but would you like to review https://github.com/darius/peglet ? I'm trying to improve on the grammar parser from cs212 -- to make something more useful without making it less accessible.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Empty text, damn. Thanks for pointing this out. If you don't mind I'll steal your idea to fix match().