-
-
Save alisey/2789897 to your computer and use it in GitHub Desktop.
# Python port of Rob Pike's regexp matcher from 'The Practice of Programming' | |
# Idea for match() stolen from https://github.com/darius/sketchbook/blob/master/misc/pike.py | |
def match(re, text): | |
""" | |
Search for regexp anywhere in text | |
c - any literal character | |
. - any single character | |
^ - the beginning of the input string | |
$ - the end of the input string | |
* - zero or more occurrences of the previous character | |
""" | |
return (matchhere(re[1:], text) if re[0] == '^' else | |
matchstar('.', re, text)) | |
def matchhere(re, text): | |
"""Search for regexp at beginning of text""" | |
if re == '': | |
return True | |
if re == '$': | |
return text == '' | |
if len(re) > 1 and re[1] == '*': | |
return matchstar(re[0], re[2:], text) | |
if text and (re[0] == '.' or re[0] == text[0]): | |
return matchhere(re[1:], text[1:]) | |
return False | |
def matchstar(c, re, text): | |
"""Search for c*regexp at beginning of text""" | |
for i in range(len(text)) or [0]: | |
if matchhere(re, text[i:]): | |
return True | |
if text[i] != c and c != '.': | |
return False | |
return False |
Hi Darius! The reason for Pike's do-while loop is to allow zero-width matches such as match('aX*z', 'az')
. I too start with null match when i=0
. And the last iteration is matchhere(regexp, '')
. Or do I miss something?
You're right about explicit return False. When writing production code I'm very strict about casting data to the correct type. But here I took the liberty for better effect.
Love your idea of return matchstar('.', re, text)
.
The problem comes up with match('x*', '') returning None instead of True.
Thanks and I'm pleased to find you here -- you posted some great code for the class.
Empty text, damn. Thanks for pointing this out. If you don't mind I'll steal your idea to fix match().
This is only very slightly related, but would you like to review https://github.com/darius/peglet ? I'm trying to improve on the grammar parser from cs212 -- to make something more useful without making it less accessible.
I like how you turned matchstar into a for loop -- nice and Pythonic! But if it reaches the end doesn't it need a final "return matchhere(regexp, '')"? Pike's version avoids that by putting the loop test at the bottom.
Same issue with match(). I'd use explicit "return False" instead of implicit "return None" generally for this sort of thing -- I noticed the issue above because of thinking of that change and then going "wait, is false the right answer there?"
I'll post my version (which for some reason I hadn't committed) at https://github.com/darius/sketchbook/tree/master/misc