Created
May 25, 2012 19:03
-
-
Save alisey/2789897 to your computer and use it in GitHub Desktop.
Python port of regexp matcher from 'The Practice of Programming'
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Python port of Rob Pike's regexp matcher from 'The Practice of Programming' | |
# Idea for match() stolen from https://github.com/darius/sketchbook/blob/master/misc/pike.py | |
def match(re, text): | |
""" | |
Search for regexp anywhere in text | |
c - any literal character | |
. - any single character | |
^ - the beginning of the input string | |
$ - the end of the input string | |
* - zero or more occurrences of the previous character | |
""" | |
return (matchhere(re[1:], text) if re[0] == '^' else | |
matchstar('.', re, text)) | |
def matchhere(re, text): | |
"""Search for regexp at beginning of text""" | |
if re == '': | |
return True | |
if re == '$': | |
return text == '' | |
if len(re) > 1 and re[1] == '*': | |
return matchstar(re[0], re[2:], text) | |
if text and (re[0] == '.' or re[0] == text[0]): | |
return matchhere(re[1:], text[1:]) | |
return False | |
def matchstar(c, re, text): | |
"""Search for c*regexp at beginning of text""" | |
for i in range(len(text)) or [0]: | |
if matchhere(re, text[i:]): | |
return True | |
if text[i] != c and c != '.': | |
return False | |
return False |
The problem comes up with match('x*', '') returning None instead of True.
Thanks and I'm pleased to find you here -- you posted some great code for the class.
Empty text, damn. Thanks for pointing this out. If you don't mind I'll steal your idea to fix match().
This is only very slightly related, but would you like to review https://github.com/darius/peglet ? I'm trying to improve on the grammar parser from cs212 -- to make something more useful without making it less accessible.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi Darius! The reason for Pike's do-while loop is to allow zero-width matches such as
match('aX*z', 'az')
. I too start with null match wheni=0
. And the last iteration ismatchhere(regexp, '')
. Or do I miss something?You're right about explicit return False. When writing production code I'm very strict about casting data to the correct type. But here I took the liberty for better effect.
Love your idea of
return matchstar('.', re, text)
.