Skip to content

Instantly share code, notes, and snippets.

@alisey
Created May 25, 2012 19:03
Show Gist options
  • Save alisey/2789897 to your computer and use it in GitHub Desktop.
Save alisey/2789897 to your computer and use it in GitHub Desktop.
Python port of regexp matcher from 'The Practice of Programming'
# Python port of Rob Pike's regexp matcher from 'The Practice of Programming'
# Idea for match() stolen from https://github.com/darius/sketchbook/blob/master/misc/pike.py
def match(re, text):
"""
Search for regexp anywhere in text
c - any literal character
. - any single character
^ - the beginning of the input string
$ - the end of the input string
* - zero or more occurrences of the previous character
"""
return (matchhere(re[1:], text) if re[0] == '^' else
matchstar('.', re, text))
def matchhere(re, text):
"""Search for regexp at beginning of text"""
if re == '':
return True
if re == '$':
return text == ''
if len(re) > 1 and re[1] == '*':
return matchstar(re[0], re[2:], text)
if text and (re[0] == '.' or re[0] == text[0]):
return matchhere(re[1:], text[1:])
return False
def matchstar(c, re, text):
"""Search for c*regexp at beginning of text"""
for i in range(len(text)) or [0]:
if matchhere(re, text[i:]):
return True
if text[i] != c and c != '.':
return False
return False
@alisey
Copy link
Author

alisey commented Jun 6, 2012

Hi Darius! The reason for Pike's do-while loop is to allow zero-width matches such as match('aX*z', 'az'). I too start with null match when i=0. And the last iteration is matchhere(regexp, ''). Or do I miss something?

You're right about explicit return False. When writing production code I'm very strict about casting data to the correct type. But here I took the liberty for better effect.

Love your idea of return matchstar('.', re, text).

@darius
Copy link

darius commented Jun 6, 2012

The problem comes up with match('x*', '') returning None instead of True.

Thanks and I'm pleased to find you here -- you posted some great code for the class.

@alisey
Copy link
Author

alisey commented Jun 7, 2012

Empty text, damn. Thanks for pointing this out. If you don't mind I'll steal your idea to fix match().

@darius
Copy link

darius commented Nov 21, 2012

This is only very slightly related, but would you like to review https://github.com/darius/peglet ? I'm trying to improve on the grammar parser from cs212 -- to make something more useful without making it less accessible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment