Skip to content

Instantly share code, notes, and snippets.

@icheishvili
Created March 12, 2012 21:48
Show Gist options
  • Save icheishvili/2024914 to your computer and use it in GitHub Desktop.
Save icheishvili/2024914 to your computer and use it in GitHub Desktop.
Faster automata-style string-matching
#!/usr/bin/env python
def match(needle, haystack):
"""Perform finite-automata style string matching."""
matches = []
nposition = 0
for index, char in enumerate(haystack):
if needle[nposition] == char:
nposition += 1
if nposition == len(needle):
matches.append((index - len(needle) + 1, index + 1))
nposition = 0
elif needle[0] == char:
nposition = 1
else:
nposition = 0
return matches
def main():
"""Run the example program."""
haystack = '''
In computer science, the Boyer-Moore string search algorithm is a
particularly efficient string searching algorithm, and it has been
the standard benchmark for the practical string search literature.
It was developed by Bob Boyer and J Strother Moore in 1977. The
algorithm preprocesses the target string (key) that is being
searched for, but not the string being searched in (unlike some
algorithms that preprocess the string to be searched and can then
amortize the expense of the preprocessing by searching repeatedly).
The execution time of the Boyer-Moore algorithm, while still linear
in the size of the string being searched, can have a significantly
lower constant factor than many other search algorithms: it doesn't
need to check every character of the string to be searched, but
rather skips over some of them. Generally the algorithm gets faster
as the key being searched for becomes longer. Its efficiency derives
from the fact that with each unsuccessful attempt to find a match
between the search string and the text it is searching, it uses the
information gained from that attempt to rule out as many positions
of the text as possible where the string cannot match.
'''
needle = 'Boyer'
matches = match(needle, haystack)
for match_start, match_end in matches:
print haystack[match_start:match_end]
if __name__ == '__main__':
main()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment