Skip to content

Instantly share code, notes, and snippets.

@ameerkat
Created October 14, 2010 17:46
Show Gist options
  • Save ameerkat/626643 to your computer and use it in GitHub Desktop.
Save ameerkat/626643 to your computer and use it in GitHub Desktop.
Boyer Moore string search implementation in Python
# Boyer Moore String Search implementation in Python
# Ameer Ayoub <[email protected]>
# Generate the Bad Character Skip List
def generateBadCharShift(term):
skipList = {}
for i in range(0, len(term)-1):
skipList[term[i]] = len(term)-i-1
return skipList
# Generate the Good Suffix Skip List
def findSuffixPosition(badchar, suffix, full_term):
for offset in range(1, len(full_term)+1)[::-1]:
flag = True
for suffix_index in range(0, len(suffix)):
term_index = offset-len(suffix)-1+suffix_index
if term_index < 0 or suffix[suffix_index] == full_term[term_index]:
pass
else:
flag = False
term_index = offset-len(suffix)-1
if flag and (term_index <= 0 or full_term[term_index-1] != badchar):
return len(full_term)-offset+1
def generateSuffixShift(key):
skipList = {}
buffer = ""
for i in range(0, len(key)):
skipList[len(buffer)] = findSuffixPosition(key[len(key)-1-i], buffer, key)
buffer = key[len(key)-1-i] + buffer
return skipList
# Actual Search Algorithm
def BMSearch(haystack, needle):
goodSuffix = generateSuffixShift(needle)
badChar = generateBadCharShift(needle)
i = 0
while i < len(haystack)-len(needle)+1:
j = len(needle)
while j > 0 and needle[j-1] == haystack[i+j-1]:
j -= 1
if j > 0:
badCharShift = badChar.get(haystack[i+j-1], len(needle))
goodSuffixShift = goodSuffix[len(needle)-j]
if badCharShift > goodSuffixShift:
i += badCharShift
else:
i += goodSuffixShift
else:
return i
return -1
if __name__ == "__main__":
block = "This is a simple example"
print "This is an example search on the string \"", block, "\"."
print "ple :", BMSearch(block, "ple ")
print "example :", BMSearch(block, "example")
print "simple :", BMSearch(block, "simple")
print " imple :", BMSearch(block, " imple")
@Abusagit
Copy link

Abusagit commented Mar 12, 2021

Hello, I have recently started to deploy my efforts in the pattern recognition problem field and wondered if I could implement not only original Boyer-Moore-Horspool algorith, but its tubo-verion. There were some unsuccessful tries because of good suffix heuristic too. Hell yeah, it`s much more complicated to understand.

Thanks to these resources:
Turbo-BMH
Great explanation
Explanation too

Despite all work has been done before me and can be seen on those articles, it was quite hard to understand how it works even with step-by-step revision of code and schemes though.

I used bad character heuristic without internal function in my code, good character heuristic was implemented by 2 consequent functions which calculate length of suffix when pattern is partially recognised, and then - (really wanted to blame myself in time consuming, it made me spend an evening in learning it) - turbo-shift heuristic. I didn`t find anything related to this topic written on Python, nevertheless, it was easy to synthetize a code by watching websites I mentioned.

Thus, I got it in my gist

So, It passed tests and I`m glad for now, if you are interested and it can help you or someone else, it will be very nice

@dhritix1999
Copy link

@Abusagit thank you so much for the reply, the links are really good, Thank you for providing us with your implementation also and hopefully this will help others too :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment