@strikaco
Last active January 21, 2017 12:08
Alternate implementation of NLTK's concordance() - no dependencies needed!
def concordance(string, search_term, width=25):
    """
    Alternative implementation of NLTK's concordance() that
    returns the matches as a tuple, so they can be printed to
    stdout or saved to a variable, and does not require NLTK.
    Just feed it a raw string, JSON string, etc. with any line
    breaks stripped out.
    """
    # An empty search term matches everywhere and would never advance
    # the offset, so there is nothing useful to return.
    if not search_term:
        return ()
    # Lowercase both strings once so the search is case-insensitive
    haystack = string.lower()
    needle = search_term.lower()
    # Offset tracks our progress as we parse through the string
    offset = 0
    # Indexes lets us store all the positions where we find your term
    indexes = []
    # Keep scanning through the string until we reach the end
    while offset < len(haystack):
        try:
            # From the current offset to the end of the string, find
            # the next position of your search term
            position = haystack.index(needle, offset)
        except ValueError:
            # Your term wasn't found; exit.
            break
        # Your term was found. Add it to the list of indexes. Position 0
        # is a valid match, so don't test the position for truthiness.
        indexes.append(position)
        # Now move the offset to the position of your term,
        # plus the length of its letters so we resume scanning
        # after the end of it.
        offset = position + len(needle)
    # For each position where the term was found, return the leading and
    # trailing characters. Clamp the start at 0 so a match near the
    # beginning doesn't produce a negative index that wraps around.
    return tuple(string[max(0, index - width):index + width + len(search_term)]
                 for index in indexes)
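
For comparison, here is a rough sketch of the same idea built on the standard library's `re` module, plus a short usage example showing both ways of consuming the result (saving it to a variable, then printing it). The name `find_contexts` and the sample sentence are made up for illustration and are not part of the gist. It assumes a plain, non-overlapping, case-insensitive substring search is what you want.

```python
import re


def find_contexts(text, term, width=25):
    """Hypothetical helper: return each match of `term` in `text`
    together with up to `width` characters on either side."""
    # An empty term would match at every position, so return nothing.
    if not term:
        return ()
    # re.escape() makes the term a literal, not a regex pattern.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # finditer() yields non-overlapping matches from left to right.
    # max(0, ...) keeps the slice start from going negative.
    return tuple(
        text[max(0, match.start() - width):match.end() + width]
        for match in pattern.finditer(text)
    )


sample = "The cat sat on the mat. The dog ignored the cat."

# Save the contexts to a variable...
contexts = find_contexts(sample, "the", width=5)

# ...then print one context per line.
for context in contexts:
    print(context)
```

Returning a tuple rather than printing inside the function leaves the choice to the caller: loop and print as above, or keep the tuple around for later processing.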