@WillKoehrsen
Created September 23, 2018 14:32
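import subprocess
import xml.sax
from timeit import default_timer as timer  # assumed source of timer()

# WikiXmlHandler and data_path are assumed to be defined elsewhere (not shown here)

# Object for handling xml
handler = WikiXmlHandler()

# Parsing object
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

start = timer()

# Parse the entire file: bzcat decompresses it and the parser is fed one line at a time
for line in subprocess.Popen(['bzcat'],
                             stdin = open(data_path),
                             stdout = subprocess.PIPE).stdout:
    try:
        parser.feed(line)
    except StopIteration:
        # Stop early if the handler signals that parsing should end
        break

end = timer()
books = handler._books

print(f'\nSearched through {handler._article_count} articles.')
print(f'\nFound {len(books)} books in {round(end - start)} seconds.')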
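
The snippet above relies on a `WikiXmlHandler` class that is not shown here. As a rough illustration of how incremental feeding with `parser.feed` works, here is a minimal sketch using a hypothetical stand-in handler, `TitleHandler`, that only collects `<title>` text. The class, the `collect_titles` helper, and the sample data are assumptions for illustration, not the original handler.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical stand-in for WikiXmlHandler: collects the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == 'title':
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == 'title':
            self.titles.append(''.join(self._buffer))
            self._in_title = False


def collect_titles(lines):
    """Feed an iterable of byte chunks to a SAX parser and return the titles found."""
    handler = TitleHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for line in lines:
        parser.feed(line)
    parser.close()
    return handler.titles


# Made-up sample; note that one title is split across two chunks
sample = [
    b'<mediawiki>\n',
    b'<page><title>War and ',
    b'Peace</title></page>\n',
    b'<page><title>Dune</title></page>\n',
    b'</mediawiki>\n',
]
print(collect_titles(sample))
```

Because the parser keeps its state between `feed` calls, an element can span several chunks, which is why the decompressed dump can be passed in line by line without loading the whole file into memory.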