@alexstorer
Created November 30, 2012 19:37
How to scrape text with regular expressions
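The script expects plain-text files in the current directory, and its comment says they were produced with pdftotext. Here is a minimal sketch of automating that conversion step. It assumes the `pdftotext` command-line tool (shipped with Xpdf and Poppler) is on the PATH. `pdftotext_command` and `convert_all` are hypothetical helper names and are not part of the original gist.

```python
import glob
import os
import shutil
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext call that writes <name>.txt next to <name>.pdf."""
    txt_path = os.path.splitext(pdf_path)[0] + ".txt"
    return ["pdftotext", pdf_path, txt_path]


def convert_all(pattern="*.pdf"):
    """Convert every PDF matching `pattern`; return the .txt paths written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    written = []
    for pdf_path in sorted(glob.glob(pattern)):
        cmd = pdftotext_command(pdf_path)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        written.append(cmd[2])
    return written
```

Calling `convert_all()` in the folder of PDFs should produce the `*.txt` files that the scraper globs for. The `shutil.which` check fails early with a readable message instead of a `FileNotFoundError` from `subprocess`.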
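import csv
import glob
import re

# Matches lines such as (hypothetical example) "12. Jane Doe (H-4)":
#   group 1: optional list number with trailing dot, e.g. "12."
#   group 2: the signer's name (surrounding spaces are stripped below)
#   group 3: one capital letter, recorded as the chamber
#   group 4: the digits after the hyphen (may be empty)
pattern = re.compile(r"([0-9]+\.)?(.*)\(([A-Z])-([0-9]*)\)")

# The csv module expects files it writes to be opened with newline=''.
with open('signers.csv', 'w', newline='') as fc:
    c = csv.DictWriter(fc, ["Name", "Chamber", "Number", "Year Filename"])
    c.writeheader()
    # I used the pdftotext utility to convert the PDF documents to text.
    # Look here for details: http://www.bluem.net/en/mac/packages/
    for fname in glob.glob('*.txt'):
        with open(fname) as f:
            for line in f:
                m = pattern.match(line)
                if m:
                    c.writerow({
                        "Name": m.group(2).strip(),
                        "Chamber": m.group(3),
                        "Number": m.group(4),
                        "Year Filename": fname,
                    })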
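Before running the script over a whole folder, it can help to test the regular expression on a few lines by hand. The sketch below wraps the same pattern in a hypothetical `parse_line` helper, which is not part of the original gist. The sample lines are invented for illustration, and real pdftotext output may look different.

```python
import re

# Same pattern as the scraper uses.
SIGNER_RE = re.compile(r"([0-9]+\.)?(.*)\(([A-Z])-([0-9]*)\)")


def parse_line(line):
    """Return (name, chamber, number) for a matching line, else None."""
    m = SIGNER_RE.match(line)
    if m is None:
        return None
    return m.group(2).strip(), m.group(3), m.group(4)


# Hypothetical sample lines, each ending in a newline as when read from a file.
for sample in ["12. Jane Doe (H-4)\n", "John Roe (S-)\n", "Page 3 of 10\n"]:
    print(repr(sample), "->", parse_line(sample))
```

These samples give `('Jane Doe', 'H', '4')`, `('John Roe', 'S', '')` and `None`. Because `(.*)` is greedy, a line with several parenthesised parts uses the last `(X-n)` tag, so `"Smith (Jr.) (H-2)"` yields the name `"Smith (Jr.)"`.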