@revox
Created March 13, 2015 12:47
Scrape electoral boundaries from mapitmysociety using BS4
# Scrapes KML files of UK parliamentary electoral units from mapitmysociety
# There are no files for Northern Ireland so their entries are empty
# Result is a CSV with name and geometry columns
import urllib.request, bs4, csv, time

URL = 'http://mapit.mysociety.org/areas/WMC.html'

webpage = urllib.request.urlopen(URL)
soup = bs4.BeautifulSoup(webpage, 'lxml')
seats = soup.select('li h3 a')  # one link per constituency

with open('electoral_units.csv', 'w', newline='', encoding='utf-8') as csvfile:
    csvwriter = csv.writer(csvfile)
    for seat in seats:
        name = seat.text
        print(name)
        # swap the extension on the constituency link for .kml
        kml_url = 'http://mapit.mysociety.org' + seat['href'].split('.')[0] + '.kml'
        print(kml_url)
        time.sleep(2)  # initial runs missed pages if there is no delay
        kmlpage = urllib.request.urlopen(kml_url).read()
        kml_soup = bs4.BeautifulSoup(kmlpage, 'xml')
        if kml_soup:
            print(len(kml_soup))
            # .Polygon is the first <Polygon> element, or None if there is none
            csvwriter.writerow([name, kml_soup.Polygon])
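
The geometry column ends up holding the raw `<Polygon>` markup rather than plain coordinates. Below is a minimal sketch of turning one such cell back into (longitude, latitude) pairs using only the standard library. `polygon_to_points` and `sample` are hypothetical names for illustration, the sample string is made up rather than taken from MapIt, and the sketch assumes the usual KML `lon,lat[,alt]` tuple layout.

```python
import xml.etree.ElementTree as ET


def polygon_to_points(polygon_xml):
    """Return [(lon, lat), ...] from every <coordinates> element in a KML Polygon string."""
    if not polygon_xml:
        return []  # empty cell, e.g. the Northern Ireland rows
    root = ET.fromstring(polygon_xml)
    points = []
    for element in root.iter():
        # Compare on the local name so a namespaced tag such as
        # '{http://www.opengis.net/kml/2.2}coordinates' also matches.
        if element.tag.rsplit('}', 1)[-1] != 'coordinates':
            continue
        for token in (element.text or '').split():
            lon, lat = token.split(',')[:2]  # KML order is lon,lat[,alt]
            points.append((float(lon), float(lat)))
    return points


# Made-up sample in the shape of a KML Polygon (assumption, not real MapIt data)
sample = ('<Polygon><outerBoundaryIs><LinearRing><coordinates>'
          '-0.1,51.5,0 -0.2,51.6,0 -0.1,51.5,0'
          '</coordinates></LinearRing></outerBoundaryIs></Polygon>')
print(polygon_to_points(sample))
```

Inner rings (holes), if a polygon has any, come back in the same flat list as the outer ring, so split on `outerBoundaryIs` and `innerBoundaryIs` first if that distinction matters.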