@JamieMagee
Last active August 29, 2015 14:00
Scrape Cambridge College scarves from Wikipedia
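from bs4 import BeautifulSoup
import urllib.request as request
import re

url = 'https://en.wikipedia.org/wiki/Colleges_of_the_University_of_Cambridge'
page = request.urlopen(url).read().decode('utf-8')
# Name the parser explicitly so bs4 does not warn or choose one per platform
soup = BeautifulSoup(page, 'html.parser')
# Collect every table carrying the "toccolours" class
tables = soup.find_all('table', {'class': 'toccolours'})
reg = re.compile('font-size:50%;')

# The with block closes the file even if a write fails
with open('scarves.html', 'wb') as f:
    for table in tables:
        # Drop the table's own inline style, then strip the font-size declaration
        del table['style']
        f.write(reg.sub('', table.prettify()).encode('utf-8'))
    # Append a single stylesheet that sizes all of the written tables
    f.write('''
<style>
table{height:11px;width:6%!important;}
</style>
'''.encode('utf-8'))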
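
The regex step of the script can be tried offline on a small fragment, without fetching the Wikipedia page. The following is a minimal sketch using only the standard library; the helper name `strip_declaration` and the sample cell are assumptions made for illustration and are not part of the original gist.

```python
import re

# Hypothetical helper (not in the original gist): removes one literal CSS
# declaration wherever it occurs in a block of markup.
def strip_declaration(markup: str, declaration: str = 'font-size:50%;') -> str:
    # re.escape keeps characters such as '%' and ';' literal in the pattern
    return re.sub(re.escape(declaration), '', markup)

# Made-up fragment shaped like a styled table cell
sample = '<td style="font-size:50%;background:#00f;">&nbsp;</td>'
cleaned = strip_declaration(sample)
print(cleaned)
```

Markup that does not contain the declaration passes through unchanged, so the helper is safe to apply to every table.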