@cageyjames
Created December 24, 2012 20:14
Gist: cageyjames/4370576

Get all the URLs for my blog (http://www.spatiallyadjusted.com)
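#!/usr/bin/env python3
# Collect the href of every <a> tag on the blog archive page and write
# one URL per line to out.txt. Ported from Python 2 (urllib2) to Python 3.
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "http://www.spatiallyadjusted.com/blog/archives/"
with urlopen(url) as page:
    # Name the parser explicitly so bs4 does not warn or pick one implicitly.
    soup = BeautifulSoup(page.read(), "html.parser")

with open("out.txt", "w") as out:
    for link in soup.find_all("a"):
        href = link.get("href")
        if href:  # skip <a> tags with no href, which would otherwise write "None"
            out.write("%s\n" % href)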
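The script above writes every href on the archive page, so navigation, external, and duplicate links end up in out.txt alongside the post URLs. The sketch below shows one way to narrow that down. It is an illustration under stated assumptions, not the original script: it swaps BeautifulSoup for the standard library's html.parser, resolves relative links with urljoin, keeps only links on the same host as the page, and drops duplicates in first-seen order. The names LinkCollector and same_site_links, and the sample HTML, are hypothetical.

```python
# Hypothetical stdlib-only sketch (no BeautifulSoup): collect <a href> values,
# resolve them against the page URL, keep same-host links, and de-duplicate.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Accumulates raw href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return unique absolute URLs in `html` whose host matches base_url's."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # "/blog/x/" -> "http://host/blog/x/"
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Made-up sample markup standing in for the real archive page.
    sample = (
        '<a href="/blog/2012/12/some-post/">Post</a>'
        '<a href="http://www.spatiallyadjusted.com/blog/2012/12/some-post/">Dup</a>'
        '<a href="https://example.com/">External</a>'
        '<a name="anchor-only">No href</a>'
    )
    base = "http://www.spatiallyadjusted.com/blog/archives/"
    for link in same_site_links(sample, base):
        print(link)
```

To use it on the live page, pass the decoded page text (for example `page.read().decode("utf-8")`, assuming the page is UTF-8) instead of the sample string. Note that the host check is an exact `netloc` match, so `spatiallyadjusted.com` and `www.spatiallyadjusted.com` are treated as different sites; loosen the comparison if the blog links to both.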