@macloo
Created April 1, 2019 13:23
For Sarah April 2019
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import csv  # imported for writing the scraped URLs to a file later
driver = webdriver.Chrome('/Users/mcadams/Documents/python/scraping2019/chromedriver')
driver.get('https://www.usa.gov/federal-agencies')
# pause because page is slow to load
time.sleep(5)
html = driver.page_source
bs = BeautifulSoup(html, "html5lib")
# close automated chrome
driver.quit()
# get all a elements and test by printing
letter_list = bs.find('ul', {'class':'az-list group'})
letter_urls = letter_list.find_all('a')
print(len(letter_urls))
print(letter_urls[0])
print(letter_urls[12])
macloo commented Apr 1, 2019

I did not get the hrefs, which you need. To get them, just loop over letter_urls, pull each a element's href attribute, and stick them in a list, or even write them to a file.
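A sketch of that loop, using a small stand-in HTML snippet so it runs without Selenium (in the real scraper, letter_urls already holds the a tags found by find_all above; the filename letter_urls.csv is just an illustration):

```python
from bs4 import BeautifulSoup
import csv

# stand-in HTML shaped like the usa.gov A-Z list, so the sketch is self-contained
html = '''<ul class="az-list group">
  <a href="/federal-agencies/a">A</a>
  <a href="/federal-agencies/b">B</a>
</ul>'''
letter_urls = BeautifulSoup(html, 'html.parser').find_all('a')

# loop over the a elements, pulling each href attribute;
# the hrefs are relative, so prefix the domain to get full URLs
hrefs = ['https://www.usa.gov' + a.get('href') for a in letter_urls]
print(hrefs)

# or write them to a file with the csv module imported earlier
with open('letter_urls.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['url'])
    for url in hrefs:
        writer.writerow([url])
```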
