Created April 1, 2019 13:23
For Sarah April 2019
from urllib.request import urlopen
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import csv

driver = webdriver.Chrome('/Users/mcadams/Documents/python/scraping2019/chromedriver')
driver.get('https://www.usa.gov/federal-agencies')
# pause because page is slow to load
time.sleep(5)
html = driver.page_source
bs = BeautifulSoup(html, "html5lib")
# close automated chrome
driver.quit()
# get all a elements and test by printing
letter_list = bs.find('ul', {'class':'az-list group'})
letter_urls = letter_list.find_all('a')
print(len(letter_urls))
print(letter_urls[0])
print(letter_urls[12])
Author
I did not extract the hrefs, which you need; to get them, loop over letter_urls, read each a element's href attribute, and collect the values in a list, or write them to a file.
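That loop might look like the sketch below. It uses a small hard-coded HTML snippet as a stand-in for the live page (in the real script, letter_urls comes from the Selenium scrape above), and the filename hrefs.csv is just an example.

```python
from bs4 import BeautifulSoup
import csv

# Stand-in for the scraped page source; in the script above, letter_urls
# comes from bs.find('ul', {'class': 'az-list group'}).find_all('a')
sample_html = """
<ul class="az-list group">
  <li><a href="/federal-agencies/a">A</a></li>
  <li><a href="/federal-agencies/b">B</a></li>
</ul>
"""
bs = BeautifulSoup(sample_html, "html.parser")
letter_urls = bs.find('ul', {'class': 'az-list group'}).find_all('a')

# loop over the a elements and collect each href attribute in a list
hrefs = [a.get('href') for a in letter_urls]
print(hrefs)

# or write them to a file instead, one URL per row
with open('hrefs.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for h in hrefs:
        writer.writerow([h])
```

On the real page the hrefs are relative paths, so you may want to prepend https://www.usa.gov before requesting them.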