Last active
February 25, 2025 15:51
Scraping a page - requires BeautifulSoup and Requests
from bs4 import BeautifulSoup
import requests

url = 'https://www.govtrack.us/congress/members/amy_klobuchar/412242'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# get a list of all the h2 elements
head_list = soup.find_all('h2')

# loop over the list to find the heading where we start to scrape
for hed in head_list:
    if hed.text == "Enacted Legislation":
        # .next_sibling is a BeautifulSoup attribute of Tag objects -
        # https://www.crummy.com/software/BeautifulSoup/bs4/doc/
        # it is used twice because the first sibling is usually the
        # whitespace between the two tags, not the next tag itself
        paragraph = hed.next_sibling.next_sibling
        # break out of loop when we find the one heading we want
        break

# this is the paragraph that comes after that h2
# (if no h2 matched, paragraph is undefined and this line raises a NameError)
print(paragraph.text)

# get all the li elements in the ul that comes after THAT paragraph
leg_list = paragraph.next_sibling.next_sibling.find_all('li')
for item in leg_list:
    print(item.text)
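The sibling navigation above can be tried without a network request. The following is a minimal sketch, not part of the original gist: `SAMPLE_HTML` is made-up markup that only imitates the h2 / p / ul layout the script expects (it is not GovTrack's real HTML), and `scrape_section` is a hypothetical helper name. It swaps the doubled `.next_sibling` for `find_next_sibling()`, which skips whitespace text nodes. Note the slightly different behavior: `find_next_sibling('p')` returns the next `p` sibling even when other tags sit in between.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that imitates the structure the script relies on:
# an h2, then a paragraph, then a ul. This is not GovTrack's real markup.
SAMPLE_HTML = """
<div>
<h2>Sponsored Bills</h2>
<p>Not the section we want.</p>
<h2>Enacted Legislation</h2>
<p>Bills that became law.</p>
<ul>
<li>First example bill</li>
<li>Second example bill</li>
</ul>
</div>
"""


def scrape_section(html, heading_text):
    """Return (paragraph text, list of li texts) for one h2 section.

    Returns None if the heading, the paragraph, or the list is not found.
    """
    soup = BeautifulSoup(html, 'html.parser')
    for hed in soup.find_all('h2'):
        if hed.get_text(strip=True) == heading_text:
            # find_next_sibling() ignores whitespace text nodes, so one
            # call is enough to reach the next <p> tag
            paragraph = hed.find_next_sibling('p')
            if paragraph is None:
                return None
            item_list = paragraph.find_next_sibling('ul')
            if item_list is None:
                return None
            items = [li.get_text(strip=True) for li in item_list.find_all('li')]
            return paragraph.get_text(strip=True), items
    # no h2 had the text we were looking for
    return None


result = scrape_section(SAMPLE_HTML, "Enacted Legislation")
if result is not None:
    intro, bills = result
    print(intro)
    for bill in bills:
        print(bill)
```

Returning `None` when something is missing lets the caller decide what to do if the page layout changes, instead of failing on an undefined variable.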
Author
Tested and updated 2/13/2020
Author
Tested without changes, 2/25/2021
Author
Tested without changes, 2/14/2022
Author
Tested and updated 2/25/25