@macloo
Last active February 25, 2025 15:51
Scraping a page - requires BeautifulSoup and Requests
from bs4 import BeautifulSoup
import requests

url = 'https://www.govtrack.us/congress/members/amy_klobuchar/412242'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# get a list of all the h2 elements
head_list = soup.find_all('h2')

# loop over the list to find the heading where we start to scrape
paragraph = None
for hed in head_list:
    if hed.text == "Enacted Legislation":
        # .next_sibling is a BeautifulSoup attribute of Tag objects -
        # https://www.crummy.com/software/BeautifulSoup/bs4/doc/
        # it is used twice because the first sibling is typically the
        # whitespace text between the tags; the second one is the
        # paragraph that comes after that h2
        paragraph = hed.next_sibling.next_sibling
        # break out of loop when we find the one heading we want
        break

if paragraph is None:
    # the heading was not on the page, so there is nothing to scrape
    print("Heading not found")
else:
    print(paragraph.text)
    # get all the li elements in the ul that comes after THAT paragraph
    leg_list = paragraph.next_sibling.next_sibling.find_all('li')
    for item in leg_list:
        print(item.text)
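
The heading-then-siblings approach above can also be sketched with find_next_sibling(), which skips whitespace text nodes and returns None instead of raising when the element is missing. This is a minimal sketch, not the gist author's method: the scrape_section helper and the SAMPLE_HTML string are hypothetical, and the sample markup only assumes the h2 / p / ul layout that the script relies on, so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumed h2 / p / ul layout)
SAMPLE_HTML = """
<div>
  <h2>Sponsored Bills</h2>
  <p>Not the section we want.</p>
  <h2>Enacted Legislation</h2>
  <p>Bills that became law.</p>
  <ul>
    <li>Bill A</li>
    <li>Bill B</li>
  </ul>
</div>
"""


def scrape_section(html, heading_text):
    """Return (paragraph_text, list_items) for the section under the
    matching h2, or None if the heading or its paragraph is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    for hed in soup.find_all('h2'):
        if hed.get_text(strip=True) == heading_text:
            # first <p> among the siblings that follow this heading
            paragraph = hed.find_next_sibling('p')
            if paragraph is None:
                return None
            # first <ul> among the siblings that follow that paragraph
            ul = paragraph.find_next_sibling('ul')
            items = [li.get_text(strip=True) for li in ul.find_all('li')] if ul else []
            return paragraph.get_text(strip=True), items
    return None


print(scrape_section(SAMPLE_HTML, "Enacted Legislation"))
# → ('Bills that became law.', ['Bill A', 'Bill B'])
```

To try this against the live page, pass page.text from the requests call in place of SAMPLE_HTML. Whether the real page still uses this layout is an assumption to check in the browser's inspector.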

macloo commented Feb 13, 2020

Tested and updated 2/13/2020

macloo commented Feb 25, 2021

Tested without changes, 2/25/2021

macloo commented Feb 14, 2022

Tested without changes, 2/14/2022

macloo commented Feb 25, 2025

Tested and updated 2/25/25
