
@plembo
Last active June 12, 2022 14:51
Expand rows on a page with Selenium

Using Selenium to click open a table

A certain state government web app uses Infragistics igGrid to publish tables to a page, with a button that reveals a second table. Because the tables are rendered by JavaScript after the page loads, easy page-scraping methods (like pandas' `read_html`) won't work. Instead, you have to imitate a browser, including the clicking of a button. Fortunately, the Selenium Python bindings let you do all that. With a little help from Beautiful Soup 4 and pandas, the rest is history. This code was developed in a Jupyter notebook as part of a larger effort to help ordinary citizens leverage public data for the public good.
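To see why a plain `read_html` scrape fails here: pandas only parses `<table>` markup that is present in the HTML the server actually sends, and a JavaScript-rendered grid sends none. A minimal illustration (the HTML snippet below is a made-up stand-in for what such a server returns, not the real page):

```python
from io import StringIO
import pandas as pd

# A JS-driven grid typically ships as an empty placeholder div;
# the <table> markup only exists after scripts run in a browser.
static_html = "<html><body><div id='grid'></div></body></html>"

try:
    pd.read_html(StringIO(static_html))
except ValueError as err:
    # No <table> elements in the static HTML, so pandas raises ValueError
    print(err)
```

Driving a real browser with Selenium sidesteps this, because the page source is captured after the scripts have run.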

Here's the code, with comments:

```python
# Import Selenium and related libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd

# The target URL
vt_url = "https://vt.ncsbe.gov/PetLkup/PetitionResult/?CountyID=0&PetitionName=NORTH%20CAROLINA%20GREEN%20PARTY"

# Instantiate the Selenium browser driver (in this case for Chromium)
# and invoke it on the target URL
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.implicitly_wait(5)
driver.get(vt_url)

# Find the button element and click it (the Selenium IDE extension,
# https://www.selenium.dev/selenium-ide/, is a good tool for finding selectors)
driver.find_element(By.CSS_SELECTOR, ".ui-iggrid-expandbutton").click()

# Use Beautiful Soup to parse the page source with the 'lxml' parser
soup = BeautifulSoup(driver.page_source, 'lxml')

# Close the browser session (we don't need it any more)
driver.quit()

# Find all tables on the page
tables = soup.find_all('table')

# Create a list of dataframes from what you find
dfs = pd.read_html(str(tables))

# Get rid of an extraneous column (they know who they are)
dfs[1].drop('Counties', axis=1, inplace=True)

# Define the 2nd table on the page as a dataframe
co_table = dfs[1]

# Print the table
print(co_table)
```
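Once the dataframe is in hand, it can be persisted so the browser doesn't have to be driven again on every run. A self-contained sketch (the sample data and filename here are made up; in the gist the dataframe would be `co_table` from above):

```python
import pandas as pd

# Stand-in for the co_table dataframe produced by the scrape
# (hypothetical sample rows, not real petition results)
co_table = pd.DataFrame({"County": ["Alamance", "Wake"],
                         "Signatures": [120, 450]})

# Write to CSV without the index column, then reload it in a later session
co_table.to_csv("petition_counties.csv", index=False)
reloaded = pd.read_csv("petition_counties.csv")
print(reloaded.equals(co_table))  # True
```

Dropping the index on write keeps the round trip clean, since `read_csv` would otherwise pull the index back in as an unnamed column.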