A certain state government web app uses Infragistics igGrid to publish tables to a page, with a button to reveal a second table. Because its JavaScript executes after the page loads, easy scraping methods (like pandas `read_html`) won't work on their own. Instead, you have to imitate a browser and click the button yourself. Fortunately, the Selenium Python bindings let you do all that. With a little help from Beautiful Soup 4 and pandas, the rest is history. This code was developed in a Jupyter notebook as part of a larger effort to help ordinary citizens leverage public data for the public good.
Here's the code, with comments:
# Import Selenium and related libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
# The target URL
vt_url = "https://vt.ncsbe.gov/PetLkup/PetitionResult/?CountyID=0&PetitionName=NORTH%20CAROLINA%20GREEN%20PARTY"
# Instantiate the Selenium browser driver (in this case for Chrome) and point it at the target URL
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.implicitly_wait(5)
driver.get(vt_url)
# Find the button element and click it (the [Selenium IDE](https://www.selenium.dev/selenium-ide/) extension is a good tool for this)
driver.find_element(By.CSS_SELECTOR, ".ui-iggrid-expandbutton").click()
# Use Beautiful Soup to parse the page using the 'lxml' library
soup = BeautifulSoup(driver.page_source, 'lxml')
# Close the browser session (we don't need it any more)
driver.quit()
# Find all tables on the page
tables = soup.find_all('table')
# Create dataframes from what you find (newer pandas versions want a file-like object, hence StringIO)
from io import StringIO
dfs = pd.read_html(StringIO(str(tables)))
# Get rid of an extraneous column (they know who they are)
dfs[1].drop('Counties', axis=1, inplace=True)
# Define the 2nd table on the page as a dataframe
co_table = dfs[1]
# Print the table
print(co_table)
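Since the whole point is making public data reusable, it's worth persisting the scraped table rather than just printing it. Here's a minimal sketch of the round trip using pandas; the column names and filename below are hypothetical stand-ins for whatever `co_table` actually holds:

```python
import pandas as pd

# Hypothetical sample standing in for the co_table dataframe scraped above
co_table = pd.DataFrame({
    "County": ["Wake", "Durham"],
    "Signatures": [1200, 800],
})

# Write the table to CSV so others can use the data without re-scraping
co_table.to_csv("green_party_petition.csv", index=False)

# Read it back to confirm the round trip preserved the data
df = pd.read_csv("green_party_petition.csv")
print(df)
```

Dropping the index with `index=False` keeps the CSV clean for downstream consumers who don't care about pandas row numbers.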