Last active
June 9, 2024 13:54
Extract Table from HTML element
def table_extract(table):
    """
    Extracts data from a table in an HTML document.

    Args:
        table (HTML element): The table element to extract data from. It must
            expose css(selector) and text(), for example a selectolax node.

    Returns:
        list of dict: A list of dictionaries, where each dictionary represents a row in the table.
            The keys of the dictionaries are the headers of the table, and the values are the cell values.
    """
    data = []
    # Strip header text so the keys are cleaned the same way as the cell values.
    headers = [header.text().strip() for header in table.css("th")]
    for row in table.css("tr"):
        cells = row.css("td")
        # Skip rows whose cell count does not match the headers (e.g. the header row itself).
        if len(cells) == len(headers):
            row_data = {}
            for i, cell in enumerate(cells):
                row_data[headers[i]] = cell.text().strip()
            data.append(row_data)
    return data
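
The function above depends on a parsed element that provides css() and text(). As a rough illustration of the same header-to-cell mapping without that dependency, here is a minimal sketch built on the standard library's html.parser. The names TableRows and table_extract_stdlib and the sample markup are hypothetical and not part of the original gist. The sketch assumes a single, non-nested table, and unlike the original it never emits an empty dict for rows that have no td cells.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of <th> and <td> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of every <th> cell, in document order
        self.rows = []      # one list of <td> texts per <tr>
        self._cell = None   # tag name of the cell being read, or None
        self._text = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a <th> or <td>.
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._cell:
            text = "".join(self._text).strip()
            if tag == "th":
                self.headers.append(text)
            elif self.rows:
                self.rows[-1].append(text)
            self._cell = None


def table_extract_stdlib(html):
    """Hypothetical stdlib counterpart of table_extract; takes an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    # Keep only rows that have <td> cells and match the header count.
    return [
        dict(zip(parser.headers, cells))
        for cells in parser.rows
        if cells and len(cells) == len(parser.headers)
    ]


sample = """
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td> Apple </td><td>3</td></tr>
  <tr><td>Pear</td></tr>
</table>
"""
rows = table_extract_stdlib(sample)
print(rows)
```

In this sample the header row and the one-cell "Pear" row are both dropped, which mirrors the len(cells) == len(headers) check in table_extract.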