@leandrotoledo
Created February 22, 2025 15:07
This script extracts tables from LCSO Reports (PDFs) and formats them as Markdown for Telegram messages (using <pre> tags for proper rendering).
import pymupdf


def generate_markdown_table(rows):
    """Generates a Markdown table string wrapped in <pre> tags."""
    if not rows:
        return "No incidents reported"

    headers = ["Station", "Date / Time", "Location", "Incident"]

    # Build the Markdown table string
    table_md = "<pre>\n"
    table_md += "|" + "|".join(headers) + "|\n"
    table_md += "|" + "|".join("-" * len(header) for header in headers) + "|\n"
    for row in rows:
        # Render empty (None) cells as blanks and flatten embedded newlines
        # so that each table row stays on a single line
        cells = ("" if cell is None else str(cell).replace("\n", " ") for cell in row)
        table_md += "|" + "|".join(cells) + "|\n"
    table_md += "</pre>"
    return table_md


def process_pdf(pdf_path):
    """Extracts data from the PDF and returns a list of rows."""
    rows = []
    # The context manager closes the document once extraction is done
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            tabs = page.find_tables()
            for tab in tabs.tables:
                table_data = tab.extract()
                # Skip tables that are empty or have no text in their first cell
                if not table_data or not table_data[0][0]:
                    continue
                # The first line of the first cell is used as the station name;
                # rows from index 2 onward are treated as incident rows
                station_info = table_data[0][0].split("\n")[0]
                for row in table_data[2:]:
                    # Add station info to each row
                    rows.append([station_info] + row)
    return rows


if __name__ == "__main__":
    pdf_paths = ["LCSO Report - 2025-02-18.pdf", "LCSO Report - 2025-02-19.pdf"]
    # Print one table per report, separated by a blank line
    tables = [generate_markdown_table(process_pdf(path)) for path in pdf_paths]
    print("\n\n".join(tables))
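
Because the table is wrapped in <pre> tags, it is presumably meant to be sent with Telegram's HTML parse mode, which expects any `<`, `>` and `&` in the message text to be written as HTML entities. The following is a minimal sketch of escaping cell text before building the table. The names `escape_cell` and `build_pre_table` are hypothetical helpers, not part of the script above, and the sample row is made up for illustration.

```python
import html


def escape_cell(cell):
    """Return one table cell as HTML-safe, single-line text (hypothetical helper)."""
    if cell is None:
        return ""
    # quote=False escapes only &, < and >, leaving quotation marks readable
    return html.escape(str(cell).replace("\n", " "), quote=False)


def build_pre_table(headers, rows):
    """Build a pipe-delimited table inside <pre> tags with escaped cell text."""
    lines = ["|" + "|".join(escape_cell(h) for h in headers) + "|"]
    lines.append("|" + "|".join("-" * len(h) for h in headers) + "|")
    for row in rows:
        lines.append("|" + "|".join(escape_cell(c) for c in row) + "|")
    return "<pre>\n" + "\n".join(lines) + "\n</pre>"


if __name__ == "__main__":
    # Made-up sample row containing characters that need escaping
    sample = [["Example Station", "01/01/2025\n9:00 a.m.", "A & B St.", "Note: <none>"]]
    print(build_pre_table(["Station", "Date / Time", "Location", "Incident"], sample))
```

Escaping only the cell text leaves the surrounding <pre> tags intact, so Telegram still treats them as markup.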
@leandrotoledo
Author

Output Example:

| Station | Date / Time | Location | Incident |
|---|---|---|---|
| Eastern Loudoun Station | 02/18/2025 5:00 p.m. | 46000 block Woodstone Ter., Sterling | Vehicle Tampering: Complainant reported that her vehicle was damaged. SO250002847 |
| Ashburn Station | 02/10/2025- 02/17/2025 12:00 a.m. | 43000 block Postrail Sq., Ashburn | Vehicle Tampering: Complainant reported that her vehicle was damaged due to being scratched on all sides. SO250002877 |
| Ashburn Station | 02/18/2025 7:30 a.m. | 20000 block Trails End Ter., Ashburn | Larceny: Complainant reported that his vehicle was entered, and items were taken. SO250002842 |
| Dulles South Station | No significant incidents | | |
| Western Loudoun Station | No significant incidents | | |
| Scams | No significant incidents | | |
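
Text inside <pre> is shown in a monospace font, so padding every column to a common width keeps the pipes aligned when the table is displayed as plain text. The sketch below shows one way to do that. `pad_rows` is a hypothetical helper, not part of the script, and it assumes that short rows (such as the "No significant incidents" entries) should be filled out with empty cells.

```python
def pad_rows(headers, rows):
    """Return table lines with every column padded to its widest cell."""
    n = len(headers)
    # Normalise rows: stringify cells, blank out None, flatten newlines,
    # then extend short rows (or trim long ones) to exactly n cells
    table = [list(headers)]
    for row in rows:
        cells = ["" if c is None else str(c).replace("\n", " ") for c in row]
        table.append((cells + [""] * n)[:n])

    # Width of each column is the length of its longest cell
    widths = [max(len(r[i]) for r in table) for i in range(n)]

    lines = []
    for idx, r in enumerate(table):
        lines.append("| " + " | ".join(c.ljust(w) for c, w in zip(r, widths)) + " |")
        if idx == 0:
            # Separator line under the header, matching the column widths
            lines.append("|-" + "-|-".join("-" * w for w in widths) + "-|")
    return lines


if __name__ == "__main__":
    demo = [["Ashburn Station", "Larceny"], ["Scams"]]
    print("\n".join(pad_rows(["Station", "Incident"], demo)))
```

Long incident descriptions make every line as wide as the longest cell, so on narrow screens it may be preferable to truncate or wrap that column before padding.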
