@yamini
Last active January 18, 2025 00:06
table_to_reportlab.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/yamini/60891059b939ec82922921a1f4508734/table_to_reportlab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# 🧾 Generate PDF Tax Forms from Synthetic Data\n",
"\n",
"Welcome to the **PDF Tax Form Generator** notebook! 📄\n",
"\n",
"This notebook walks through:\n",
"1. Generating **synthetic tax data** for individuals. 🛠️\n",
"2. Previewing the generated data in a pandas DataFrame. 👀\n",
"3. Converting each individual's data into a **custom PDF form** using [ReportLab](https://www.reportlab.com/). ✍️\n",
"4. Providing download links for the generated PDF files. 💾\n",
"\n",
"---\n",
"\n",
"## Why ReportLab? 🤔\n",
"\n",
"We’re using **ReportLab**, an open-source Python library for creating PDF documents, because:\n",
"- It gives **precise control** over the layout and formatting of PDFs.\n",
"- It’s **flexible**, allowing us to add tables, images, and other styled elements to the forms.\n",
"- It is called directly from Python, so values read from a pandas DataFrame can go straight into the PDF.\n",
"\n",
"The ReportLab open-source toolkit is free to use under the **BSD License**, which allows:\n",
"- Commercial use.\n",
"- Modification and redistribution of the library.\n",
"\n",
"---\n",
"\n",
"Let's get started! 🚀"
],
"metadata": {
"id": "v_wu2gtMc2v0"
}
},
{
"cell_type": "markdown",
"source": [
"# Installation and setup"
],
"metadata": {
"id": "uJeDTs1juwAT"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "t7KGp0k6cNAe"
},
"outputs": [],
"source": [
"# Step 1: Install Required Libraries\n",
"!pip install reportlab pandas"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iDaIcdaycNAf"
},
"outputs": [],
"source": [
"# Step 2: Import Required Libraries\n",
"from reportlab.platypus import Table, TableStyle, SimpleDocTemplate\n",
"from reportlab.lib.pagesizes import letter\n",
"from reportlab.lib import colors\n",
"from reportlab.lib.units import inch\n",
"from reportlab.platypus import Paragraph\n",
"from reportlab.lib.styles import getSampleStyleSheet\n",
"import urllib.request\n",
"import pandas as pd\n",
"import random"
]
},
{
"cell_type": "markdown",
"source": [
"# Generate Synthetic Data 🛠️\n",
"\n",
"Here we create synthetic data for tax forms using the `generate_form_data` function.\n",
"This function generates random details such as:\n",
"- **Name**\n",
"- **Social Security Number (SSN)**\n",
"- **Income**\n",
"- **Tax Paid**\n",
"- **Refund Amount**\n",
"\n",
"Once the data is created, we load it into a pandas DataFrame for easy preview and manipulation."
],
"metadata": {
"id": "Nsp1zKiBc9vT"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jawRur6FcNAf"
},
"outputs": [],
"source": [
"# Step 3: Generate Synthetic Data\n",
"def generate_form_data(num_rows):\n",
"    data = []\n",
"    for _ in range(num_rows):\n",
"        row = {\n",
"            \"Name\": f\"Name_{random.randint(1, 1000)}\",\n",
"            \"SSN\": f\"{random.randint(100, 999)}-{random.randint(10, 99)}-{random.randint(1000, 9999)}\",\n",
"            \"Income\": round(random.uniform(20000, 150000), 2),\n",
"            \"Tax Paid\": round(random.uniform(1000, 50000), 2),\n",
"            \"Refund\": round(random.uniform(0, 5000), 2),\n",
"        }\n",
"        data.append(row)\n",
"    return pd.DataFrame(data)\n",
"\n",
"dataframe = generate_form_data(10)"
]
},
{
"cell_type": "markdown",
"source": [
"## Preview the Data 👀\n",
"\n",
"Take a look at the first few rows of the generated data to ensure everything looks good!\n",
"\n",
"The data is displayed as a pandas DataFrame."
],
"metadata": {
"id": "WkPgIDO2dSnb"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l9bl0N1ecNAf"
},
"outputs": [],
"source": [
"# Step 4: Preview the DataFrame\n",
"dataframe.head(3)"
]
},
{
"cell_type": "markdown",
"source": [
"## Generate PDF Tax Forms ✍️\n",
"\n",
"Using the `create_pdf` function, we:\n",
"- Format the data for each individual.\n",
"- Add a header with a logo (downloaded dynamically).\n",
"- Generate a custom PDF for each individual.\n",
"\n",
"Each PDF includes:\n",
"- Name\n",
"- SSN\n",
"- Income\n",
"- Tax Paid\n",
"- Refund\n",
"\n",
"At the end of this step, we generate download links for each PDF file! 💾"
],
"metadata": {
"id": "4nUesrCZdfya"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fPLY_NmUcNAf"
},
"outputs": [],
"source": [
"# Step 5: Define PDF Generation Function\n",
"# Imports used in this step that Step 2 does not include\n",
"import os\n",
"from reportlab.platypus import Image\n",
"\n",
"def create_pdf(row, output_file):\n",
"    # Extract the data for the current individual and format amounts as currency\n",
"    name = row[\"Name\"]\n",
"    ssn = row[\"SSN\"]\n",
"    income = f\"${row['Income']:,.2f}\"\n",
"    tax_paid = f\"${row['Tax Paid']:,.2f}\"\n",
"    refund = f\"${row['Refund']:,.2f}\"\n",
"\n",
"    # Create a PDF document\n",
"    doc = SimpleDocTemplate(output_file, pagesize=letter)\n",
"    elements = []\n",
"\n",
"    # Optional: download a placeholder logo and save it locally so that\n",
"    # ReportLab can load it from disk\n",
"    url = \"https://fakeimg.pl/200x100/4d4dfa/ffffff?text=Logo&font=bebas\"\n",
"    local_logo_path = \"logo.png\"\n",
"\n",
"    # Download the logo only once; later calls reuse the local copy\n",
"    if not os.path.exists(local_logo_path):\n",
"        # Build a request with a User-Agent header to mimic a browser\n",
"        req = urllib.request.Request(\n",
"            url,\n",
"            headers={\n",
"                \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\"\n",
"            },\n",
"        )\n",
"\n",
"        # Download the image\n",
"        try:\n",
"            with urllib.request.urlopen(req) as response, open(local_logo_path, \"wb\") as out_file:\n",
"                out_file.write(response.read())\n",
"            print(f\"Logo downloaded successfully to {local_logo_path}.\")\n",
"        except Exception as e:\n",
"            print(f\"Failed to download logo: {e}\")\n",
"\n",
"    # Add the logo only if a local copy exists\n",
"    if os.path.exists(local_logo_path):\n",
"        logo = Image(local_logo_path, width=100, height=50)\n",
"        elements.append(logo)\n",
"    else:\n",
"        print(\"Logo not found; skipping logo.\")\n",
"\n",
"    # Add a title\n",
"    styles = getSampleStyleSheet()\n",
"    title = Paragraph(\"1040 Tax Summary\", styles[\"Title\"])\n",
"    elements.append(title)\n",
"\n",
"    # Two-column table: bold field label on the left, value on the right\n",
"    data = [\n",
"        [Paragraph(\"<b>Name</b>\", styles[\"Normal\"]), name],\n",
"        [Paragraph(\"<b>SSN</b>\", styles[\"Normal\"]), ssn],\n",
"        [Paragraph(\"<b>Income</b>\", styles[\"Normal\"]), income],\n",
"        [Paragraph(\"<b>Tax Paid</b>\", styles[\"Normal\"]), tax_paid],\n",
"        [Paragraph(\"<b>Refund</b>\", styles[\"Normal\"]), refund],\n",
"    ]\n",
"\n",
"    table = Table(data, colWidths=[2.5 * inch, 4 * inch])\n",
"    table.setStyle(TableStyle([\n",
"        (\"GRID\", (0, 0), (-1, -1), 1, colors.black),\n",
"        # Shade the first row\n",
"        (\"BACKGROUND\", (0, 0), (-1, 0), colors.lightgrey),\n",
"        (\"ALIGN\", (0, 0), (-1, -1), \"LEFT\"),\n",
"        (\"VALIGN\", (0, 0), (-1, -1), \"MIDDLE\"),\n",
"        (\"FONTNAME\", (0, 0), (-1, -1), \"Helvetica\"),\n",
"    ]))\n",
"\n",
"    # Add table to elements\n",
"    elements.append(table)\n",
"\n",
"    # Build the PDF\n",
"    doc.build(elements)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "74hTFnSkcNAg"
},
"outputs": [],
"source": [
"from IPython.display import FileLink, display\n",
"\n",
"# Step 6: Generate PDFs for Each Row and Provide Download Links\n",
"for index, row in dataframe.iterrows():\n",
"    # Include the row index so that duplicate random names do not overwrite each other\n",
"    output_filename = f\"tax_form_{index}_{row['Name']}.pdf\"\n",
"    create_pdf(row, output_filename)\n",
"    print(f\"Generated {output_filename}\")\n",
"    # Provide a download link for each PDF\n",
"    display(FileLink(output_filename))"
]
},
{
"cell_type": "markdown",
"source": [
"# 🎉 You Did It!\n",
"\n",
"You've successfully:\n",
"1. Created synthetic tax data.\n",
"2. Previewed the data in a pandas DataFrame.\n",
"3. Generated custom PDF tax forms for each individual.\n",
"4. Provided download links for the PDFs.\n",
"\n",
"Feel free to modify this notebook to customize the forms or add additional features. 🚀"
],
"metadata": {
"id": "qCqUFUnUd-Wh"
}
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
},
"colab": {
"provenance": [],
"collapsed_sections": [
"uJeDTs1juwAT"
],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}