Last active
August 3, 2025 10:37
-
-
Save MaxGhenis/74678e11c0fa830fa9c18924e362f24f to your computer and use it in GitHub Desktop.
Comparison of numpy division by zero handling approaches
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Comparing Division by Zero Handling Approaches\n", | |
| "\n", | |
| "This notebook demonstrates three approaches to handling division by zero in numpy arrays:\n", | |
| "1. The `where` approach (causes warnings)\n", | |
| "2. The mask approach (current PolicyEngine standard)\n", | |
| "3. The `np.divide` approach (potential alternative)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2025-08-03T10:37:29.881302Z", | |
| "iopub.status.busy": "2025-08-03T10:37:29.881235Z", | |
| "iopub.status.idle": "2025-08-03T10:37:29.962623Z", | |
| "shell.execute_reply": "2025-08-03T10:37:29.962214Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Income: [ 50000 75000 0 100000 0 25000]\n", | |
| "Tax: [ 5000 10000 0 15000 100 2000]\n", | |
| "\n", | |
| "Expected effective tax rates:\n", | |
| "[0.1, 0.133, 0.0, 0.15, 0.0, 0.08]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import warnings\n", | |
| "\n", | |
| "# Create test data with zeros that would cause division by zero\n", | |
| "income = np.array([50000, 75000, 0, 100000, 0, 25000])\n", | |
| "tax = np.array([5000, 10000, 0, 15000, 100, 2000])\n", | |
| "\n", | |
| "print(\"Income:\", income)\n", | |
| "print(\"Tax:\", tax)\n", | |
| "print(\"\\nExpected effective tax rates:\")\n", | |
| "print(\"[0.1, 0.133, 0.0, 0.15, 0.0, 0.08]\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. The `where` Approach (Causes Warnings)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2025-08-03T10:37:29.979738Z", | |
| "iopub.status.busy": "2025-08-03T10:37:29.979558Z", | |
| "iopub.status.idle": "2025-08-03T10:37:29.982598Z", | |
| "shell.execute_reply": "2025-08-03T10:37:29.982206Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "⚠️ Warning raised: divide by zero encountered in divide\n", | |
| "\n", | |
| "Result: [0.1 0.13333333 0. 0.15 0. 0.08 ]\n", | |
| "Correct values: True\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# This approach evaluates the division for ALL values before selecting\n", | |
| "with warnings.catch_warnings(record=True) as w:\n", | |
| " warnings.simplefilter(\"always\")\n", | |
| " \n", | |
| " effective_rate_where = np.where(income > 0, tax / income, 0)\n", | |
| " \n", | |
| " if w:\n", | |
| " print(f\"⚠️ Warning raised: {w[0].message}\")\n", | |
| " else:\n", | |
| " print(\"✓ No warnings\")\n", | |
| " \n", | |
| "print(\"\\nResult:\", effective_rate_where)\n", | |
| "print(\"Correct values:\", np.allclose(effective_rate_where, [0.1, 0.133333, 0, 0.15, 0, 0.08], atol=1e-6))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. The Mask Approach (Current PolicyEngine Standard)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2025-08-03T10:37:29.986306Z", | |
| "iopub.status.busy": "2025-08-03T10:37:29.986169Z", | |
| "iopub.status.idle": "2025-08-03T10:37:29.989338Z", | |
| "shell.execute_reply": "2025-08-03T10:37:29.988952Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "✓ No warnings\n", | |
| "\n", | |
| "Result: [0.1 0.13333333 0. 0.15 0. 0.08 ]\n", | |
| "Correct values: True\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# This approach only divides where the mask is True\n", | |
| "with warnings.catch_warnings(record=True) as w:\n", | |
| " warnings.simplefilter(\"always\")\n", | |
| " \n", | |
| " effective_rate_mask = np.zeros_like(income, dtype=float)\n", | |
| " mask = income != 0\n", | |
| " effective_rate_mask[mask] = tax[mask] / income[mask]\n", | |
| " \n", | |
| " if w:\n", | |
| " print(f\"⚠️ Warning raised: {w[0].message}\")\n", | |
| " else:\n", | |
| " print(\"✓ No warnings\")\n", | |
| " \n", | |
| "print(\"\\nResult:\", effective_rate_mask)\n", | |
| "print(\"Correct values:\", np.allclose(effective_rate_mask, [0.1, 0.133333, 0, 0.15, 0, 0.08], atol=1e-6))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 3. The `np.divide` Approach (Potential Alternative)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2025-08-03T10:37:29.994964Z", | |
| "iopub.status.busy": "2025-08-03T10:37:29.994006Z", | |
| "iopub.status.idle": "2025-08-03T10:37:30.006619Z", | |
| "shell.execute_reply": "2025-08-03T10:37:30.005964Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "✓ No warnings\n", | |
| "\n", | |
| "Result: [0.1 0.13333333 0. 0.15 0. 0.08 ]\n", | |
| "Correct values: True\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# This approach uses np.divide with out and where parameters\n", | |
| "with warnings.catch_warnings(record=True) as w:\n", | |
| " warnings.simplefilter(\"always\")\n", | |
| " \n", | |
| " effective_rate_divide = np.divide(\n", | |
| " tax, \n", | |
| " income,\n", | |
| " out=np.zeros_like(income, dtype=float),\n", | |
| " where=income != 0\n", | |
| " )\n", | |
| " \n", | |
| " if w:\n", | |
| " print(f\"⚠️ Warning raised: {w[0].message}\")\n", | |
| " else:\n", | |
| " print(\"✓ No warnings\")\n", | |
| " \n", | |
| "print(\"\\nResult:\", effective_rate_divide)\n", | |
| "print(\"Correct values:\", np.allclose(effective_rate_divide, [0.1, 0.133333, 0, 0.15, 0, 0.08], atol=1e-6))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Performance Comparison" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2025-08-03T10:37:30.015169Z", | |
| "iopub.status.busy": "2025-08-03T10:37:30.014097Z", | |
| "iopub.status.idle": "2025-08-03T10:37:30.221086Z", | |
| "shell.execute_reply": "2025-08-03T10:37:30.193616Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Testing with 1,000,000 elements..." | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "\n" | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Number of zeros in income: 4\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Create larger arrays for performance testing\n", | |
| "np.random.seed(42)\n", | |
| "large_income = np.random.randint(0, 200000, size=1_000_000)\n", | |
| "large_tax = np.random.randint(0, 50000, size=1_000_000)\n", | |
| "\n", | |
| "print(f\"Testing with {len(large_income):,} elements...\")\n", | |
| "print(f\"Number of zeros in income: {(large_income == 0).sum():,}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2025-08-03T10:37:30.297363Z", | |
| "iopub.status.busy": "2025-08-03T10:37:30.294440Z", | |
| "iopub.status.idle": "2025-08-03T10:37:31.594329Z", | |
| "shell.execute_reply": "2025-08-03T10:37:31.593967Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Mask approach: 10.50 ms\n", | |
| "np.divide approach: 1.90 ms\n", | |
| "\n", | |
| "np.divide reduces runtime by 82%\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import timeit\n", | |
| "\n", | |
| "# Time the mask approach\n", | |
| "def mask_approach():\n", | |
| " result = np.zeros_like(large_income, dtype=float)\n", | |
| " mask = large_income != 0\n", | |
| " result[mask] = large_tax[mask] / large_income[mask]\n", | |
| " return result\n", | |
| "\n", | |
| "# Time the np.divide approach\n", | |
| "def divide_approach():\n", | |
| " return np.divide(\n", | |
| " large_tax, \n", | |
| " large_income,\n", | |
| " out=np.zeros_like(large_income, dtype=float),\n", | |
| " where=large_income != 0\n", | |
| " )\n", | |
| "\n", | |
| "# Run timing tests\n", | |
| "mask_time = timeit.timeit(mask_approach, number=100) / 100\n", | |
| "divide_time = timeit.timeit(divide_approach, number=100) / 100\n", | |
| "\n", | |
| "print(f\"Mask approach: {mask_time*1000:.2f} ms\")\n", | |
| "print(f\"np.divide approach: {divide_time*1000:.2f} ms\")\n", | |
| "\n", | |
| "runtime_reduction = (mask_time - divide_time) / mask_time * 100\n", | |
| "print(f\"\\nnp.divide reduces runtime by {runtime_reduction:.0f}%\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Summary\n", | |
| "\n", | |
| "Both the mask approach and `np.divide` approach:\n", | |
| "- ✅ Avoid division by zero warnings\n", | |
| "- ✅ Produce correct results\n", | |
| "- ✅ Have similar performance\n", | |
| "\n", | |
| "The `np.divide` approach offers:\n", | |
| "- More concise, single-expression syntax\n", | |
| "- Built-in numpy function designed for this use case\n", | |
| "- Already used in some parts of PolicyEngine codebase\n", | |
| "\n", | |
| "The mask approach:\n", | |
| "- Is the current PolicyEngine standard\n", | |
| "- Is more explicit about what's happening\n", | |
| "- Has been thoroughly tested in the codebase" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.10.17" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 4 | |
| } |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment