Skip to content

Instantly share code, notes, and snippets.

@MaxGhenis
Last active August 3, 2025 10:37
Show Gist options
  • Save MaxGhenis/74678e11c0fa830fa9c18924e362f24f to your computer and use it in GitHub Desktop.
Save MaxGhenis/74678e11c0fa830fa9c18924e362f24f to your computer and use it in GitHub Desktop.
Comparison of numpy division by zero handling approaches
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing Division by Zero Handling Approaches\n",
"\n",
"This notebook demonstrates three approaches to handling division by zero in numpy arrays:\n",
"1. The `where` approach (causes warnings)\n",
"2. The mask approach (current PolicyEngine standard)\n",
"3. The `np.divide` approach (potential alternative)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2025-08-03T10:37:29.881302Z",
"iopub.status.busy": "2025-08-03T10:37:29.881235Z",
"iopub.status.idle": "2025-08-03T10:37:29.962623Z",
"shell.execute_reply": "2025-08-03T10:37:29.962214Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Income: [ 50000 75000 0 100000 0 25000]\n",
"Tax: [ 5000 10000 0 15000 100 2000]\n",
"\n",
"Expected effective tax rates:\n",
"[0.1, 0.133, 0.0, 0.15, 0.0, 0.08]\n"
]
}
],
"source": [
"import numpy as np\n",
"import warnings\n",
"\n",
"# Create test data with zeros that would cause division by zero\n",
"income = np.array([50000, 75000, 0, 100000, 0, 25000])\n",
"tax = np.array([5000, 10000, 0, 15000, 100, 2000])\n",
"\n",
"print(\"Income:\", income)\n",
"print(\"Tax:\", tax)\n",
"print(\"\\nExpected effective tax rates:\")\n",
"print(\"[0.1, 0.133, 0.0, 0.15, 0.0, 0.08]\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. The `where` Approach (Causes Warnings)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2025-08-03T10:37:29.979738Z",
"iopub.status.busy": "2025-08-03T10:37:29.979558Z",
"iopub.status.idle": "2025-08-03T10:37:29.982598Z",
"shell.execute_reply": "2025-08-03T10:37:29.982206Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"⚠️ Warning raised: divide by zero encountered in divide\n",
"\n",
"Result: [0.1 0.13333333 0. 0.15 0. 0.08 ]\n",
"Correct values: True\n"
]
}
],
"source": [
"# This approach evaluates the division for ALL values before selecting\n",
"with warnings.catch_warnings(record=True) as w:\n",
" warnings.simplefilter(\"always\")\n",
" \n",
" effective_rate_where = np.where(income > 0, tax / income, 0)\n",
" \n",
" if w:\n",
" print(f\"⚠️ Warning raised: {w[0].message}\")\n",
" else:\n",
" print(\"✓ No warnings\")\n",
" \n",
"print(\"\\nResult:\", effective_rate_where)\n",
"print(\"Correct values:\", np.allclose(effective_rate_where, [0.1, 0.133333, 0, 0.15, 0, 0.08], atol=1e-6))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. The Mask Approach (Current PolicyEngine Standard)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2025-08-03T10:37:29.986306Z",
"iopub.status.busy": "2025-08-03T10:37:29.986169Z",
"iopub.status.idle": "2025-08-03T10:37:29.989338Z",
"shell.execute_reply": "2025-08-03T10:37:29.988952Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"✓ No warnings\n",
"\n",
"Result: [0.1 0.13333333 0. 0.15 0. 0.08 ]\n",
"Correct values: True\n"
]
}
],
"source": [
"# This approach only divides where the mask is True\n",
"with warnings.catch_warnings(record=True) as w:\n",
" warnings.simplefilter(\"always\")\n",
" \n",
" effective_rate_mask = np.zeros_like(income, dtype=float)\n",
" mask = income != 0\n",
" effective_rate_mask[mask] = tax[mask] / income[mask]\n",
" \n",
" if w:\n",
" print(f\"⚠️ Warning raised: {w[0].message}\")\n",
" else:\n",
" print(\"✓ No warnings\")\n",
" \n",
"print(\"\\nResult:\", effective_rate_mask)\n",
"print(\"Correct values:\", np.allclose(effective_rate_mask, [0.1, 0.133333, 0, 0.15, 0, 0.08], atol=1e-6))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. The `np.divide` Approach (Potential Alternative)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2025-08-03T10:37:29.994964Z",
"iopub.status.busy": "2025-08-03T10:37:29.994006Z",
"iopub.status.idle": "2025-08-03T10:37:30.006619Z",
"shell.execute_reply": "2025-08-03T10:37:30.005964Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"✓ No warnings\n",
"\n",
"Result: [0.1 0.13333333 0. 0.15 0. 0.08 ]\n",
"Correct values: True\n"
]
}
],
"source": [
"# This approach uses np.divide with out and where parameters\n",
"with warnings.catch_warnings(record=True) as w:\n",
" warnings.simplefilter(\"always\")\n",
" \n",
" effective_rate_divide = np.divide(\n",
" tax, \n",
" income,\n",
" out=np.zeros_like(income, dtype=float),\n",
" where=income != 0\n",
" )\n",
" \n",
" if w:\n",
" print(f\"⚠️ Warning raised: {w[0].message}\")\n",
" else:\n",
" print(\"✓ No warnings\")\n",
" \n",
"print(\"\\nResult:\", effective_rate_divide)\n",
"print(\"Correct values:\", np.allclose(effective_rate_divide, [0.1, 0.133333, 0, 0.15, 0, 0.08], atol=1e-6))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Performance Comparison"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2025-08-03T10:37:30.015169Z",
"iopub.status.busy": "2025-08-03T10:37:30.014097Z",
"iopub.status.idle": "2025-08-03T10:37:30.221086Z",
"shell.execute_reply": "2025-08-03T10:37:30.193616Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Testing with 1,000,000 elements..."
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of zeros in income: 4\n"
]
}
],
"source": [
"# Create larger arrays for performance testing\n",
"np.random.seed(42)\n",
"large_income = np.random.randint(0, 200000, size=1_000_000)\n",
"large_tax = np.random.randint(0, 50000, size=1_000_000)\n",
"\n",
"print(f\"Testing with {len(large_income):,} elements...\")\n",
"print(f\"Number of zeros in income: {(large_income == 0).sum():,}\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2025-08-03T10:37:30.297363Z",
"iopub.status.busy": "2025-08-03T10:37:30.294440Z",
"iopub.status.idle": "2025-08-03T10:37:31.594329Z",
"shell.execute_reply": "2025-08-03T10:37:31.593967Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mask approach: 10.50 ms\n",
"np.divide approach: 1.90 ms\n",
"\n",
"np.divide reduces runtime by 82%\n"
]
}
],
"source": [
"import timeit\n",
"\n",
"# Time the mask approach\n",
"def mask_approach():\n",
" result = np.zeros_like(large_income, dtype=float)\n",
" mask = large_income != 0\n",
" result[mask] = large_tax[mask] / large_income[mask]\n",
" return result\n",
"\n",
"# Time the np.divide approach\n",
"def divide_approach():\n",
" return np.divide(\n",
" large_tax, \n",
" large_income,\n",
" out=np.zeros_like(large_income, dtype=float),\n",
" where=large_income != 0\n",
" )\n",
"\n",
"# Run timing tests\n",
"mask_time = timeit.timeit(mask_approach, number=100) / 100\n",
"divide_time = timeit.timeit(divide_approach, number=100) / 100\n",
"\n",
"print(f\"Mask approach: {mask_time*1000:.2f} ms\")\n",
"print(f\"np.divide approach: {divide_time*1000:.2f} ms\")\n",
"\n",
"runtime_reduction = (mask_time - divide_time) / mask_time * 100\n",
"print(f\"\\nnp.divide reduces runtime by {runtime_reduction:.0f}%\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"Both the mask approach and `np.divide` approach:\n",
"- ✅ Avoid division by zero warnings\n",
"- ✅ Produce correct results\n",
"- ✅ Have similar performance\n",
"\n",
"The `np.divide` approach offers:\n",
"- More concise, single-expression syntax\n",
"- Built-in numpy function designed for this use case\n",
"- Already used in some parts of PolicyEngine codebase\n",
"\n",
"The mask approach:\n",
"- Is the current PolicyEngine standard\n",
"- Is more explicit about what's happening\n",
"- Has been thoroughly tested in the codebase"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.17"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment