Skip to content

Instantly share code, notes, and snippets.

@sjsrey
Last active February 25, 2022 23:28
Show Gist options
  • Save sjsrey/788b937413be263d6f7ec094f7ea205a to your computer and use it in GitHub Desktop.
Save sjsrey/788b937413be263d6f7ec094f7ea205a to your computer and use it in GitHub Desktop.
Gini Optimization
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "a7df7d15-34c0-4e26-952d-3f08e4e20a13",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Author: Serge Rey\n",
"\n",
"Last updated: 2022-02-24\n",
"\n",
"Python implementation: CPython\n",
"Python version : 3.9.9\n",
"IPython version : 7.31.0\n",
"\n",
"segregation: 2.1.0\n",
"geopandas : 0.10.2\n",
"libpysal : 4.6.0\n",
"numpy : 1.22.0\n",
"pandas : 1.3.5\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -a 'Serge Rey' -v -d -u -p segregation,geopandas,libpysal,numpy,pandas"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "74f7a73b-4142-4a11-9c69-66ead9024a9a",
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [],
"source": [
"import geopandas\n",
"import pandas\n",
"import libpysal\n",
"import numpy\n",
"import segregation"
]
},
{
"cell_type": "markdown",
"id": "f227a625-a955-48f4-b181-9d0e0f9c1081",
"metadata": {},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"id": "bf6c6b77-6e87-4987-8c59-80e2df292b6d",
"metadata": {},
"source": [
"As a measure of segregation, the Gini index for a minority group in a given city with $n$ spatial units is given as:\n",
"$$ G = \\sum_{i=1}^n \\sum_{j=1}^n t_i t_j \\left|p_i - p_j \\right| / [2T^2 P(1-P)]$$\n",
"with $t_i$ and $p_i$ the total population and minority population of area $i$, and $T$ and $P$ the total population and total minority population for the city. It can be interpreted in a number of ways. The Gini is the mean weighted absolute difference between area pair minority proportions expressed as a proportion of he maximum weighted mean difference. The Gini also\n",
"represents the area between the Lorenze cure and the diagonal of evenness, expressed as a proportion of the total area under the diagonal. \n",
"\n",
"This notebook explores the computational issues involved in the implementation of the Gini coefficient. The speed versus memory trade-offs of different implementations are examined. Ultimately, this notebook will serve to document any changes that may be made to the pysal package.\n"
]
},
{
"cell_type": "markdown",
"id": "981e7304-437d-46d3-b9d8-fe0869267792",
"metadata": {},
"source": [
"## Example Data\n",
"\n",
"To begin, we will first create some synthetic data for a hypothetical city consisting of 10,000 spatial units. The population is composed of two groups, one which represents 20 percent of the city-wide population, and the majority group that represents 80 percent. This will be a city free from segregation in the sense that the areal units have compositions that are aligned with the city-wide composition.\n",
"\n",
"Here we use the new [best practice](https://towardsdatascience.com/stop-using-numpy-random-seed-581a9972805f) for setting the random number generator's seed."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8a812859-e64d-4a47-80b5-f530ca277c10",
"metadata": {},
"outputs": [],
"source": [
"n = 10000\n",
"rng = numpy.random.default_rng(2021)\n",
"bw = rng.multinomial(4000, [.2, .8], size=n)\n",
"T = numpy.ones((n,1)) * 4000\n",
"b = bw[:,0][:,numpy.newaxis]\n",
"P = b/ T\n",
"df = pandas.DataFrame(data = T, columns=['T'])\n",
"df['P'] = P\n",
"df['B'] = b"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d22b0ea9-f628-4553-9b18-1a7b3b2f39fb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>T</th>\n",
" <th>P</th>\n",
" <th>B</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4000.0</td>\n",
" <td>0.20650</td>\n",
" <td>826</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4000.0</td>\n",
" <td>0.21075</td>\n",
" <td>843</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4000.0</td>\n",
" <td>0.19825</td>\n",
" <td>793</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4000.0</td>\n",
" <td>0.19600</td>\n",
" <td>784</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4000.0</td>\n",
" <td>0.19800</td>\n",
" <td>792</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" T P B\n",
"0 4000.0 0.20650 826\n",
"1 4000.0 0.21075 843\n",
"2 4000.0 0.19825 793\n",
"3 4000.0 0.19600 784\n",
"4 4000.0 0.19800 792"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "c62b33cb-17b8-4e3b-8e11-a11e7c6578a2",
"metadata": {},
"source": [
"## Current Implementation\n",
"\n",
"This notebook uses segregation version 2.1.0. For the Gini class, three arguments are required to calculate the Gini coefficient:\n",
"\n",
"- the dataframe\n",
"- the name of the column holding the minority population by area\n",
"- the name of the column holding the total population by area"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1f6f8c70-6eca-4e09-a5ef-68cbf08196a8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.022082261551782664"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = segregation.singlegroup.Gini(df, 'B', 'T')\n",
"res.statistic"
]
},
{
"cell_type": "markdown",
"id": "6ad8f2e0-70c8-455e-be47-7bcee9128db5",
"metadata": {},
"source": [
"In order to improve anything, you have to first measure it. So we are going to see how the current implementation performs with regards to its speed and memory requirements.\n",
"\n",
"### Profiling: time\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1f03b586-5b6e-4e9c-820f-4bf08f5f959d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"427 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"res = segregation.singlegroup.Gini(df, 'B', 'T')\n",
"res.statistic"
]
},
{
"cell_type": "markdown",
"id": "253bfd83-a17e-40c3-9608-1f6323bb3db2",
"metadata": {},
"source": [
"### Profiling: memory"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1cdb0c23-7c2f-44f5-ad7d-9c0087b46a91",
"metadata": {},
"outputs": [],
"source": [
"%load_ext memory_profiler"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c23db2a7-6b7f-4717-95ac-767901fc2582",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"peak memory: 1815.99 MiB, increment: 1525.79 MiB\n"
]
}
],
"source": [
"%memit segregation.singlegroup.Gini(df, 'B', 'T')"
]
},
{
"cell_type": "markdown",
"id": "18d36e78-e3c8-4df7-8b27-737896fb84d0",
"metadata": {},
"source": [
"Wow, that seems like a good chunk of memory to be using. Why is this?\n",
"\n",
"The key computational bottleneck is the numerator from the equation above:\n",
"\n",
"$$ \\sum_{i=1}^n \\sum_{j=1}^n t_i t_j \\left|p_i - p_j \\right|$$\n",
"\n",
"which involves a double summation. Rather than implement this in\n",
"nested for loops that work over all columns and rows of the pairwise\n",
"difference matrix, the [current implementation][ci]\n",
"is leaning on the\n",
"vectorization, and other, capabilities of `numpy` to get an efficient\n",
"calculation of the index. More specifically, this numerator is implemented as:\n",
"\n",
"[ci]: https://github.com/pysal/segregation/blob/master/segregation/singlegroup/gini.py#L47"
]
},
{
"cell_type": "markdown",
"id": "30a77e83",
"metadata": {},
"source": [
"```\n",
" T = data[total_pop_var].sum()\n",
" P = data[group_pop_var].sum() / T\n",
"\n",
" # If a unit has zero population, the group of interest frequency is zero\n",
" data = data.assign(\n",
" ti=data[total_pop_var],\n",
" pi=np.where(\n",
" data[total_pop_var] == 0, 0, data[group_pop_var] / data[total_pop_var]\n",
" ),\n",
" )\n",
"\n",
" num = (\n",
" np.matmul(np.array(data.ti)[np.newaxis].T, np.array(data.ti)[np.newaxis])\n",
" * abs(np.array(data.pi)[np.newaxis].T - np.array(data.pi)[np.newaxis])\n",
" ).sum()\n",
" den = 2 * T ** 2 * P * (1 - P)\n",
" G = num / den\n",
"\n",
"```\n",
"where `data` is the dataframe, `ti` is the column with total area population and `pi` is the share of the minority population in each area.\n",
"This is creating two $n \\times n$ matrices, taking their element-wise product, and then summing to get the numerator of the Gini.\n",
"\n",
"So this implementation is paying for the speed via a large memory bill. Let's see if we can improve on this."
]
},
{
"cell_type": "markdown",
"id": "75aad0f8",
"metadata": {},
"source": [
"## New Implementations\n",
"\n",
"A first idea is to eschew the construction of the large matrices, but still exploit vectorization. This can be done through a single for loop that creates an individual row of the relevant full matrix in each pass, and uses a sum as a reduction:"
]
},
{
"cell_type": "markdown",
"id": "bb1352fe-3f6b-4d08-ac29-81532c67d150",
"metadata": {},
"source": [
"### Loop"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "beb843e5-95cd-4c76-8bf7-fb9b56e0459f",
"metadata": {},
"outputs": [],
"source": [
"def gini(df, groupvar, totalvar):\n",
" n = df.shape[0]\n",
" num = numpy.zeros(1)\n",
" ti = df[totalvar]\n",
" pi = numpy.where(ti == 0, 0, df[groupvar] / ti)\n",
" T = ti.sum()\n",
" P = df[groupvar].sum() / T\n",
" for i in range(n-1):\n",
" num += (ti[i] * ti[i+1:] * numpy.abs(pi[i] - pi[i+1:])).sum()\n",
" num *= 2\n",
" den = (2 * T * T * P * (1-P))\n",
" return (num / den)[0]\n",
" "
]
},
{
"cell_type": "markdown",
"id": "175f828e",
"metadata": {},
"source": [
"This has a similar signature to the current implementation:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4c4c6b24-1cb8-4d96-b35e-d4a3282b18b8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.022082261551782667"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = gini(df, 'B', 'T')\n",
"res"
]
},
{
"cell_type": "markdown",
"id": "6c412b9d-41cd-43f9-953d-03ae04f7d96f",
"metadata": {},
"source": [
"### Profiling: time\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "3954d072-4f3b-49fd-86cf-757ababa3c36",
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.18 s ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"res = gini(df, 'B', 'T')\n",
"res"
]
},
{
"cell_type": "markdown",
"id": "a3e96e5b-262d-49d9-9e17-6e1dd11b31c9",
"metadata": {},
"source": [
"Its slower - no surprise as we unrolled one of the dimensions that was previously vectorized into a for loop.\n",
"\n",
"### Profiling: memory\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1643fc8b-a0c2-49f9-9353-b6a7e42d37e8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"peak memory: 290.50 MiB, increment: 0.03 MiB\n"
]
}
],
"source": [
"%memit gini(df, 'B', 'T')"
]
},
{
"cell_type": "markdown",
"id": "b81c8294-115a-40fa-8ca8-db5fe48dbec3",
"metadata": {},
"source": [
"But, it has a smaller memory footprint than the current implementation since this avoids the need for the full matrices.\n",
"\n",
"### Parallel\n",
"\n",
"Given that there are $n$ rows to calculate, we might be able to do these in parallel. To do so we can use [joblib][jl]\n",
"\n",
"[jl]: https://joblib.readthedocs.io/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9584e15a-6336-404c-9a03-e0955a5622ec",
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [],
"source": [
"def gini_par(df, groupvar, totalvar, n_jobs=1):\n",
" \n",
" n = df.shape[0]\n",
" num = numpy.zeros(1)\n",
" ti = df[totalvar]\n",
" pi = numpy.where(ti == 0, 0, df[groupvar] / ti)\n",
" T = ti.sum()\n",
" P = df[groupvar].sum() / T\n",
" def _g(i, ti, pi):\n",
" return (ti[i] * ti[i+1:] * numpy.abs(pi[i] - pi[i+1:])).sum()\n",
" \n",
" from joblib import Parallel, delayed, parallel_backend\n",
" import os\n",
" \n",
" \n",
" if n_jobs == -1:\n",
" n_jobs = os.cpu_count()\n",
" \n",
" \n",
" \n",
" with parallel_backend(\"loky\"):\n",
" nums = Parallel(n_jobs=n_jobs)(\n",
" delayed(_g)(i, ti, pi) for i in range(n-1))\n",
" num = numpy.array(nums).sum()\n",
" \n",
" num *= 2\n",
" den = (2 * T * T * P * (1-P))\n",
" return num / den\n",
" "
]
},
{
"cell_type": "markdown",
"id": "559fc059",
"metadata": {},
"source": [
"Here we have moved the calculation of each row into a function (`_g`) that will be used on chunks of rows each being sent to different jobs. The number of jobs is an argument to our `gini_par` function which has a similar signature the our previous implementations with the exception of this one new optional argument. The default is to use 1 cpu, which means we have a sequential implementation. Other possibilities are to pass in the specific number of cores to use, or set `n_jobs=-1` to have all cores used."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "a3ab3c8d-a547-4e4b-8870-7a50b6b3b4f3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.02208226155178265"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gini_par(df, 'B', 'T', n_jobs=4)"
]
},
{
"cell_type": "markdown",
"id": "c910dbb4",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"Let's see how this does with regard to speed:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9258d717-d9d8-4acb-a711-bee5d8aea70e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"759 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"gini_par(df, 'B', 'T', n_jobs=4)"
]
},
{
"cell_type": "markdown",
"id": "ecbf0eaf",
"metadata": {},
"source": [
"We get an improvement over the slow loop that has the lower memory requirments over the current implementation. This is good, as we will see this parallel implementation shares the improved memory requirements of the loop approach. However, the parallel implementation does not beat the existing approach in speed. \n",
"\n",
"We can try different settings for the number of cores to use:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "e3fd3ddb-94d7-4a06-a0cd-ca43084f7807",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"585 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"gini_par(df, 'B', 'T', n_jobs=8)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "749ab99f-c493-411f-845e-224b7484a84f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"704 ms ± 14.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"gini_par(df, 'B', 'T', n_jobs=-1)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "6f4bedd6-7cc2-4976-8179-605141d4bb71",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"588 ms ± 9.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"gini_par(df, 'B', 'T', n_jobs=10)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "0121522d-761a-4d5c-94d0-e1f3f4ba8a36",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"660 ms ± 32.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"gini_par(df, 'B', 'T', n_jobs=14)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "786d278a-6084-4bb1-a499-e49510e64356",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"os.cpu_count()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "7800c78c-1d63-4cfa-b1dc-2e5c4821ebdf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"peak memory: 294.85 MiB, increment: 0.02 MiB\n"
]
}
],
"source": [
"%memit gini_par(df, 'B', 'T', n_jobs=4)"
]
},
{
"cell_type": "markdown",
"id": "e29610e4-3ebc-4987-a7e3-1dbd4c1893a5",
"metadata": {},
"source": [
"There is not necessarily a direct relationship between the number of cores and the speed improvements, and this can be for a variety of reasons that the interested reader can pursue [elsewhere][scale]. Here we are doing an initial investigation of simple parallelization as an optimization over the memory bound approach in pysal.\n",
"\n",
"[scale]: https://joblib.readthedocs.io/en/latest/parallel.html#avoiding-over-subscription-of-cpu-resources\n",
"\n",
"## numba\n",
"The final approach we are going to examine relies on [numba][numba] which translates Python functions to optimized machine code.\n",
"\n",
"[numba]: https://numba.pydata.org/\n",
"\n",
"We first are going to allow for a soft-dependency for numba, meaning if the user has numba installed, then we will use it. Otherwise, we will mimick the numba decorator with our loop-based function so there will be no speedup, but the user can at least move forward."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "0fcee73c-c8d9-4fc7-85dc-5bc33c0b0f9f",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import warnings\n",
"\n",
"try:\n",
" from numba import njit, jit, prange, boolean\n",
"except (ImportError, ModuleNotFoundError):\n",
"\n",
" def jit(*dec_args, **dec_kwargs):\n",
" \"\"\"\n",
" decorator mimicking numba.jit\n",
" \"\"\"\n",
"\n",
" def intercepted_function(f, *f_args, **f_kwargs):\n",
" return f\n",
"\n",
" return intercepted_function\n",
"\n",
" njit = jit\n",
"\n",
" prange = range\n",
" boolean = bool\n"
]
},
{
"cell_type": "markdown",
"id": "051bdbc6",
"metadata": {},
"source": [
"With this in place, we then decorate our `gini_vec` function:\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "7e486ff3-e7aa-4ea5-a8d1-e2398acc8a4d",
"metadata": {},
"outputs": [],
"source": [
"@njit(fastmath=True)\n",
"def gini_vec(ti: np.ndarray, pi: np.ndarray):\n",
" n = ti.shape[0]\n",
" num = np.zeros(1)\n",
" T = ti.sum()\n",
" P = pi.sum() / T\n",
" pi = numpy.where(ti == 0, 0, pi / ti)\n",
" T = ti.sum()\n",
" for i in range(n-1):\n",
" num += (ti[i] * ti[i+1:] * numpy.abs(pi[i] - pi[i+1:])).sum()\n",
" num *= 2\n",
" den = (2 * T * T * P * (1-P))\n",
" return (num / den)[0]\n",
" "
]
},
{
"cell_type": "markdown",
"id": "bb3023eb",
"metadata": {},
"source": [
"We are starting with the [fastmath][fastmath] option for the decorator, which allows for a tradeoff that relaxes some mathematical rigour in the translation in order to gain speedups.\n",
"\n",
"[fastmath]: https://numba.pydata.org/numba-doc/latest/user/performance-tips.html?highlight=fastmath#fastmath\n",
"\n",
"We also change the signature of our `gini_vec` function in two ways. First, we no longer are passing in the dataframe, and the two column name strings, but rather we are passing in two ndarrays that hold the total population and minority population values for the spatial units. This is because numba doesn't (yet) work directly with dataframes.\n",
"\n",
"The second change is we specify the types of these two arguments. The typing helps numba to optimize the translation to machine code.\n",
"\n",
"As a result of the new signature, we need to do a little work first by pulling out the arrays from the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "a36a9d79-0b3d-42c1-9bcd-74ea55282241",
"metadata": {},
"outputs": [],
"source": [
"ti = df['T'].values\n",
"pi = df['B'].values"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "03bb67fd-1ea0-49cc-aaca-1e2e720cb937",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(10000,)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ti.shape"
]
},
{
"cell_type": "markdown",
"id": "52c2c809",
"metadata": {},
"source": [
"Let's call our newly decorated function with the arrays:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "552dde7c-452d-4155-90bf-6fd0237a609f",
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [
{
"data": {
"text/plain": [
"0.022082261551782667"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gini_vec(ti, pi)"
]
},
{
"cell_type": "markdown",
"id": "8ae836b3",
"metadata": {},
"source": [
"so the return is the correct value.\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "f13d5c3a-b934-4dc2-82e8-0d6cf17cc8aa",
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"25 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit \n",
"gini_vec(ti, pi)"
]
},
{
"cell_type": "markdown",
"id": "4e6750e4",
"metadata": {},
"source": [
"Now, we are beating the original implementation. On this machine the speedup is ~18x. \n",
"Moreover, we also keep a lower memory footprint over that implementation."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "bc42e6a7-7411-4593-8ce8-b2bb81e36e1b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"peak memory: 308.54 MiB, increment: 0.02 MiB\n"
]
}
],
"source": [
"%memit gini_vec(ti, pi)"
]
},
{
"cell_type": "markdown",
"id": "3e2d88dd",
"metadata": {},
"source": [
"There are also some other options we can use with numba to try to earn additional gains. One is to set `parallel=True`\n",
"which tells numba to try to [exploit any parallelization][numbap] it can in the translation:\n",
"\n",
"[numbap]: https://numba.pydata.org/numba-doc/latest/user/parallel.html\n"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "8128659f-297f-4bb2-b0ca-3821ab61bc96",
"metadata": {},
"outputs": [],
"source": [
"@njit(parallel=True)\n",
"def gini_vecp(ti: np.ndarray, pi: np.ndarray):\n",
" n = ti.shape[0]\n",
" num = np.zeros(1)\n",
" T = ti.sum()\n",
" P = pi.sum() / T\n",
" pi = numpy.where(ti == 0, 0, pi / ti)\n",
" T = ti.sum()\n",
" for i in prange(n-1):\n",
" num += (ti[i] * ti[i+1:] * numpy.abs(pi[i] - pi[i+1:])).sum()\n",
" num *= 2\n",
" den = (2 * T * T * P * (1-P))\n",
" return (num / den)[0]\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "a44ae97b-7d82-4b36-8f1d-93d3778cdbfd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.022082261551782667"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gini_vecp(ti, pi)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "0e106e32-3498-4424-8435-3caee9d887cc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8.26 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"gini_vecp(ti, pi)"
]
},
{
"cell_type": "markdown",
"id": "1b9f52ba",
"metadata": {},
"source": [
"Nothing major changes here when we have used `parallel` in place of `fastmath`. But what about if we use them together:\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "f428e06f-5970-4e67-92bd-bef37fd13ee2",
"metadata": {},
"outputs": [],
"source": [
"@njit(parallel=True, fastmath=True)\n",
"def gini_vecp(ti: np.ndarray, pi: np.ndarray):\n",
" n = ti.shape[0]\n",
" num = np.zeros(1)\n",
" T = ti.sum()\n",
" P = pi.sum() / T\n",
" pi = numpy.where(ti == 0, 0, pi / ti)\n",
" T = ti.sum()\n",
" for i in prange(n-1):\n",
" num += (ti[i] * ti[i+1:] * numpy.abs(pi[i] - pi[i+1:])).sum()\n",
" num *= 2\n",
" den = (2 * T * T * P * (1-P))\n",
" return (num / den)[0]\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "b31f040d-7891-4fb1-ae82-28d2426daef2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.022082261551782667"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gini_vecp(ti, pi)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "3ec54de6-4fb2-44a2-8409-b5ea2bb4cab6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.4 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"gini_vecp(ti, pi)"
]
},
{
"cell_type": "markdown",
"id": "993b94fd",
"metadata": {},
"source": [
"Bingo! Now we have a speed-up of 177x,"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "a665090e-04a4-4158-92e6-55e628d20c52",
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"peak memory: 352.76 MiB, increment: -1.25 MiB\n"
]
}
],
"source": [
"%memit gini_vecp(ti, pi)"
]
},
{
"cell_type": "markdown",
"id": "b1bfe7f7",
"metadata": {},
"source": [
"and the memory use has dropped from 1.8GB in the original implementation to 353MB here."
]
}
],
"metadata": {
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@knaaptime
Copy link

sweet. I'm sure numba beats anything here, but one other param that tends to give me more variance (vs n_cpus) in joblib is backend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment