@Ogaday
Created February 10, 2023 17:37
Weighted mean for Pandas dataframes
"""
Utilities for weighted means on DataFrames.
"""
from functools import partial
from typing import Callable

import pandas as pd


def weighted_mean(frame: pd.DataFrame, value_col: str, weight_col: str) -> float:
    """Calculate the weighted mean of one column, weighted by another.

    Parameters
    ----------
    frame
        Tabular data on which to calculate the weighted mean.
    value_col
        The name of the column of which to take the mean.
    weight_col
        The name of the column whose values are used as weights.

    Returns
    -------
    float
        The mean of the value column, weighted by the weight column.

    Notes
    -----
    The mean of ``A`` weighted by ``B`` is as follows::

        sum(A * B) / sum(B)

    Where ``*`` is the elementwise multiplication operator.

    Examples
    --------
    >>> df = pd.DataFrame(
    ...     [[23, 32], [15, 29], [24, 30], [1, 1000]],
    ...     columns=["items_sold", "price"]
    ... )
    >>> df
       items_sold  price
    0          23     32
    1          15     29
    2          24     30
    3           1   1000

    If you want to find the average price of sold items, you can't simply take
    the mean of the ``price`` column, as the one high-priced item skews the mean:

    >>> df["price"].mean()
    272.75

    Instead, you need to weight each price by the number of items sold at that
    price, i.e. by the ``items_sold`` column:

    >>> round(
    ...     df.pipe(weighted_mean, value_col="price", weight_col="items_sold"),
    ...     2
    ... )
    45.89
    """
    return (frame[value_col] * frame[weight_col]).sum() / frame[weight_col].sum()


def weighted_mean_factory(
    value_col: str, weight_col: str
) -> Callable[[pd.DataFrame], float]:
    """Return ``weighted_mean`` with the column names already bound."""
    return partial(weighted_mean, value_col=value_col, weight_col=weight_col)
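
The gist does not show how ``weighted_mean_factory`` is meant to be used. One plausible use, sketched below as an assumption rather than the author's stated intent, is to build a single-argument function that can be applied to each group of a ``groupby``. The ``sales`` frame and its ``store`` column are hypothetical data invented for this illustration; the two functions are repeated so the snippet runs on its own.

```python
from functools import partial
from typing import Callable

import pandas as pd


def weighted_mean(frame: pd.DataFrame, value_col: str, weight_col: str) -> float:
    # sum(A * B) / sum(B), as in the Notes section of the docstring.
    return (frame[value_col] * frame[weight_col]).sum() / frame[weight_col].sum()


def weighted_mean_factory(
    value_col: str, weight_col: str
) -> Callable[[pd.DataFrame], float]:
    # Bind the column names so the result takes only a DataFrame.
    return partial(weighted_mean, value_col=value_col, weight_col=weight_col)


# Hypothetical example data: two stores, two price points each.
sales = pd.DataFrame(
    {
        "store": ["a", "a", "b", "b"],
        "items_sold": [1, 3, 2, 2],
        "price": [10.0, 20.0, 30.0, 50.0],
    }
)

price_mean = weighted_mean_factory("price", "items_sold")

# Select only the columns the function needs, then apply it to each group.
per_store = sales.groupby("store")[["price", "items_sold"]].apply(price_mean)
print(per_store)
```

Binding the column names up front keeps the ``apply`` call free of keyword arguments, which is likely the convenience the factory is after.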