Created
February 10, 2023 17:37
Weighted mean for Pandas dataframes
| """ | |
| Utilities for weighted means on DataFrames. | |
| """ | |
| from functools import partial | |
| from typing import Callable | |
| import pandas as pd | |
| def weighted_mean(frame: pd.DataFrame, value_col: str, weight_col: str) -> float: | |
| """Calculate the mean of one dataframe column, weighted by another column. | |
| Parameters | |
| ---------- | |
| frame | |
| Tabular data on which to calculate the weighted mean. | |
| value_col | |
| The name of the column of which to take the mean. | |
| weight_col | |
| The name of the column whose values are used as the weights. | |
| Returns | |
| ------- | |
| float | |
| The mean of the value column, weighted by the weight column. | |
| Notes | |
| ----- | |
| The mean of ``A`` weighted by ``B`` is as follows:: | |
| sum(A * B) / sum(B) | |
| Where ``*`` is the elementwise multiplication operator. | |
| Examples | |
| -------- | |
| >>> df = pd.DataFrame( | |
| ... [[23, 32], [15, 29], [24, 30], [1, 1000]], | |
| ... columns=["items_sold", "price"] | |
| ... ) | |
| >>> df | |
| items_sold price | |
| 0 23 32 | |
| 1 15 29 | |
| 2 24 30 | |
| 3 1 1000 | |
| If you want to find the average price of sold items, you can't simply take | |
| the mean of the ``price`` column as the one high price item skews the mean: | |
| >>> df["price"].mean() | |
| 272.75 | |
| Instead, you need to weight the price by the number of items sold, i.e. the | |
| ``items_sold`` column: | |
| >>> round( | |
| ... df.pipe(weighted_mean, value_col="price", weight_col="items_sold"), | |
| ... 2 | |
| ... ) | |
| 45.89 | |
| """ | |
| return (frame[value_col] * frame[weight_col]).sum() / frame[weight_col].sum() | |
| def weighted_mean_factory( | |
| value_col: str, weight_col: str | |
| ) -> Callable[[pd.DataFrame], float]: | |
| """Return ``weighted_mean`` with both column names bound, taking only a frame.""" | |
| return partial(weighted_mean, value_col=value_col, weight_col=weight_col) |
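The factory is most useful when the same weighted mean has to be applied to several frames, for example once per group. The sketch below restates the two functions from the listing so that it runs on its own, then applies the bound callable to the whole frame and to each group. The `store` column and the variable names `sales`, `avg_price`, `overall`, `per_store` and `reference` are invented for illustration, and `numpy.average` is used only as an independent cross-check of the `sum(A * B) / sum(B)` formula.

```python
from functools import partial
from typing import Callable

import numpy as np
import pandas as pd


def weighted_mean(frame: pd.DataFrame, value_col: str, weight_col: str) -> float:
    """Restatement of the function above: sum(A * B) / sum(B)."""
    return (frame[value_col] * frame[weight_col]).sum() / frame[weight_col].sum()


def weighted_mean_factory(
    value_col: str, weight_col: str
) -> Callable[[pd.DataFrame], float]:
    """Bind both column names so the result takes only a dataframe."""
    return partial(weighted_mean, value_col=value_col, weight_col=weight_col)


# Same numbers as the docstring example; the "store" column is hypothetical.
sales = pd.DataFrame(
    {
        "store": ["north", "north", "south", "south"],
        "items_sold": [23, 15, 24, 1],
        "price": [32, 29, 30, 1000],
    }
)

# Build the callable once, then reuse it wherever a frame is available.
avg_price = weighted_mean_factory(value_col="price", weight_col="items_sold")

# Whole frame: equivalent to calling weighted_mean directly via pipe.
overall = sales.pipe(avg_price)

# One weighted mean per store, computed by iterating over the groups.
per_store = {store: avg_price(group) for store, group in sales.groupby("store")}

# Independent cross-check of the formula using NumPy's weighted average.
reference = np.average(sales["price"], weights=sales["items_sold"])

print(round(overall, 2))
print(per_store)
```

Iterating over the groups keeps the example independent of how `groupby(...).apply` treats the grouping column, which differs between pandas versions. Note that the formula divides by the sum of the weights, so a frame whose weights sum to zero will not produce a meaningful result.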