Last active
December 8, 2024 08:31
Lorenz Curve and Gini Coefficient #python
import numpy as np
import matplotlib.pyplot as plt

# ensure your arr is sorted from lowest to highest values first!
arr = np.array([1,4,6,9,100])

def gini(arr):
    # rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    count = arr.size
    coefficient = 2 / count
    indexes = np.arange(1, count + 1)
    weighted_sum = (indexes * arr).sum()
    total = arr.sum()
    constant = (count + 1) / count
    return coefficient * weighted_sum / total - constant

def lorenz(arr):
    # this divides the prefix sum by the total sum
    # this ensures all the values are between 0 and 1.0
    scaled_prefix_sum = arr.cumsum() / arr.sum()
    # this prepends the 0 value (because 0% of all people have 0% of all wealth)
    return np.insert(scaled_prefix_sum, 0, 0)

# show the gini index!
print(gini(arr))

lorenz_curve = lorenz(arr)

# we need the X values to be between 0.0 and 1.0
plt.plot(np.linspace(0.0, 1.0, lorenz_curve.size), lorenz_curve)
# plot the straight line perfect equality curve
plt.plot([0,1], [0,1])
plt.show()
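The gist's `gini` only gives the right answer when the input is already sorted. As a minimal sketch (the names `gini_sorted_safe` and `gini_pairwise` are illustrative, not part of the gist), the same rank-weighted formula can sort internally, and the result can be cross-checked against the equivalent definition of the Gini coefficient as the mean absolute difference divided by twice the mean:

```python
import numpy as np

def gini_sorted_safe(values):
    """Same rank-weighted formula as the gist, but sorts the input first."""
    arr = np.sort(np.asarray(values, dtype=float))
    count = arr.size
    indexes = np.arange(1, count + 1)
    return 2.0 * (indexes * arr).sum() / (count * arr.sum()) - (count + 1) / count

def gini_pairwise(values):
    """Gini as the mean absolute difference over twice the mean (order-independent)."""
    arr = np.asarray(values, dtype=float)
    # all ordered pairs |x_i - x_j|, built with broadcasting
    abs_diffs = np.abs(arr[:, None] - arr[None, :])
    return abs_diffs.sum() / (2.0 * arr.size ** 2 * arr.mean())

# the gist's example values, deliberately shuffled
unsorted = np.array([100, 6, 1, 9, 4])
print(gini_sorted_safe(unsorted))
print(gini_pairwise(unsorted))
```

The pairwise version is O(n^2) in memory, so it is only meant as a check on small inputs; the sorted version is the one to use in practice.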
Thank you so much, you saved me!
Assuming that the total frequency distribution comes from element-wise summation of individual ratios, using a gini coefficient, can we derive the changes required to the individual ratios in order to make the system more fair and balanced?
Suppose arr = np.array([1,4,6,9,100]) was acquired by summing many individual ratios such as [0,1,1,2,3], [1,0,1,2,20], etc. If the changes are restricted to operations on those individual ratios, such as dropping some of them, how can we make the resulting Gini coefficient perfectly fair? Note that the perverse case is dropping all ratios, resulting in a total ratio of [0,0,0,0,0]. One could disallow this by setting a minimum number of individual ratios to preserve.
This problem would enable class balancing on object detection data.
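One way to make the question concrete is a brute-force search: try every subset of individual ratios that keeps at least a minimum number of them, sum each subset element-wise, and keep the subset with the lowest Gini coefficient. This is only a sketch under assumptions. The function `fairest_subset`, the `min_keep` parameter, and the last two rows of `ratios` are hypothetical; only the first two rows come from the text above. The search is exponential in the number of ratios, so it illustrates the objective rather than a practical method.

```python
from itertools import combinations

import numpy as np

def gini(values):
    """Rank-weighted Gini coefficient; sorts internally."""
    arr = np.sort(np.asarray(values, dtype=float))
    count = arr.size
    indexes = np.arange(1, count + 1)
    return 2.0 * (indexes * arr).sum() / (count * arr.sum()) - (count + 1) / count

def fairest_subset(ratios, min_keep):
    """Return (gini, kept_row_indices) for the fairest element-wise sum."""
    ratios = np.asarray(ratios)
    best_score, best_keep = None, None
    for size in range(min_keep, len(ratios) + 1):
        for keep in combinations(range(len(ratios)), size):
            total = ratios[list(keep)].sum(axis=0)
            if total.sum() == 0:
                continue  # skip the degenerate all-zero total
            score = gini(total)
            if best_score is None or score < best_score:
                best_score, best_keep = score, keep
    return best_score, best_keep

ratios = [
    [0, 1, 1, 2, 3],   # from the text
    [1, 0, 1, 2, 20],  # from the text
    [0, 2, 1, 0, 0],   # hypothetical, for illustration only
    [3, 0, 0, 1, 0],   # hypothetical, for illustration only
]
score, keep = fairest_subset(ratios, min_keep=2)
print(keep, score)
```

With these rows, the search drops the ratio dominated by the fifth class, which matches the intuition of undersampling frames that over-represent a common class.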
See: Frame Augmentation for Imbalanced Object Detection Datasets
An alternative to balancing by oversampling or undersampling object detection images is to augment the dataset with synthetic images composited from existing labelled objects. This relies on having a suitable scene in which the object could plausibly appear, so dealing with rare objects may be complicated.
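As a rough sketch of what augmentation would need to supply, the number of synthetic instances per class can be taken as the gap to the most frequent class. This assumes, optimistically, that a composited image can add instances of one class without also adding others; the variable names are illustrative, and the counts reuse the example `arr` from the gist.

```python
import numpy as np

# per-class object counts, reusing the example values from the gist
class_counts = np.array([1, 4, 6, 9, 100])

# synthetic instances needed for each class to match the most frequent class
synthetic_needed = class_counts.max() - class_counts

# counts after augmentation; all classes equal means a Gini coefficient of 0
balanced = class_counts + synthetic_needed

print(synthetic_needed)
print(balanced)
```

In practice a composited frame usually contains several classes at once, so the real targets would have to be solved jointly rather than per class.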