import numpy as np
import matplotlib.pyplot as plt

# ensure your arr is sorted from lowest to highest values first!
arr = np.array([1,4,6,9,100])

def gini(arr):
    count = arr.size
    coefficient = 2 / count
    indexes = np.arange(1, count + 1)
    weighted_sum = (indexes * arr).sum()
    total = arr.sum()
    constant = (count + 1) / count
    return coefficient * weighted_sum / total - constant

def lorenz(arr):
    # this divides the prefix sum by the total sum
    # this ensures all the values are between 0 and 1.0
    scaled_prefix_sum = arr.cumsum() / arr.sum()
    # this prepends the 0 value (because 0% of all people have 0% of all wealth)
    return np.insert(scaled_prefix_sum, 0, 0)

# show the gini index!
print(gini(arr))

lorenz_curve = lorenz(arr)

# we need the X values to be between 0.0 to 1.0
plt.plot(np.linspace(0.0, 1.0, lorenz_curve.size), lorenz_curve)
# plot the straight line perfect equality curve
plt.plot([0,1], [0,1])
plt.show()
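As a sanity check on the formula above, the sketch below compares it with the pairwise definition of the Gini coefficient (the sum of all absolute differences divided by `2 * n * total`). The function names `gini_sorted_formula` and `gini_pairwise` are illustrative and not part of the gist; the first one sorts a copy of its input, so the caller does not need to pre-sort.

```python
import numpy as np

def gini_sorted_formula(values):
    # Same formula as gini() in the gist, but sorts a copy of the input first.
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    idx = np.arange(1, n + 1)
    return 2.0 * (idx * x).sum() / (n * x.sum()) - (n + 1) / n

def gini_pairwise(values):
    # Pairwise definition: sum over all ordered pairs of |x_i - x_j|,
    # divided by 2 * n * sum(x). Order of the input does not matter.
    x = np.asarray(values, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :]).sum()
    return diffs / (2.0 * x.size * x.sum())

# Deliberately unsorted version of the gist's example data.
unsorted_arr = np.array([100, 1, 9, 4, 6])
a = gini_sorted_formula(unsorted_arr)
b = gini_pairwise(unsorted_arr)
print(a, b)  # both are approximately 0.6767
```

The two results agree because the weighted-sum formula is an algebraic rearrangement of the pairwise one for sorted data; the pairwise version is O(n²) and is only useful as a check on small arrays.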
Assuming the total frequency distribution is the element-wise sum of individual ratios, can we use the Gini coefficient to derive the changes to the individual ratios required to make the system more fair and balanced?
Suppose arr = np.array([1,4,6,9,100]) was acquired from many individual ratios such as [0,1,1,2,3], [1,0,1,2,20], ... etc. If the allowed changes are that:
- We can duplicate an individual ratio
- We can drop a ratio
How can we drive the resulting Gini coefficient to 0, i.e. perfectly fair? Note that the perverse case is dropping all ratios, resulting in a total ratio of [0,0,0,0,0]. One could disallow this by setting a minimum number of individual ratios to preserve.
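One possible way to approach this is a greedy search: at each step, try every single duplicate or drop, keep the one that lowers the Gini coefficient of the element-wise total the most, and stop when no move helps. The sketch below is only a heuristic under that assumption, with no guarantee of reaching the best possible balance. The names `greedy_balance`, `min_keep`, and `max_steps` are hypothetical, and the example rows are made up for illustration.

```python
import numpy as np

def gini(values):
    # Gini coefficient; sorts a copy so the input need not be pre-sorted.
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    idx = np.arange(1, n + 1)
    return 2.0 * (idx * x).sum() / (n * x.sum()) - (n + 1) / n

def greedy_balance(rows, min_keep=1, max_steps=100):
    # rows[i] is one individual ratio; weights[i] is how many copies we keep.
    rows = np.asarray(rows, dtype=float)
    weights = np.ones(len(rows), dtype=int)
    best = gini(weights @ rows)
    for _ in range(max_steps):
        best_move = None
        for i in range(len(rows)):
            for delta in (+1, -1):  # +1 duplicates row i, -1 drops one copy
                if delta == -1 and (weights[i] == 0 or weights.sum() <= min_keep):
                    continue  # keep at least min_keep copies overall
                trial = weights.copy()
                trial[i] += delta
                total = trial @ rows
                if total.sum() == 0:
                    continue  # skip the perverse all-zero total
                score = gini(total)
                if score < best - 1e-12:
                    best, best_move = score, (i, delta)
        if best_move is None:
            break  # no single move improves the balance
        weights[best_move[0]] += best_move[1]
    return weights, best

# Made-up example: three ratios over two classes.
example_rows = [[3, 0], [0, 1], [1, 1]]
initial_gini = gini(np.sum(example_rows, axis=0))
final_weights, final_gini = greedy_balance(example_rows)
print(initial_gini, final_weights, final_gini)
```

In this small example the search duplicates the second ratio twice, which brings the total to `[4, 4]`. On real data the greedy choice can get stuck in a local minimum, so an integer programming formulation may be worth considering instead.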
This problem would enable class balancing on object detection data.
See: Frame Augmentation for Imbalanced Object Detection Datasets
An alternative to balancing by oversampling or undersampling object detection images is to augment the dataset with synthetic images composited from existing labelled objects. This relies on having scenes in which the object may plausibly exist. Dealing with rare objects may be complicated.
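A minimal sketch of the compositing idea, assuming images are NumPy arrays and the labelled object is available as a rectangular crop. The function `paste_object` is hypothetical; it does a naive rectangular paste with no mask or blending, which a real pipeline would likely need.

```python
import numpy as np

def paste_object(background, crop, top, left):
    # Paste `crop` onto a copy of `background` and return the new image
    # together with its bounding box as (x_min, y_min, x_max, y_max).
    h, w = crop.shape[:2]
    bg_h, bg_w = background.shape[:2]
    if top < 0 or left < 0 or top + h > bg_h or left + w > bg_w:
        raise ValueError("crop does not fit inside the background at this position")
    out = background.copy()
    out[top:top + h, left:left + w] = crop
    return out, (left, top, left + w, top + h)

# Made-up data: a black 8x8 RGB scene and a white 2x3 object crop.
scene = np.zeros((8, 8, 3), dtype=np.uint8)
obj = np.full((2, 3, 3), 255, dtype=np.uint8)
composite, box = paste_object(scene, obj, top=1, left=2)
print(box)  # (2, 1, 5, 3)
```

The returned box can be appended to the image's annotations, so each synthetic image adds to the count of the under-represented class.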
Thank you so much, you saved me!
Thank you for your explanations!