Last active
March 31, 2021 21:53
Remove outliers using numpy. Normally, an outlier is a value that lies more than 1.5 * IQR beyond the quartiles, but experimental analysis has shown that a higher or lower multiplier might produce more accurate results. Interestingly, after 1000 runs, removing outliers creates a larger standard deviation between test run results.
import numpy as np

def removeOutliers(x, outlierConstant):
    a = np.array(x)
    # np.percentile takes q on a 0-100 scale, so 75 and 25 are the quartiles
    upper_quartile = np.percentile(a, 75)
    lower_quartile = np.percentile(a, 25)
    # Despite its name, IQR here is the interquartile range scaled by outlierConstant
    IQR = (upper_quartile - lower_quartile) * outlierConstant
    # (lower bound, upper bound) of the range of values to keep
    quartileSet = (lower_quartile - IQR, upper_quartile + IQR)
    resultList = []
    for y in a.tolist():
        if y >= quartileSet[0] and y <= quartileSet[1]:
            resultList.append(y)
    return resultList
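The same filter can also be written with a boolean mask instead of a Python loop. This is only a sketch, not the gist author's code: the name `remove_outliers_vectorized`, the default of 1.5, and the sample data are assumptions made for illustration.

```python
import numpy as np

def remove_outliers_vectorized(x, outlier_constant=1.5):
    """Keep values within outlier_constant * IQR of the quartiles (hypothetical helper)."""
    a = np.asarray(x)
    # Both quartiles in one call; q is given on a 0-100 scale.
    lower_quartile, upper_quartile = np.percentile(a, [25, 75])
    margin = (upper_quartile - lower_quartile) * outlier_constant
    # Boolean mask: True where the value lies inside the allowed range.
    mask = (a >= lower_quartile - margin) & (a <= upper_quartile + margin)
    return a[mask].tolist()

# Made-up sample data with one obvious outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
cleaned = remove_outliers_vectorized(data)
```

With this sample, the quartiles are 3.25 and 7.75, so the default constant keeps values between -3.5 and 14.5 and drops 100.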
@marcoruizrueda
What you are talking about are quantiles.
0 quartile = 0 quantile = 0 percentile
1 quartile = 0.25 quantile = 25 percentile
2 quartile = .5 quantile = 50 percentile (median)
3 quartile = .75 quantile = 75 percentile
4 quartile = 1 quantile = 100 percentile
What is the outlier constant?
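Reading the gist's code, `outlierConstant` appears to be the multiplier applied to the interquartile range before it is added to and subtracted from the quartiles; 1.5 is the conventional value mentioned in the description. The sketch below shows how the cut-off bounds move as the constant changes. The helper name `iqr_fences` and the sample data are made up for illustration.

```python
import numpy as np

def iqr_fences(x, outlier_constant):
    """Return the (low, high) bounds used to decide what counts as an outlier."""
    a = np.asarray(x)
    lower_quartile, upper_quartile = np.percentile(a, [25, 75])
    margin = (upper_quartile - lower_quartile) * outlier_constant
    return lower_quartile - margin, upper_quartile + margin

# Made-up sample data; 15 is a borderline value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 15]

kept_by_constant = {}
for k in (0.5, 1.5, 3.0):
    low, high = iqr_fences(data, k)
    # A larger constant widens the bounds, so fewer values are discarded.
    kept_by_constant[k] = [v for v in data if low <= v <= high]
    print(k, (low, high), kept_by_constant[k])
```

On this data, the constants 0.5 and 1.5 both discard 15, while 3.0 keeps it.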
Did you mean 0.25 and 0.75 rather than 25 and 75? Percentiles go from 0 to 100. Thanks for the code.
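On the 25 versus 0.25 question: as far as I know, `np.percentile` expects `q` on a 0 to 100 scale and `np.quantile` expects it on a 0 to 1 scale, so the 25 and 75 in the gist do select the first and third quartiles. A minimal sketch on made-up data, which also illustrates the quartile/quantile/percentile mapping listed earlier in the thread:

```python
import numpy as np

# Made-up, evenly spaced data so the quartiles fall on actual elements.
a = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Same quartile, expressed on the two different scales.
q1_from_percentile = np.percentile(a, 25)
q1_from_quantile = np.quantile(a, 0.25)

q3_from_percentile = np.percentile(a, 75)
q3_from_quantile = np.quantile(a, 0.75)

# Passing 0.25 to np.percentile asks for the 0.25th percentile,
# which sits just above the minimum rather than at the first quartile.
near_minimum = np.percentile(a, 0.25)

print(q1_from_percentile, q1_from_quantile)
print(q3_from_percentile, q3_from_quantile)
print(near_minimum)
```

So replacing 25 and 75 with 0.25 and 0.75 would only be correct if the calls were switched to `np.quantile` as well.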