import numpy as np

def removeOutliers(x, outlierConstant):
    a = np.array(x)
    upper_quartile = np.percentile(a, 75)
    lower_quartile = np.percentile(a, 25)
    IQR = (upper_quartile - lower_quartile) * outlierConstant
    quartileSet = (lower_quartile - IQR, upper_quartile + IQR)
    resultList = []
    for y in a.tolist():
        if y >= quartileSet[0] and y <= quartileSet[1]:
            resultList.append(y)
    return resultList
How can this piece of code be adapted for a DataFrame, to drop outlier values across the whole DataFrame?
Thanks for posting, I need this code for a DataFrame too. I will try to modify it for my case.
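One possible way to adapt this to a pandas DataFrame is sketched below. It assumes pandas is installed and that only numeric columns should be checked. The name `remove_outliers_df` and its default constant are made up for illustration and are not part of the gist. A row is dropped if any numeric column falls outside that column's own bounds.

```python
import numpy as np
import pandas as pd

def remove_outliers_df(df, outlier_constant=1.5):
    """Hypothetical helper: drop rows that have an outlier in any numeric column."""
    numeric = df.select_dtypes(include=[np.number])
    q1 = numeric.quantile(0.25)          # per-column lower quartile
    q3 = numeric.quantile(0.75)          # per-column upper quartile
    margin = (q3 - q1) * outlier_constant
    lower, upper = q1 - margin, q3 + margin
    # The comparison aligns the bounds with the columns; keep a row only if
    # every numeric value in it lies within its column's bounds.
    in_range = ((numeric >= lower) & (numeric <= upper)).all(axis=1)
    return df[in_range]

# Made-up example data: column "a" contains one obvious outlier (100).
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 100],
    "b": [10, 11, 12, 13, 14],
    "label": list("vwxyz"),
})
cleaned = remove_outliers_df(df)
print(cleaned)
```

Note that a row with NaN in a numeric column fails the comparison and is dropped as well, which may or may not be what you want.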
How do I decide what the constant should be?
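For what it's worth, the constant is the multiplier applied to the interquartile range to set the cutoffs. A common convention (Tukey's rule, as used for box plots) is 1.5, with 3.0 sometimes used to flag only the most extreme points. The right value depends on your data. The sketch below compares the two on made-up numbers; `iqr_bounds` is a hypothetical helper, not part of the gist.

```python
import numpy as np

def iqr_bounds(x, outlier_constant):
    """Hypothetical helper: return the (lower, upper) cutoffs for a given constant."""
    a = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(a, [25, 75])
    margin = (q3 - q1) * outlier_constant
    return q1 - margin, q3 + margin

# Made-up data with one moderate (20) and one extreme (50) value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 50]

for c in (1.5, 3.0):
    lo, hi = iqr_bounds(data, c)
    kept = [v for v in data if lo <= v <= hi]
    print(c, (lo, hi), kept)
```

With this data, a constant of 1.5 drops both 20 and 50, while 3.0 drops only 50. A larger constant is more permissive.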
Hi, here is my suggestion: take advantage of NumPy's speed instead of using a Python loop with a growing list. With big arrays, the difference in time is noticeable.
def removeOutliers(x, outlierConstant):
    a = np.array(x)
    upper_quartile = np.percentile(a, 75)
    lower_quartile = np.percentile(a, 25)
    IQR = (upper_quartile - lower_quartile) * outlierConstant
    quartileSet = (lower_quartile - IQR, upper_quartile + IQR)
    result = a[np.where((a >= quartileSet[0]) & (a <= quartileSet[1]))]
    return result.tolist()
Thanks, @adrian-alberto! Updated
Did you mean 0.25 and 0.75 rather than 25 and 75? Percentiles go from 0 to 100. Thanks for the code.
@marcoruizrueda
What you are talking about are quantiles.
0th quartile = 0 quantile = 0th percentile
1st quartile = 0.25 quantile = 25th percentile
2nd quartile = 0.5 quantile = 50th percentile (median)
3rd quartile = 0.75 quantile = 75th percentile
4th quartile = 1 quantile = 100th percentile
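A quick way to check this correspondence in NumPy, as a small sketch on made-up data: `np.percentile` takes its argument on a 0 to 100 scale, while `np.quantile` takes it on a 0 to 1 scale, so the 25 and 75 in the gist are correct for `np.percentile`.

```python
import numpy as np

# Made-up sample data.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 50])

# The same cut point expressed on both scales gives the same value.
for pct in (0, 25, 50, 75, 100):
    p = np.percentile(a, pct)        # q given as a percentage (0-100)
    q = np.quantile(a, pct / 100)    # q given as a fraction (0-1)
    assert np.isclose(p, q)
    print(pct, p, q)
```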
What is the outlier constant?