Last active
September 29, 2019 05:20
Filter out data points with low kNN density. Link to blog: https://towardsdatascience.com/how-can-i-trust-you-fb433a06256c?source=friends_link&sk=0af208dc53be2a326d2407577184686b
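import numpy as np
from sklearn.neighbors import KDTree  # assumed import; the original snippet does not show where KDTree comes from


def filter_by_distance_knn(self, X: np.ndarray) -> np.ndarray:
    """Drop the instances of X that sit in low-density regions, judged by kNN distance."""
    kdtree = KDTree(X, leaf_size=self.leaf_size, metric=self.metric)
    # Column 0 is each instance's distance to itself; columns 1..k_filter are its neighbours.
    knn_r = kdtree.query(X, k=self.k_filter + 1)[0]
    if self.dist_filter_type == 'point':
        knn_r = knn_r[:, -1]  # distance to the k-th nearest neighbour
    elif self.dist_filter_type == 'mean':
        knn_r = np.mean(knn_r[:, 1:], axis=1)  # exclude distance of instance to itself
    else:
        # Without this guard knn_r would stay 2-D and the percentile below would be meaningless.
        raise ValueError("dist_filter_type must be 'point' or 'mean'")
    cutoff_r = np.percentile(knn_r, (1 - self.alpha) * 100)  # cutoff distance
    X_keep = X[np.where(knn_r <= cutoff_r)[0], :]  # define instances to keep
    return X_keep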
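To try the filter outside its class, here is a minimal standalone sketch. It is not the gist's implementation: it swaps the `KDTree` query for a brute-force pairwise Euclidean distance matrix in NumPy, so it only suits small arrays. The function name `filter_low_density`, the default values, and the toy data are hypothetical, chosen for illustration.

```python
import numpy as np


def filter_low_density(X, k_filter=5, alpha=0.05, dist_filter_type="point"):
    """Brute-force stand-in for the KDTree-based method (Euclidean metric assumed)."""
    # Pairwise distances, shape (n, n); the diagonal holds each point's zero self-distance.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort every row and keep the self-distance plus the k_filter nearest neighbours.
    knn_r = np.sort(dists, axis=1)[:, : k_filter + 1]
    if dist_filter_type == "point":
        knn_r = knn_r[:, -1]  # distance to the k-th nearest neighbour
    elif dist_filter_type == "mean":
        knn_r = knn_r[:, 1:].mean(axis=1)  # mean neighbour distance, self excluded
    else:
        raise ValueError("dist_filter_type must be 'point' or 'mean'")
    # Keep the (1 - alpha) fraction of points with the smallest kNN distance.
    cutoff_r = np.percentile(knn_r, (1 - alpha) * 100)
    return X[knn_r <= cutoff_r]


# Hypothetical toy data: one dense Gaussian cluster plus two far-away points.
rng = np.random.default_rng(0)
cluster = rng.normal(size=(200, 2))
outliers = np.array([[50.0, 50.0], [-60.0, 40.0]])
X = np.vstack([cluster, outliers])

X_keep = filter_low_density(X, k_filter=5, alpha=0.05)
X_keep_mean = filter_low_density(X, k_filter=5, alpha=0.05, dist_filter_type="mean")
```

The two distant points have the largest kNN distances, so they fall above the percentile cutoff and are dropped along with the sparsest few points of the cluster. Raising `alpha` removes a larger share of the data.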