Created
February 15, 2017 18:18
Generating an artificial dataset with outliers
import numpy as np
from sklearn.datasets import make_classification

# Data with features in different scales
n_classes = 2
X_clean, y_clean = make_classification(
    n_samples=500, n_features=2, n_redundant=0,
    scale=(10, 100), random_state=0)

# Add outliers to the data
X_outliers, y_outliers = make_classification(
    n_samples=10, n_features=2, n_redundant=0,
    # scale=(10, 100), random_state=1)
    scale=(-100, 1000), random_state=1)

X = np.concatenate((X_clean, X_outliers), axis=0)
y = np.concatenate((y_clean, y_outliers), axis=0)
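
A quick sanity check can confirm that the appended rows really sit on a larger scale than the clean ones. The sketch below wraps the same two `make_classification` calls in a helper and also returns a boolean mask for the outlier rows. The names `make_data_with_outliers` and `is_outlier` are hypothetical, added here for illustration only; they are not part of the original snippet.

```python
import numpy as np
from sklearn.datasets import make_classification


def make_data_with_outliers(n_clean=500, n_outliers=10):
    """Hypothetical helper: the same two draws as above, plus an outlier mask."""
    # Clean data with features on different scales
    X_clean, y_clean = make_classification(
        n_samples=n_clean, n_features=2, n_redundant=0,
        scale=(10, 100), random_state=0)
    # Outliers: same generator, features multiplied by much larger factors
    X_out, y_out = make_classification(
        n_samples=n_outliers, n_features=2, n_redundant=0,
        scale=(-100, 1000), random_state=1)
    X = np.concatenate((X_clean, X_out), axis=0)
    y = np.concatenate((y_clean, y_out), axis=0)
    # Mark the rows that came from the outlier draw (they are appended last)
    is_outlier = np.zeros(len(X), dtype=bool)
    is_outlier[n_clean:] = True
    return X, y, is_outlier


X, y, is_outlier = make_data_with_outliers()
print("shapes:", X.shape, y.shape)
# Largest absolute value per feature, for clean rows and for outlier rows
print("clean   max |x|:", np.abs(X[~is_outlier]).max(axis=0))
print("outlier max |x|:", np.abs(X[is_outlier]).max(axis=0))
```

Keeping the mask around makes it easy to color the outliers in a scatter plot or to exclude them when comparing scalers.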