Guillaume Lemaitre glemaitre

glemaitre / sprint_tags.md
Last active June 3, 2017 11:42
Issues and PRs which need some love
"""
This is a real case using the data of the Adult Census dataset available at:
https://archive.ics.uci.edu/ml/datasets/Adult
It shows that adding a smoothing noise does not have any influence on the
classification performance but allows for a better understanding when manually
checking the QuantileTransformer.
"""
import numpy as np
import pandas as pd
import numpy as np
from sklearn.preprocessing import QuantileTransformer
X = np.array([0] * 1 + [0.5] * 7 + [1] * 2).reshape(-1, 1)
qt = QuantileTransformer(n_quantiles=10)
qt.fit(X)
# a behaviour which is not desired, but that frankly should
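
The snippet above is cut off before it inspects the fitted transformer. As a minimal sketch of what "manually checking" could look like, the block below refits the same toy data and prints the learned quantiles next to the transformed values. The variable names `quantiles` and `X_trans` are illustrative and do not come from the gist; only the public `quantiles_` attribute and `transform` method of `QuantileTransformer` are used.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Same toy data as above: one 0, seven 0.5 and two 1, as a single column.
X = np.array([0] * 1 + [0.5] * 7 + [1] * 2).reshape(-1, 1)

qt = QuantileTransformer(n_quantiles=10)
qt.fit(X)

# quantiles_ has shape (n_quantiles, n_features); flatten the single column.
quantiles = qt.quantiles_.ravel()

# With the default uniform output distribution, values are mapped into [0, 1].
X_trans = qt.transform(X).ravel()

print("learned quantiles: ", quantiles)
print("transformed values:", X_trans)
```

Because seven of the ten samples share the value 0.5, most of the learned quantiles coincide, and all of those samples map to the same output. Ties like this are what make the fitted quantiles hard to read by eye.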

Test script

from __future__ import division, print_function

import platform
import sys

from time import time

python examples/model_selection/grid_search_text_feature_extraction.py

==========================================================
Sample pipeline for text feature extraction and evaluation
==========================================================

The dataset used in this example is the 20 newsgroups dataset which will be
automatically downloaded and then cached and reused for the document
classification example.