| import numpy as np | |
| import pylab as pl | |
| import matplotlib.transforms as mtransforms | |
| ################################################################################ | |
| # Display correlation matrices | |
| def fit_axes(ax): | |
| """ Resize the given axes so that their labels fit. | |
| """ |
| """ | |
| Fast counting of 3-grams for short strings. | |
| Quick benchmarking seems to show that pure Python code is faster | |
| for strings shorter than 1000 characters, and that the numpy version is | |
| faster for longer strings. | |
| Very long strings would benefit from probabilistic counting (Bloom | |
| filter, count-min sketch) as implemented e.g. in the "bounter" module. |
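The preview above is cut off before any code appears. As a rough illustration of the pure Python approach the docstring mentions, here is a minimal sketch. The function name `count_3grams` is hypothetical and is not taken from the gist; it simply slides a window of width 3 over the string and tallies the substrings with `collections.Counter`.

```python
from collections import Counter


def count_3grams(string):
    """Count every overlapping 3-character substring of `string`.

    Hypothetical helper for illustration; not the gist's own function.
    Strings shorter than 3 characters yield an empty Counter.
    """
    return Counter(string[i:i + 3] for i in range(len(string) - 2))


# Small usage example
counts = count_3grams("abcabc")
print(counts.most_common())
```

For short inputs this stays in plain Python, which matches the regime where the docstring reports pure Python to be the faster option.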
| """ | |
| A scikit-learn like transformer to remove a confounding effect on X. | |
| """ | |
| from sklearn.base import BaseEstimator, TransformerMixin, clone | |
| from sklearn.linear_model import LinearRegression | |
| import numpy as np | |
| class DeConfounder(BaseEstimator, TransformerMixin): | |
| """ A transformer removing the effect of y on X. |
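The preview stops at the class docstring, so the gist's actual implementation is not visible here. The following is only a sketch of one way such a transformer could work, under the assumption that "removing the effect of y on X" means regressing each column of X on y and keeping the residuals. The class name `ResidualDeconfounder`, the `confound_model` parameter, and the `transform(X, y)` signature are all hypothetical choices made for this illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression


def _as_2d(y):
    """Return y as a float array of shape (n_samples, n_confounds)."""
    y = np.asarray(y, dtype=float)
    return y[:, np.newaxis] if y.ndim == 1 else y


class ResidualDeconfounder(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: predict X from y, then return the residuals."""

    def __init__(self, confound_model=None):
        # Assumption: any scikit-learn regressor supporting multi-output
        # targets can be plugged in; LinearRegression is the default.
        self.confound_model = confound_model

    def fit(self, X, y):
        base = (LinearRegression() if self.confound_model is None
                else self.confound_model)
        # The confound y is the regressor and X is the (multi-output) target.
        self.model_ = clone(base).fit(_as_2d(y), np.asarray(X, dtype=float))
        return self

    def transform(self, X, y):
        # The confound is needed at transform time too, hence the extra
        # argument (a deliberate departure from the usual transform(X)).
        return np.asarray(X, dtype=float) - self.model_.predict(_as_2d(y))

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X, y)


# Small usage example on synthetic data
rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = rng.normal(size=(200, 3)) + np.outer(y, [2.0, -1.0, 0.5])
X_clean = ResidualDeconfounder().fit_transform(X, y)
```

With an ordinary least squares model, the residuals in `X_clean` are uncorrelated with `y` on the training data by construction; on held-out data this only holds approximately.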
| # How to use: df should be the dataframe restricted to the categorical values to impact, | |
| # target should be the pd.Series of target values. | |
| # Use fit, transform, etc. | |
| # Three types: binary, multiple, continuous. | |
| # For now m is a parameter <===== but what should we put here? I guess some function of the total shape. | |
| # I mean, what would be the value of m we want to have for 0.5? | |
| import pandas as pd | |
| import numpy as np |
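The comments above leave the role of `m` open, and the gist's code is not shown in this preview. As a hedged illustration only, one common smoothing scheme for this kind of target-based encoding blends each category's mean with the global mean, weighting the category mean by `n / (n + m)` where `n` is the category count. Whether the gist uses this exact formula is an assumption; the function name `impact_encode` is hypothetical.

```python
import pandas as pd


def impact_encode(categories, target, m):
    """Replace each category by a smoothed mean of the target.

    Assumed formula (not confirmed by the gist):
        (n * category_mean + m * global_mean) / (n + m)
    where n is the number of rows in the category.
    """
    prior = target.mean()
    stats = target.groupby(categories).agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + m * prior) / (stats["count"] + m)
    # Map every row's category to its smoothed value.
    return categories.map(smoothed)


# Small usage example for a binary target
cats = pd.Series(["a", "a", "a", "b"])
target = pd.Series([1.0, 1.0, 0.0, 0.0])
encoded = impact_encode(cats, target, m=3)
print(encoded)
```

Under this assumed formula, the category mean and the global mean each get a weight of 0.5 exactly when a category has `m` rows, which gives one way to read the question about 0.5 in the comments: `m` acts as the category size at which the two estimates are trusted equally.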