Gael Varoquaux GaelVaroquaux

import numpy as np
import pylab as pl
import matplotlib.transforms as mtransforms
################################################################################
# Display correlation matrices
def fit_axes(ax):
    """ Resize the given axes so that the labels fit.
    """
GaelVaroquaux / count_3_grams.py
Created October 31, 2017 19:23
Fast 3-gram counting on small strings
"""
Fast counting of 3-grams for short strings.
Quick benchmarking seems to show that pure Python code is faster for
strings shorter than 1000 characters, and that the numpy version is
faster for longer strings.
Very long strings would benefit from probabilistic counting (Bloom
filter, count-min sketch) as implemented e.g. in the "bounter" module.
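
The preview above is truncated, so the two approaches it compares are not shown. As a rough illustration only, here is a minimal sketch of what a pure Python and a numpy 3-gram counter could look like. The function names `count_3_grams_python` and `count_3_grams_numpy` are hypothetical and are not taken from the gist; the benchmark claim about the 1000-character crossover is the author's and is not reproduced here.

```python
from collections import Counter

import numpy as np


def count_3_grams_python(string):
    """Count overlapping 3-grams with a plain Counter (hypothetical helper)."""
    return Counter(string[i:i + 3] for i in range(len(string) - 2))


def count_3_grams_numpy(string):
    """Count overlapping 3-grams on an array of code points (hypothetical helper)."""
    if len(string) < 3:
        return {}
    # One little-endian uint32 code point per character
    codes = np.frombuffer(string.encode("utf-32-le"), dtype="<u4")
    # Row i holds the code points of string[i:i + 3]
    windows = np.column_stack([codes[:-2], codes[1:-1], codes[2:]])
    # Unique rows are the distinct 3-grams; counts are their frequencies
    grams, counts = np.unique(windows, axis=0, return_counts=True)
    return {
        "".join(chr(int(c)) for c in gram): int(n)
        for gram, n in zip(grams, counts)
    }
```

Both functions should return the same counts for a given string; which one is faster depends on string length and would need to be measured with `timeit` on the data at hand.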
GaelVaroquaux / deconfound.py
Last active July 18, 2021 12:35
Linear deconfounding in a fit-transform API
"""
A scikit-learn like transformer to remove a confounding effect on X.
"""
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
import numpy as np
class DeConfounder(BaseEstimator, TransformerMixin):
    """ A transformer removing the effect of y on X.
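
The body of `DeConfounder` is cut off in this preview. To make the idea of linear deconfounding concrete, here is a minimal sketch under stated assumptions: the confound is regressed out of each column of X by ordinary least squares with an intercept, and the residuals are returned. The class name `LinearResidualizer`, the `confound` argument, and the choice of `np.linalg.lstsq` are all assumptions for illustration; the gist itself builds on scikit-learn's `BaseEstimator`, `TransformerMixin`, and `LinearRegression`, and its exact signature is not visible here.

```python
import numpy as np


class LinearResidualizer:
    """Hypothetical sketch: remove the linear effect of a confound from X."""

    @staticmethod
    def _design(confound):
        # Build the design matrix [1, confound] so the fit includes an intercept
        c = np.asarray(confound, dtype=float)
        if c.ndim == 1:
            c = c[:, np.newaxis]
        return np.column_stack([np.ones(len(c)), c])

    def fit(self, X, confound):
        # Least-squares regression of every column of X on the confound
        X = np.asarray(X, dtype=float)
        self.coef_, *_ = np.linalg.lstsq(self._design(confound), X, rcond=None)
        return self

    def transform(self, X, confound):
        # Subtract the part of X that the confound predicts linearly
        X = np.asarray(X, dtype=float)
        return X - self._design(confound) @ self.coef_
```

Fitting on training data and calling `transform` on held-out data with the held-out confound values keeps the coefficients free of test-set information, which is the usual reason for wrapping deconfounding in a fit-transform API.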
GaelVaroquaux / impact_encoding.py
Created October 29, 2018 14:19
Target encoding (or impact encoding)
# How to use: df should be the dataframe restricted to the categorical columns to encode,
# target should be the pd.Series of target values.
# Use fit, transform, etc.
# Three target types: binary, multiple, continuous.
# For now m is a parameter. Open question: what value should it take? Probably some function of the total shape.
# That is, which value of m do we want in order to get 0.5?
import pandas as pd
import numpy as np
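
The rest of this gist is not shown in the preview. The comments above leave the choice of `m` open; one common reading, assumed here and not confirmed by the source, is that `m` is the smoothing strength of an m-estimate, where each category mean is shrunk toward the global target mean. The helper name `smoothed_target_encode` is hypothetical, and the sketch covers only a numeric (binary or continuous) target.

```python
import pandas as pd


def smoothed_target_encode(categories, target, m=10.0):
    """Hypothetical m-estimate target encoding for one categorical column.

    encoding = (n * category_mean + m * global_mean) / (n + m),
    where n is the number of rows in the category.
    """
    prior = target.mean()
    # Per-category mean and row count of the target
    stats = target.groupby(categories).agg(["mean", "count"])
    table = (stats["count"] * stats["mean"] + m * prior) / (stats["count"] + m)
    # Return the encoded column and the lookup table for reuse on new data
    return categories.map(table), table
```

Under this formula the weight on the category mean is n / (n + m), so a category with exactly m rows is encoded halfway between its own mean and the global mean. If that is the "0.5" the comments ask about, `m` can be read as the category size at which the data and the prior count equally.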