Gael Varoquaux GaelVaroquaux

Research director at Inria. Computer science, data science, health. Scipy & pydata coder; (co)founder of scikit-learn, joblib, probabl.

GaelVaroquaux / matrix_plotting.py

Last active March 3, 2017 11:32

	import numpy as np
	import pylab as pl
	import matplotlib.transforms as mtransforms

	################################################################################
	# Display correlation matrices

	def fit_axes(ax):
	    """ Resize the given axes so that the labels fit.
	    """

GaelVaroquaux / count_3_grams.py

Created October 31, 2017 19:23

Fast 3-gram counting on small strings

	"""
	Fast counting of 3-grams for short strings.


	Quick benchmarking seems to show that pure Python code is faster
	for strings shorter than 1000 characters, and that the numpy version is
	faster for longer strings.

	Very long strings would benefit from probabilistic counting (Bloom
	filter, count-min sketch), as implemented e.g. in the "bounter" module.
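
The pure-Python approach mentioned in the docstring can be illustrated with a minimal sketch. This is not the gist's own code: the function name `count_3_grams` and the use of `collections.Counter` are assumptions made for illustration only.

```python
from collections import Counter


def count_3_grams(string):
    """Count the overlapping 3-character substrings of `string`."""
    # Zipping the string with itself shifted by one and by two characters
    # yields every run of three consecutive characters exactly once.
    return Counter(a + b + c
                   for a, b, c in zip(string, string[1:], string[2:]))


counts = count_3_grams("abcabcab")
```

Strings shorter than three characters produce an empty counter, since `zip` stops at the shortest of its inputs.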

GaelVaroquaux / deconfound.py

Last active July 18, 2021 12:35

Linear deconfounding in a fit-transform API

	"""
	A scikit-learn like transformer to remove a confounding effect on X.
	"""

	from sklearn.base import BaseEstimator, TransformerMixin, clone
	from sklearn.linear_model import LinearRegression
	import numpy as np

	class DeConfounder(BaseEstimator, TransformerMixin):
	    """ A transformer removing the effect of y on X.
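
The preview above is truncated, so the following is only a sketch of what linear deconfounding in a fit/transform style could look like, not the gist's implementation. To stay dependency-light it swaps scikit-learn's `LinearRegression` for `numpy.linalg.lstsq`. The class name `LinearDeconfounderSketch`, the helper `_design`, and the `transform(X, y)` signature are all hypothetical.

```python
import numpy as np


class LinearDeconfounderSketch:
    """Remove the linear effect of a confound y from X (illustrative only)."""

    @staticmethod
    def _design(y):
        # Build the regressor matrix [1, y]: an intercept column plus the
        # confound(s), one row per sample.
        y = np.asarray(y, dtype=float)
        if y.ndim == 1:
            y = y[:, np.newaxis]
        return np.hstack([np.ones((y.shape[0], 1)), y])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Least-squares regression of every column of X on [1, y].
        self.coef_, *_ = np.linalg.lstsq(self._design(y), X, rcond=None)
        return self

    def transform(self, X, y):
        X = np.asarray(X, dtype=float)
        # Subtract the part of X that is linearly predicted by y
        # (this also removes the fitted intercept).
        return X - self._design(y) @ self.coef_


rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.outer(y, [2.0, -1.0]) + rng.normal(size=(200, 2))
X_clean = LinearDeconfounderSketch().fit(X, y).transform(X, y)
```

On the data used for fitting, the residuals `X_clean` are orthogonal to `y` by construction of least squares. On held-out data this holds only approximately.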

GaelVaroquaux / impact_encoding.py

Created October 29, 2018 14:19

Target encoding (or impact encoding)

	# How to use: df should be the dataframe restricted to the categorical values to impact,
	# target should be the pd.Series of target values.
	# Use fit, transform, etc.
	# Three types: binary, multiple, continuous.
	# For now m is a param <===== but what should we put here? I guess some function of total shape.
	# I mean, what would be the value of m we want to have for 0.5?

	import pandas as pd
	import numpy as np