ogrisel’s gists

ogrisel / gist:a1e94aaaab7bb73409d0e2892b1f9ed0

Created September 28, 2020 13:08

	In [19]: from sklearn.preprocessing import StandardScaler

	In [20]: from sklearn.linear_model import LogisticRegression

	In [21]: from sklearn.pipeline import Pipeline

	In [22]: p = Pipeline([("scaler", StandardScaler()), ("classifier", LogisticRegression())])

	In [23]: import numpy as np

ogrisel / categorical_hist_gbrt.py

Last active August 10, 2020 14:53

	from sklearn.model_selection import cross_validate
	import matplotlib.pyplot as plt
	import numpy as np

	from sklearn.datasets import fetch_openml
	from sklearn.experimental import enable_hist_gradient_boosting # noqa
	from sklearn.ensemble import HistGradientBoostingClassifier
	from sklearn.pipeline import make_pipeline
	from sklearn.compose import make_column_transformer
	from sklearn.compose import make_column_selector

ogrisel / check_roc_auc.py

Last active June 24, 2020 14:30

	import numpy as np
	import pytest
	from sklearn.datasets import load_breast_cancer
	from sklearn.utils import shuffle
	from sklearn.model_selection import train_test_split
	from sklearn.model_selection import GridSearchCV
	from sklearn.linear_model import LogisticRegression
	from sklearn.metrics import roc_auc_score, roc_curve

ogrisel / test.py

Created June 2, 2020 15:03

	@pytest.mark.parametrize("loss", ['huber', 'ls', 'lad', 'quantile'])
	@pytest.mark.parametrize("use_sample_weight", [False, True])
	def test_regressor_train_loss_convergence(loss, use_sample_weight):
	    rng = np.random.RandomState(42)
	    n_samples, n_features = 30, 5
	    n_estimators = 300

	    # Make random data (without duplicated samples) to make sure
	    # it's possible to build an invertible (overfitting) mapping
	    # from X to y that therefore should lead to a regression loss

ogrisel / conda_forge_compilers_macos_buildlog.txt

Created September 13, 2019 17:36

macOS build log

	(conda-forge-compilers) 0 [~/code/scikit-learn (master)]$ pip install -e . -v
	Created temporary directory: /private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-ephem-wheel-cache-cn0u3xn5
	Created temporary directory: /private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-req-tracker-7xtixh31
	Created requirements tracker '/private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-req-tracker-7xtixh31'
	Created temporary directory: /private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-install-q8mggn78
	Obtaining file:///Users/ogrisel/code/scikit-learn
	Added file:///Users/ogrisel/code/scikit-learn to build tracker '/private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-req-tracker-7xtixh31'
	Running setup.py (path:/Users/ogrisel/code/scikit-learn/setup.py) egg_info for package from file:///Users/ogrisel/code/scikit-learn
	Running command python setup.py egg_info
	running egg_info

ogrisel / halving_adult_census.py

Created August 28, 2019 11:27

	from time import time
	from pprint import pprint

	import numpy as np
	import pandas as pd
	from scipy.stats import expon, randint, uniform

	from sklearn.pipeline import Pipeline
	from sklearn.compose import ColumnTransformer
	from sklearn.preprocessing import OrdinalEncoder

ogrisel / debug_hist_gbdt_missing_values.ipynb

Last active July 18, 2019 15:52

debug missing values for hist GBDT

ogrisel / ms-python-server.log

Created January 9, 2019 09:54

Microsoft Python Language Server version 0.1.75.0 on scikit-learn

	Starting Microsoft Python language server.
	##########Linting Output - flake8##########
	Microsoft Python Language Server version 0.1.75.0
	Initializing for /opt/venvs/py37/bin/python
	Loading files from /home/ogrisel/code/scikit-learn
	Parsing document file:///home/ogrisel/code/scikit-learn/setup.py
	Parse complete for file:///home/ogrisel/code/scikit-learn/setup.py at version -1
	Analysis queued for file:///home/ogrisel/code/scikit-learn/setup.py
	Parsing document file:///home/ogrisel/code/scikit-learn/conftest.py
	Parse complete for file:///home/ogrisel/code/scikit-learn/conftest.py at version -1

ogrisel / non_degenerate_mlp_gram.py

Last active March 8, 2022 22:30

Spectrum of the extended feature Gram matrix of a single hidden layer ReLU MLP

	"""Empirical evaluation of the extended feature Gram matrix of a ReLU MLP

	Here we try to estimate the spectrum of the H^\infty matrix as defined in:

	Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018)
	Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh
	https://arxiv.org/abs/1810.02054

	Theorem 4.1 relies on the assumption that H^\infty has a strictly positive
	minimum eigenvalue. The following computes an estimate of this eigenvalue
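The preview above is cut off, so the following is only a minimal sketch of the kind of estimate the docstring describes, not the gist's actual code. It assumes unit-norm inputs and the definition of H^\infty from the cited paper, H^\infty_ij = E_w[x_i^T x_j 1{w^T x_i >= 0, w^T x_j >= 0}] with w ~ N(0, I), whose closed form is x_i^T x_j (pi - arccos(x_i^T x_j)) / (2 pi). The function names `h_infinity` and `h_monte_carlo`, the data sizes, and the random seed are all hypothetical choices made for illustration.

```python
import numpy as np


def h_infinity(X):
    """Closed-form H^infty matrix (assumes rows of X are rescaled to unit norm)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Clip to guard against rounding pushing dot products outside [-1, 1].
    G = np.clip(X @ X.T, -1.0, 1.0)
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)


def h_monte_carlo(X, n_hidden, rng):
    """Finite-width estimate using n_hidden random Gaussian ReLU units."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = rng.standard_normal((X.shape[1], n_hidden))
    # active[i, r] is 1 when hidden unit r is active on sample i.
    active = (X @ W >= 0).astype(X.dtype)
    return (X @ X.T) * (active @ active.T) / n_hidden


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))  # hypothetical toy data

H_inf = h_infinity(X)
H_mc = h_monte_carlo(X, n_hidden=20000, rng=rng)

# eigvalsh returns eigenvalues in ascending order for symmetric matrices.
lambda_min = np.linalg.eigvalsh(H_inf)[0]
print("minimum eigenvalue of H^infty:", lambda_min)
print("max abs deviation of the finite-width estimate:",
      np.abs(H_mc - H_inf).max())
```

Comparing the closed form with the finite-width estimate gives a quick sanity check on both, and the smallest eigenvalue of the closed-form matrix is the quantity the positivity assumption in Theorem 4.1 is about.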

ogrisel / kmeans_benchmark.py

Created July 14, 2018 16:29

	from sklearn.datasets import make_blobs
	from sklearn.cluster import KMeans
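	import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23


	# Recent joblib releases take `location` instead of the old `cachedir` argument
	m = joblib.Memory(location='/tmp/joblib')
	make_blobs = m.cache(make_blobs)
	data, labels = make_blobs(n_samples=10**5, n_features=50, cluster_std=100,
	                          centers=10, random_state=777)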

Olivier Grisel ogrisel