Olivier Grisel (ogrisel)

ogrisel / non_degenerate_mlp_gram.py
Last active March 8, 2022 22:30
Spectrum of the extended feature Gram matrix of a single-hidden-layer ReLU MLP
r"""Empirical evaluation of the extended feature Gram matrix of a ReLU MLP

Here we try to estimate the spectrum of the H^\infty matrix as defined in:

Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018)
Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh
https://arxiv.org/abs/1810.02054

Theorem 4.1 relies on the assumption that H^\infty has a strictly positive
minimum eigenvalue. The following computes an estimate of this eigenvalue
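In the paper's setting (unit-norm inputs x_i, first-layer weights w ~ N(0, I)), H^\infty_ij = E_w[x_i^T x_j 1{w^T x_i >= 0, w^T x_j >= 0}]. For ReLU the expectation has a closed form: the probability that both units fire is (pi - theta_ij) / (2 pi), with theta_ij = arccos(x_i^T x_j). A minimal NumPy sketch, assuming random unit-norm toy data (not the gist's own code or data), that computes H^\infty both in closed form and by Monte Carlo, then reports its smallest eigenvalue:

```python
import numpy as np


def h_infty_closed_form(X):
    """H^infty for a one-hidden-layer ReLU net, assuming unit-norm rows in X.

    H_ij = x_i.x_j * (pi - arccos(x_i.x_j)) / (2 pi), the closed form of
    E_w[x_i.x_j * 1{w.x_i >= 0, w.x_j >= 0}] for w ~ N(0, I).
    """
    G = np.clip(X @ X.T, -1.0, 1.0)  # cosines; clip guards arccos against rounding
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)


def h_infty_monte_carlo(X, n_weights=100_000, seed=0):
    """Monte Carlo estimate of the same expectation from random Gaussian weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_weights))
    active = (X @ W >= 0).astype(np.float64)  # ReLU activation pattern per sample
    both_active = active @ active.T / n_weights  # fraction of units active for both
    return (X @ X.T) * both_active


# Toy data (hypothetical): 10 random points on the unit sphere in R^5.
rng = np.random.default_rng(42)
X = rng.standard_normal((10, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)

H = h_infty_closed_form(X)
H_mc = h_infty_monte_carlo(X)
lambda_min = np.linalg.eigvalsh(H)[0]  # eigvalsh returns eigenvalues in ascending order
print(f"lambda_min(H^infty) ~ {lambda_min:.4g}")
print("max |closed form - Monte Carlo|:", np.abs(H - H_mc).max())
```

The Monte Carlo version mirrors the definition directly and is a useful sanity check; the closed form avoids sampling noise when the smallest eigenvalue is tiny.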
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib was removed from scikit-learn
# Cache make_blobs results on disk so that reruns reuse the same data.
m = joblib.Memory(location='/tmp/joblib')
make_blobs = m.cache(make_blobs)
data, labels = make_blobs(n_samples=10**5, n_features=50, cluster_std=100,
                          centers=10, random_state=777)
ogrisel / numpy_pickle_protocol_5.py
Last active October 13, 2019 09:17
Draft use of pickle protocol 5 (PEP 574) for zero-copy numpy array pickling
from pickle import Pickler, load
try:
    from pickle import PickleBuffer
except ImportError:
    PickleBuffer = None  # Python < 3.8: no pickle protocol 5 support
import copyreg
import os
import numpy as np
import time
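Since Python 3.8, `pickle` ships `PickleBuffer`, and recent NumPy versions implement protocol 5 out-of-band pickling natively for contiguous arrays. A minimal sketch of a zero-copy round trip using only the standard API (not the gist's own draft implementation):

```python
import pickle

import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

# Collect out-of-band buffers: list.append returns None, which tells the
# pickler to keep each buffer out of the pickle stream.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The stream only holds metadata (dtype, shape, order); the array data is
# passed separately as a pickle.PickleBuffer wrapping arr's own memory.
restored = pickle.loads(payload, buffers=buffers)

print(len(payload), "bytes in-band,", len(buffers), "out-of-band buffer(s)")
print("shares memory with original:", np.shares_memory(arr, restored))
```

When the buffers are handed over in-process like this, the restored array is a view on the original memory; across processes the buffers would instead be transmitted by whatever transport the application uses.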
ogrisel / large_pickle_dump.py
Last active April 20, 2018 09:06
Memory profiling for Python pickling of large buffers
from pickle import Pickler, _Pickler, Unpickler, _Unpickler, HIGHEST_PROTOCOL
import os
import time
import sys
import gc
from multiprocessing import get_context
PROTOCOL = HIGHEST_PROTOCOL
ctx = get_context('spawn')
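One way to observe the memory overhead of pickling a large buffer, assuming `tracemalloc` as the measuring tool (the gist's own measurement method is not visible in this preview): compare the peak traced memory of `pickle.dumps`, which builds the whole pickle in memory, with `pickle.dump` to a file.

```python
import os
import pickle
import tempfile
import tracemalloc

SIZE = 20 * 2**20  # 20 MiB payload, an arbitrary size chosen for this sketch
data = bytes(SIZE)  # allocated before tracing starts, so it is not counted


def peak_traced_bytes(func):
    """Run func() and return the peak memory traced by tracemalloc meanwhile."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def to_bytes():
    # The whole pickle, including a copy of data, is built as one bytes object.
    pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")

    def to_file():
        with open(path, "wb") as f:
            pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    dumps_peak = peak_traced_bytes(to_bytes)
    dump_peak = peak_traced_bytes(to_file)

print(f"pickle.dumps peak:       {dumps_peak / 2**20:.1f} MiB")
print(f"pickle.dump (file) peak: {dump_peak / 2**20:.1f} MiB")
```

On recent CPython versions the C pickler streams large bytes payloads straight to the file object, so the file-based peak should stay far below the payload size, while `dumps` needs at least one full copy.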
distributed.worker - WARNING - Worker at 72 percent memory usage. Trigger GC. Process memory: 723.75 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 66 percent memory usage. After GC. Process memory: 660.93 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 73 percent memory usage. Trigger GC. Process memory: 732.79 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 73 percent memory usage. After GC. Process memory: 732.79 MB -- Worker memory limit: 1000.00 MB
distributed.core - WARNING - Event loop was unresponsive for 1.01s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.worker - WARNING - Worker at 70 percent memory usage. Trigger GC. Process memory: 705.26 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 67 percent memory usage. After GC. Process memory: 670.00 MB -- Worker memory limit: 1000.0
ogrisel / mean_target_encoding.py
Last active July 7, 2018 04:31
Mean target value encoding for categorical variables using dask (take 2)
import os
import os.path as op
from time import time
import dask.dataframe as ddf
import dask.array as da
from distributed import Client
def make_categorical_data(n_samples=int(1e7), n_features=10, n_partitions=100):
    """Generate some random categorical data
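Mean target encoding replaces each category by the average target value observed for that category. A minimal in-memory NumPy sketch, assuming the categories are already integer codes (the gist works on dask dataframes, and its implementation is not shown in this preview); `mean_target_encode` is a hypothetical helper name:

```python
import numpy as np


def mean_target_encode(codes, target):
    """Map integer category codes to the mean target value of their category.

    Returns the encoded column and the per-category means (NaN for codes
    within range that never occur).
    """
    codes = np.asarray(codes)
    target = np.asarray(target, dtype=np.float64)
    counts = np.bincount(codes)                  # number of rows per category
    sums = np.bincount(codes, weights=target)    # sum of targets per category
    means = np.full(counts.shape, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means[codes], means


codes = np.array([0, 1, 0, 2, 1])
y = np.array([1.0, 0.0, 3.0, 5.0, 2.0])
encoded, means = mean_target_encode(codes, y)
print(means)    # [2. 1. 5.]
print(encoded)  # [2. 1. 2. 5. 1.]
```

The same two aggregations (count and sum per category, then a lookup) are what a grouped mean followed by a join computes in a dataframe library.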
ogrisel / mean_target_encoding.py
Last active September 29, 2017 15:05
Mean target value encoding for categorical variables using dask
#
# XXX: do not use this code, it's broken!
# Use: https://gist.github.com/ogrisel/b6a97ed87939e3b559568ac2f6599cba
#
# See comments.
import os
import os.path as op
from time import time
import dask.dataframe as ddf
ogrisel / .gitignore
Last active August 30, 2017 12:00
roofline analysis
__pycache__
*.json
*.png
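The preview above only shows the gist's `.gitignore`. For context, the roofline model it is named after bounds attainable performance by min(peak compute, arithmetic intensity x memory bandwidth). A minimal sketch with hypothetical hardware numbers (not taken from the gist):

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: performance is capped by compute or by memory traffic.

    arithmetic_intensity is in FLOP per byte moved from memory.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


# Hypothetical machine: 100 GFLOP/s peak compute, 20 GB/s memory bandwidth.
PEAK, BW = 100.0, 20.0
ridge_point = PEAK / BW  # intensity (FLOP/byte) where memory-bound turns compute-bound

for ai in (0.25, 1.0, ridge_point, 16.0):
    print(f"AI={ai:5.2f} FLOP/B -> {attainable_gflops(ai, PEAK, BW):6.1f} GFLOP/s")
```

Kernels left of the ridge point can only go faster by reducing memory traffic; kernels to its right are limited by compute throughput.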
_________________ TestsProcessPoolLokyExecutor.test_max_depth __________________
self = <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor instance at 0x10392ed88>
    def test_max_depth(self):
        from loky.process_executor import MAX_DEPTH
        if self.context.get_start_method() == 'fork':