Public gists by Olivier Grisel (ogrisel)
distributed.worker - WARNING - Worker at 72 percent memory usage. Trigger GC. Process memory: 723.75 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 66 percent memory usage. After GC. Process memory: 660.93 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 73 percent memory usage. Trigger GC. Process memory: 732.79 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 73 percent memory usage. After GC. Process memory: 732.79 MB -- Worker memory limit: 1000.00 MB
distributed.core - WARNING - Event loop was unresponsive for 1.01s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.worker - WARNING - Worker at 70 percent memory usage. Trigger GC. Process memory: 705.26 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 67 percent memory usage. After GC. Process memory: 670.00 MB -- Worker memory limit: 1000.00 MB
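The warnings above come from distributed's worker memory monitor: once process memory crosses a fraction of the configured worker memory limit, the worker triggers a garbage collection pass and logs usage before and after. A minimal sketch of that threshold check (the 70% fraction here is illustrative, not necessarily distributed's exact default):

```python
def should_trigger_gc(process_memory_mb, memory_limit_mb, threshold=0.70):
    """Return True when memory usage reaches the GC-trigger fraction."""
    return process_memory_mb / memory_limit_mb >= threshold

# Values taken from the log lines above.
print(should_trigger_gc(723.75, 1000.0))  # 72% usage -> True
print(should_trigger_gc(660.93, 1000.0))  # 66% after GC -> False
```

In a real deployment the limit itself is configured per worker, e.g. via the `memory_limit` argument when creating a distributed `LocalCluster`.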
@ogrisel
ogrisel / mean_target_encoding.py
Last active July 7, 2018 04:31
Mean target value encoding for categorical variable using dask (take 2)
import os
import os.path as op
from time import time
import dask.dataframe as ddf
import dask.array as da
from distributed import Client
def make_categorical_data(n_samples=int(1e7), n_features=10, n_partitions=100):
    """Generate some random categorical data"""
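The gist preview is cut off right after the docstring above. As a plain-Python illustration of the technique the gist implements with dask, here is a minimal sketch of mean target encoding (the function name is mine, not the gist's): each category is replaced by the mean of the target variable over the rows belonging to it.

```python
from collections import defaultdict

def mean_target_encode(categories, targets):
    """Replace each category by the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories]

print(mean_target_encode(["r", "g", "r", "g", "b"], [1.0, 0.0, 0.0, 1.0, 1.0]))
# [0.5, 0.5, 0.5, 0.5, 1.0]
```

The dask version in the gist computes the same per-category means out of core, over partitioned dataframes, but the encoding logic is the same.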
@ogrisel
ogrisel / mean_target_encoding.py
Last active September 29, 2017 15:05
Mean target value encoding for categorical variable using dask
#
# XXX: do not use this code, it's broken!
# Use: https://gist.github.com/ogrisel/b6a97ed87939e3b559568ac2f6599cba
#
# See comments.
import os
import os.path as op
from time import time
import dask.dataframe as ddf
@ogrisel
ogrisel / .gitignore
Last active August 30, 2017 12:00
roofline analysis
__pycache__
*.json
*.png
_________________ TestsProcessPoolLokyExecutor.test_max_depth __________________
self = <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor instance at 0x10392ed88>

    def test_max_depth(self):
        from loky.process_executor import MAX_DEPTH
        if self.context.get_start_method() == 'fork':
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import lasso_path
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import scale
from time import time
from rpy2 import robjects
import rpy2.robjects.packages as rpackages
from joblib import Parallel, delayed, parallel_backend
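These imports set up a benchmark of scikit-learn's lasso_path against R's glmnet through rpy2. Leaving rpy2 aside, a minimal self-contained call to lasso_path alone looks like the following sketch (data sizes are arbitrary, not the benchmark's):

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import lasso_path

rng = np.random.RandomState(0)
X = make_low_rank_matrix(n_samples=100, n_features=20, random_state=0)
y = X @ rng.randn(20)

# Compute the Lasso coefficients along a grid of 50 regularization strengths.
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
print(alphas.shape, coefs.shape)  # (50,) (20, 50)
```

lasso_path returns the full regularization path at once, which is what makes it a natural counterpart to glmnet in a path-fitting benchmark.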
from loky.reusable_executor import get_reusable_executor
from multiprocessing import Pool
from time import sleep, time
from itertools import repeat
import os
n_workers = 4
n_iter = int(1e2)
delay = 1e-6
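With these parameters the snippet presumably times how fast each backend can dispatch n_iter near-instant tasks to n_workers processes. loky's get_reusable_executor follows the concurrent.futures Executor API, so a rough version of such a loop, written here against the standard library's ProcessPoolExecutor so it runs without loky, is:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat
from time import sleep, time

n_workers = 4
n_iter = int(1e2)
delay = 1e-6

def work(d):
    # Near-instant task: the benchmark measures dispatch overhead, not work.
    sleep(d)
    return d

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        tic = time()
        results = list(executor.map(work, repeat(delay, n_iter)))
        print("dispatched %d tasks in %.3fs" % (len(results), time() - tic))
```

The point of loky's reusable executor is that repeated calls to get_reusable_executor reuse the same worker processes, avoiding the startup cost this sketch pays on every `with` block.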
iter: 0 | TL: 2.694 | VL: 1.504 | Vacc: 0.492 | Ratio: 1.79 | Time: 158.2
iter: 1 | TL: 2.222 | VL: 1.117 | Vacc: 0.618 | Ratio: 1.99 | Time: 157.5
iter: 2 | TL: 1.999 | VL: 1.075 | Vacc: 0.65 | Ratio: 1.86 | Time: 157.1
iter: 3 | TL: 1.85 | VL: 0.882 | Vacc: 0.701 | Ratio: 2.1 | Time: 157.4
iter: 4 | TL: 1.725 | VL: 0.74 | Vacc: 0.755 | Ratio: 2.33 | Time: 157.2
iter: 5 | TL: 1.624 | VL: 0.677 | Vacc: 0.775 | Ratio: 2.4 | Time: 157.2
iter: 6 | TL: 1.549 | VL: 0.674 | Vacc: 0.773 | Ratio: 2.3 | Time: 157.2
iter: 7 | TL: 1.474 | VL: 0.679 | Vacc: 0.774 | Ratio: 2.17 | Time: 157.2
iter: 8 | TL: 1.422 | VL: 0.533 | Vacc: 0.822 | Ratio: 2.67 | Time: 157.3
iter: 9 | TL: 1.368 | VL: 0.513 | Vacc: 0.834 | Ratio: 2.67 | Time: 157.3
@ogrisel
ogrisel / spacy_openmp_and_multiprocessing.py
Created May 22, 2016 13:38
This script highlights that using multiprocessing in conjunction with OpenMP can cause the OpenMP runtime to crash, as documented in https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries
from joblib import Parallel, delayed
import spacy
en_nlp = spacy.load('en')
texts = [u'Here is a sentence.'] * 100
def do_openmp_stuff(texts):
    list(en_nlp.pipe(texts, n_threads=4, batch_size=10))
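The preview cuts off here; the rest of the gist presumably invokes do_openmp_stuff from forked joblib worker processes to provoke the crash, since fork()ed children inherit a half-initialized OpenMP runtime from the parent. A spacy-free sketch of the usual mitigation, forcing the "spawn" start method so each child starts a fresh runtime (threaded_work is an illustrative stand-in, not from the gist):

```python
import multiprocessing as mp

def threaded_work(x):
    # Stand-in for a function that drives an internal thread pool,
    # like en_nlp.pipe(..., n_threads=4) in the gist above.
    return x * x

if __name__ == "__main__":
    # "spawn" starts children from a fresh interpreter instead of fork(),
    # so no partially initialized OpenMP state is inherited.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(threaded_work, range(4)))  # [0, 1, 4, 9]
```

joblib's later loky backend takes the same approach, spawning fresh worker processes rather than forking, precisely to sidestep this class of crash.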