(Venkat) Raghav, Rajagopalan raghavrv

raghavrv / gold.png
Last active March 12, 2017 18:07
GitHub colored icons
raghavrv / green.ico
Last active March 12, 2017 18:18
GitHub colored favicons
raghavrv / dataset_with_outliers.py
Created February 15, 2017 18:18
Generating artificial dataset with outliers
import numpy as np
from sklearn.datasets import make_classification

# Data with features in different scales
n_classes = 2
X_clean, y_clean = make_classification(
    n_samples=500, n_features=2, n_redundant=0,
    scale=(10, 100), random_state=0)
# Add outliers to the data
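The snippet cuts off right before the outliers are added. A minimal continuation sketch, assuming a uniform-box outlier scheme (`n_outliers`, the box width, and the random labels are my choices, not from the gist):

```python
import numpy as np
from sklearn.datasets import make_classification

X_clean, y_clean = make_classification(
    n_samples=500, n_features=2, n_redundant=0,
    scale=(10, 100), random_state=0)

rng = np.random.RandomState(0)
n_outliers = 25
# Draw outliers uniformly from a box much wider than the clean data,
# so they sit far outside the range of each (differently scaled) feature.
lo, hi = X_clean.min(axis=0), X_clean.max(axis=0)
span = hi - lo
X_out = rng.uniform(lo - 2 * span, hi + 2 * span, size=(n_outliers, 2))
y_out = rng.randint(0, 2, size=n_outliers)

X = np.vstack([X_clean, X_out])
y = np.concatenate([y_clean, y_out])
```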
raghavrv / gbcv_vs_gscv_comparison.py
Created January 30, 2017 16:21
Performance comparison of GradientBoostingCV vs GridSearchCV for searching best number of boosting stages (n_estimators)
"""
===============================
Gradient Boosting Classifier CV
===============================
Gradient boosting is an ensembling technique where several weak learners
(regression trees) are combined to yield a powerful single model, in an
iterative fashion.
:class:`sklearn.ensemble.GradientBoostingClassifierCV` enables us to
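`GradientBoostingClassifierCV` was only proposed and, as far as I can tell, never shipped in a scikit-learn release, so here is a hedged illustration of the two sides being benchmarked using only standard API: a full `GridSearchCV` refit per candidate `n_estimators` versus a single fit scored at every boosting stage via `staged_predict` (dataset sizes and the candidate grid are my choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Approach 1: refit a fresh model for every candidate (expensive).
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {'n_estimators': [10, 50, 100]}, cv=3)
grid.fit(X_tr, y_tr)

# Approach 2: fit once at the largest size, then score every
# intermediate stage on held-out data (cheap) -- the trick a
# GradientBoostingCV-style estimator exploits.
gbc = GradientBoostingClassifier(
    n_estimators=100, random_state=0).fit(X_tr, y_tr)
stage_scores = [np.mean(y_pred == y_val)
                for y_pred in gbc.staged_predict(X_val)]
best_n = int(np.argmax(stage_scores)) + 1
```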
raghavrv / interview.md
Last active January 4, 2017 15:45
Interview questions and answers for Pratheeban
Question 1

The sequence is defined by:

  • A_0 = 1
  • A_{i+1} = A_i + 1 or A_i * 2

Given A_n = some target number, find the smallest n.

def find_smallest_n(An):
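The gist truncates right at the function definition. One way to complete it (my sketch, not necessarily the original answer): work backwards from `An` toward `A_0 = 1`, halving when even and subtracting 1 otherwise, which greedily undoes the forward steps in the minimal number of moves.

```python
def find_smallest_n(An):
    # Work backwards from An to A_0 = 1: undo *2 when even,
    # otherwise undo +1; count how many steps were undone.
    n = 0
    while An > 1:
        An = An // 2 if An % 2 == 0 else An - 1
        n += 1
    return n

find_smallest_n(10)  # 1 -> 2 -> 4 -> 5 -> 10, so n = 4
```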
raghavrv / string_int_items_sum.py
Created December 23, 2016 09:30
Interview question for Pratheeban
# Write a function that takes a list of strings and returns the sum of the items that represent an integer.
def sum_up_int_terms(strings):
    total = 0
    for item in strings:
        try:
            total += int(item)
        except ValueError:
            pass
    return total
raghavrv / value_dropper.py
Last active December 12, 2016 17:03
Alternative (simplified?) api for dropping values (NMAR and MCAR)
import numpy as np
from sklearn.utils.validation import check_random_state
from sklearn.externals import six
from functools import partial

def mcar_mask(X, y=None, proba=0.1, random_state=None):
    """Generate a MCAR mask to uniformly drop values
raghavrv / drop_values.py
Created December 12, 2016 16:13
value dropper alternative api
import numpy as np
from sklearn.utils.validation import check_random_state
from sklearn.externals import six
from functools import partial

def drop_values_mcar(X, y=None, missing_values=np.nan,
                     proba=0.1, random_state=None):
raghavrv / value_dropper_doc.md
Created November 4, 2016 16:23
Value dropper design for Alex
missing_rate : float in range [0, 1), default 0.1
    The absolute fraction of values that must be missing.

    If there are previously existing missing values (say x fraction of
    values were already missing), an additional ``missing_rate - x``
    fraction of values will be dropped further to achieve a total of
    ``missing_rate`` fraction of missing values.

    That is, at the end, the dataset will contain a total of
    ``(X.shape[0] * X.shape[1]) * missing_rate`` missing values.
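The accounting described above can be checked numerically. A small sketch (the dataset and the drop mechanics are hypothetical; only the `missing_rate` bookkeeping follows the doc):

```python
import numpy as np

rng = np.random.RandomState(0)
X = np.arange(100.0).reshape(10, 10)
X[:2, 0] = np.nan                  # x = 0.02 of values already missing
missing_rate = 0.1

n_total = X.size
n_already = int(np.isnan(X).sum())
# Drop only the additional (missing_rate - x) fraction of values...
n_to_drop = int(round(missing_rate * n_total)) - n_already

candidates = np.flatnonzero(~np.isnan(X.ravel()))
drop = rng.choice(candidates, size=n_to_drop, replace=False)
X.ravel()[drop] = np.nan
# ...so the total fraction of missing values ends up at missing_rate.
```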

raghavrv / automations.sh
Created November 3, 2016 13:33
Some automation aliases and scripts for Telecom Paristech desktop / TSI server
export LANG=en_US.UTF-8

# COMMON CONFS
# List active SSH tunnels started with `ssh -N`
alias checktuns="ps x | grep 'ssh -N'"
# Quick process grep (note: this alias shadows the real pgrep)
alias pgrep="ps x | grep "
# Set the scikit-learn path env var
export SCIKIT_LEARN_PATH=~/raghav/code/scikit-learn