GitHub gists by GILLES Armand (armgilles)
armgilles / IRIS.md
Last active August 29, 2015 14:20
A reminder on INSEE IRIS zones

The IRIS

  • Communes with at least 10,000 inhabitants, and a large proportion of communes with 5,000 to 10,000 inhabitants, are divided into IRIS. This division forms a partition of their territory. France has about 16,100 IRIS, of which 650 are in the overseas departments (DOM).

  • 3 types of IRIS:

    • Residential IRIS ("Iris d'habitat"): their population is generally between 1,800 and 5,000 inhabitants. They are homogeneous in housing type, and their boundaries follow the major breaks in the urban fabric (main roads, railways, waterways...).
armgilles / departement_CC_lambert.csv
Last active August 29, 2015 14:20
Mapping between départements and Lambert CC zones
Departement lambert_cc
01 46
02 49
03 46
04 44
05 45
06 44
07 45
08 50
09 43
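
The table can be turned into a lookup in Python; this is a minimal sketch using only the rows shown above (the helper name is hypothetical, and the dict would need the remaining départements filled in):

```python
# Mapping from département code to Lambert CC zone, built from the rows above.
# Hypothetical helper; extend the dict with the remaining départements.
dept_to_lambert_cc = {
    "01": 46, "02": 49, "03": 46, "04": 44, "05": 45,
    "06": 44, "07": 45, "08": 50, "09": 43,
}

def lambert_cc_zone(departement):
    """Return the Lambert CC zone for a département code, or None if unknown."""
    return dept_to_lambert_cc.get(departement)
```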
armgilles / first-git
Created April 29, 2015 09:48
Start a new git repo
git init
git add --all
git commit -m "First commit"
git remote add <name> <url>
git push -u <name> master
armgilles / optimal_bin_hist.md
Created May 12, 2015 18:25
Looking for the optimal number of bins for a histogram

from math import log, log2, sqrt
from scipy.stats import kurtosis

sturges = lambda n: int(log2(n) + 1)
square_root = lambda n: int(sqrt(n))
doanes = lambda data: int(1 + log(len(data)) + log(1 + kurtosis(data) * (len(data) / 6.) ** 0.5))

n = len(titanic)
sturges(n), square_root(n), doanes(titanic.fare.dropna())

titanic.fare.hist(bins=doanes(titanic.fare.dropna()))
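
These three binning rules can be tried end-to-end on a synthetic sample; the gist applies them to the Titanic fare column, so the skewed lognormal sample below is a stand-in assumption:

```python
# Self-contained sketch of the three binning rules (Sturges, square-root,
# Doane) on a synthetic skewed sample standing in for the Titanic fares.
from math import log, log2, sqrt

import numpy as np
from scipy.stats import kurtosis

sturges = lambda n: int(log2(n) + 1)
square_root = lambda n: int(sqrt(n))
doanes = lambda data: int(1 + log(len(data)) + log(1 + kurtosis(data) * (len(data) / 6.) ** 0.5))

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=1.0, size=891)  # skewed, like fares

n = len(sample)
print(sturges(n), square_root(n), doanes(sample))
```

Doane's rule accounts for the skewness of the data via the sample kurtosis, so it tends to suggest more bins than Sturges for asymmetric distributions.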

armgilles / looking_best_eps_dbscan.py
Last active August 29, 2015 14:21
Searching for the number of clusters found by the DBSCAN algorithm while varying the eps value
from sklearn.cluster import DBSCAN
import pandas as pd

# You already have your features in X
dbscan_eps = []
for i in [x / 10.0 for x in range(1, 20)]:
    db = DBSCAN(eps=i).fit(X)
    # Label -1 marks noise points, so exclude it from the cluster count
    n_clusters_ = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    print("eps = " + str(i) + " cluster = " + str(n_clusters_))
    dbscan_eps.append({'eps': i,
                       'n_clusters': n_clusters_})
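
A runnable version of this eps sweep, assuming synthetic blob data in place of the user's X:

```python
# Sweep eps for DBSCAN on synthetic blobs and record the cluster count.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

dbscan_eps = []
for eps in [x / 10.0 for x in range(1, 20)]:
    db = DBSCAN(eps=eps).fit(X)
    # Label -1 marks noise, so it is excluded from the cluster count.
    n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    dbscan_eps.append({"eps": eps, "n_clusters": n_clusters})

for row in dbscan_eps:
    print(row)
```

Plotting `n_clusters` against `eps` shows the plateau where the cluster count is stable, which is a common heuristic for picking eps.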
armgilles / looking_best_preference_AffinityPropagation.py
Last active April 5, 2016 15:10
Searching for the number of clusters found by the AffinityPropagation algorithm while varying the preference value (checking whether it converges and how many iterations convergence takes)
from sklearn.cluster import AffinityPropagation
import pandas as pd
import sys
from io import StringIO

# You already have your features in X
aff_eps = []
for i in range(-50, 0, 5):
    # Capture the algorithm's verbose output
    stdout_ = sys.stdout  # Keep track of the previous value.
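
A runnable sketch of this preference sweep, assuming synthetic blob data in place of X; recent scikit-learn exposes the iteration count directly via `n_iter_`, so capturing stdout is not needed:

```python
# Sweep the preference parameter for AffinityPropagation and record the
# resulting cluster count and the number of iterations to converge.
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

results = []
for preference in range(-50, 0, 5):
    af = AffinityPropagation(preference=preference, random_state=0).fit(X)
    results.append({"preference": preference,
                    "n_clusters": len(af.cluster_centers_indices_),
                    "n_iter": af.n_iter_})

for row in results:
    print(row)
```

Lower (more negative) preference values favour fewer exemplars, hence fewer clusters; if the fit fails to converge within `max_iter`, `cluster_centers_indices_` comes back empty.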
armgilles / neuronral_network_pattern.md
Last active August 29, 2015 14:24
List of different patterns: how to use them & why

List of patterns for neural networks:

  1. Autoencoders are the simplest ones. They are intuitively understandable, easy to implement and to reason about (e.g. it's much easier to find good meta-parameters for them than for RBMs).
  2. RBMs are generative. That is, unlike autoencoders, which only discriminate some data vectors in favour of others, RBMs can also generate new data with a given joint distribution. They are also considered more feature-rich and flexible.
  3. CNNs are a very specific model that is mostly used for a very specific task (though a pretty popular one). Most of the top-level algorithms in image recognition are somehow based on CNNs today, but outside that niche they are hardly applicable (e.g. what's the reason to use convolution for film review analysis?).

Autoencoder

An autoencoder is a simple 3-layer neural network where the output units are directly connected back to the input units. E.g. in a network like this
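
A minimal NumPy sketch of such a 3-layer autoencoder, trained to reconstruct its own input; the layer sizes, learning rate, and iteration count are illustrative assumptions:

```python
# Minimal 3-layer autoencoder: input -> hidden (encoder) -> output (decoder),
# trained by gradient descent on the squared reconstruction error.
# All sizes and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))                      # 100 samples, 8 input units

n_in, n_hidden = 8, 3
W1 = rng.normal(0, 0.1, (n_in, n_hidden))     # encoder weights
W2 = rng.normal(0, 0.1, (n_hidden, n_in))     # decoder weights
b1, b2 = np.zeros(n_hidden), np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for _ in range(500):
    h = sigmoid(X @ W1 + b1)                  # hidden code (compressed)
    out = sigmoid(h @ W2 + b2)                # reconstruction of the input
    err = out - X
    losses.append((err ** 2).mean())
    # Backpropagate the squared reconstruction error through both layers.
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(losses[0], losses[-1])
```

Because the hidden layer (3 units) is narrower than the input (8 units), the network is forced to learn a compressed representation; the reconstruction error drops as training proceeds.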