Introductory quote:

> "Machine learning people use hugely complex algorithms on trivially simple
> datasets. Biology does trivially simple algorithms on hugely complex
> datasets."
>
> - Replicability
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, lars_path

np.random.seed(42)


def gen_data(n, m, k):
    # n samples, m features, k informative (non-zero) coefficients
    X = np.random.randn(n, m)
    w = np.zeros((m, 1))
    i = np.arange(0, m)
    # The gist preview is truncated here; a plausible completion: pick k
    # coefficients at random to be non-zero and generate noisy targets
    np.random.shuffle(i)
    w[i[:k]] = np.random.randn(k, 1)
    y = np.dot(X, w) + 0.1 * np.random.randn(n, 1)
    return X, y, w
```
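The imports point at `lars_path`, so presumably the gist goes on to trace a regularization path. A minimal usage sketch, assuming the completion of `gen_data` above (this is an illustration, not the gist's own code):

```python
# Sketch: trace the Lasso path with LARS on data from gen_data above
X, y, w = gen_data(200, 50, 5)
alphas, _, coefs = lars_path(X, y.ravel(), method='lasso')

plt.plot(alphas, coefs.T)
plt.xlabel('alpha')
plt.ylabel('coefficient value')
plt.show()
```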
| """ | |
| Uses C++ map containers for fast dict-like behavior with keys being | |
| integers, and values float. | |
| """ | |
| # Author: Gael Varoquaux | |
| # License: BSD | |
| # XXX: this needs Cython 17.1 or later. Elsewhere you will get a C++ compilation error. | |
| import numpy as np |
```python
import numpy as np
import time

from sklearn import cluster
from sklearn import datasets

lfw = datasets.fetch_lfw_people()
X_lfw = lfw.data[:, :5]

eps = 8.  # This choice of eps gives 44 clusters
```
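The preview cuts off right after setting `eps`. Since `eps` is DBSCAN's neighborhood-radius parameter and the comment counts clusters, a plausible continuation (an assumption, not the gist's own code) is a timed DBSCAN fit:

```python
# Assumed continuation: eps presumably feeds a DBSCAN fit on X_lfw above
t0 = time.time()
db = cluster.DBSCAN(eps=eps).fit(X_lfw)
# Label -1 marks noise points, so it does not count as a cluster
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print('%d clusters found in %.2fs' % (n_clusters, time.time() - t0))
```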
| This gist is only meant for discussion. |
| ### Keybase proof | |
| I hereby claim: | |
| * I am GaelVaroquaux on github. | |
| * I am gaelvaroquaux (https://keybase.io/gaelvaroquaux) on keybase. | |
| * I have a public key whose fingerprint is 44B8 B843 6321 47EB 59A9 8992 6C52 6A43 ABE0 36FC | |
| To claim this, I am signing this object: |
```python
'''
Non-parametric computation of entropy and mutual information.

Adapted by G. Varoquaux from code created by R. Brette, itself
from several papers (see in the code).

This code is maintained at https://github.com/mutualinfo/mutual_info
Please download the latest code there, to have improvements and
bug fixes.
'''
```
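For a taste of the approach without fetching the repository, here is a minimal sketch of one standard non-parametric entropy estimate, the Kozachenko-Leonenko k-nearest-neighbour estimator. This is an illustration of the technique, not the maintained code at the URL above:

```python
# Sketch of a k-nearest-neighbour (Kozachenko-Leonenko) entropy estimator
import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors


def entropy_knn(X, k=3):
    """Differential entropy estimate (in nats) of samples X, shape (n, d)."""
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)  # column 0 is each point's zero distance to itself
    r = dist[:, k]              # distance to the k-th true neighbour
    # log volume of the unit ball in d dimensions
    log_vd = (d / 2.) * np.log(np.pi) - gammaln(d / 2. + 1)
    return (digamma(n) - digamma(k) + log_vd
            + d * np.mean(np.log(np.maximum(r, 1e-12))))


# Sanity check: entropy of a standard 2D Gaussian is 1 + log(2*pi) ~ 2.84
print(entropy_knn(np.random.randn(5000, 2)))
```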
```python
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge, Lasso
# sklearn.cross_validation and sklearn.grid_search were removed in
# scikit-learn 0.20; both now live in sklearn.model_selection
from sklearn.model_selection import ShuffleSplit, GridSearchCV
from sklearn.utils import check_random_state
from sklearn import datasets
```
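These imports suggest a cross-validated comparison of Ridge and Lasso. A sketch of how they typically fit together; the diabetes dataset and the alpha grid below are illustrative choices, not taken from the gist:

```python
# Sketch: cross-validated choice of alpha for Ridge and Lasso
X, y = datasets.load_diabetes(return_X_y=True)
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
alphas = np.logspace(-3, 1, 20)  # hypothetical grid

for Model in (Ridge, Lasso):
    search = GridSearchCV(Model(), {'alpha': alphas}, cv=cv)
    search.fit(X, y)
    print(Model.__name__, search.best_params_, round(search.best_score_, 3))
```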
| """Persistence strategies comparison script. | |
| This script compute the speed, memory used and disk space used when dumping and | |
| loading arbitrary data. The data are taken among: | |
| - scikit-learn Labeled Faces in the Wild dataset (LFW) | |
| - a fully random numpy array with 10000x10000 shape | |
| - a dictionary with 1M random keys/values | |
| - a list containing 10M random value | |
| The compared persistence strategies are: |
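The preview is truncated before the list of strategies. As a hedged sketch of the kind of measurement the docstring describes, here is a small benchmark comparing two example strategies, pickle and joblib; these are my assumptions, not necessarily the gist's actual list:

```python
# Illustrative benchmark in the spirit of the script above
import os
import pickle
import time

import joblib
import numpy as np

data = np.random.random((2000, 2000))  # stand-in payload


def bench(name, dump, load, path):
    t0 = time.time()
    dump(data, path)
    t_dump = time.time() - t0
    t0 = time.time()
    load(path)
    t_load = time.time() - t0
    size_mb = os.path.getsize(path) / 1e6
    print('%-8s dump %.2fs  load %.2fs  %.1f MB'
          % (name, t_dump, t_load, size_mb))


bench('pickle',
      lambda d, p: pickle.dump(d, open(p, 'wb')),
      lambda p: pickle.load(open(p, 'rb')),
      '/tmp/data.pkl')
bench('joblib', joblib.dump, joblib.load, '/tmp/data.joblib')
```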