@aronwc
aronwc / cluster.py
Created July 12, 2013 16:17
Example of running k-means clustering with scikit-learn.
# cluster users based on feature vectors
import argparse
import io
import numpy as np
import pickle
import re
import string
import sys
from sklearn.cluster import MiniBatchKMeans
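The preview above stops after the imports. A minimal, self-contained sketch of the same idea — MiniBatchKMeans on synthetic "user" feature vectors (the data, cluster count, and parameters here are illustrative assumptions, not from the gist):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic feature vectors: two well-separated blobs of 50 "users" each.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 10)),
               rng.normal(5.0, 0.5, size=(50, 10))])

# MiniBatchKMeans fits on small random batches, trading a little
# accuracy for much faster clustering on large datasets.
km = MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Each blob should end up in a single cluster.
print(len(set(labels[:50])), len(set(labels[50:])))
```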
@aronwc
aronwc / python-emacs
Last active December 20, 2015 00:59
Writing Python with Emacs
Here are the steps I took to set up code completion and PEP8 checking in Emacs:
1. Install [Marmalade](http://marmalade-repo.org) by putting this in your .emacs
(require 'package)
(add-to-list 'package-archives
             '("marmalade" .
               "http://marmalade-repo.org/packages/"))
(package-initialize)
@aronwc
aronwc / tasks.py
Created July 29, 2013 22:12
Celery example
''' An example of how to use Celery to manage a mix of serial and parallel
tasks. This depends on a running instance of a rabbitmq messaging server to
keep track of task statuses. This can be launched on our ec2 instance with:
~/rabbitmq/rabbitmq_server-3.1.3/sbin/rabbitmq-server
For this script to work, you first need to run a celery worker process to
await orders:
$ celery -A tasks worker --loglevel=info
Then, you can call any of the functions below (see main for an example).
@aronwc
aronwc / l1.py
Last active December 31, 2015 20:58
L1 feature selection example
import numpy as np
from sklearn import linear_model
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
def print_features(coef, names):
    """Print a sorted list of non-zero features/weights."""
    print('\n'.join('%s/%.2f' % (names[j], coef[j])
                    for j in np.argsort(coef)[::-1] if coef[j] != 0))
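The preview ends before any model is fit. A small sketch of L1 feature selection with Lasso on synthetic data (the data and `alpha` value are assumptions), showing how the L1 penalty zeroes out irrelevant coefficients:

```python
import numpy as np
from sklearn import linear_model

# Synthetic regression: only the first 2 of 10 features matter.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty drives coefficients of irrelevant features exactly to zero,
# so the surviving non-zero weights act as a feature selector.
clf = linear_model.Lasso(alpha=0.1)
clf.fit(X, y)
nonzero = np.flatnonzero(clf.coef_)
print(nonzero)
```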
@aronwc
aronwc / lda.py
Last active April 30, 2024 06:54
Example using GenSim's LDA and sklearn
""" Example using GenSim's LDA and sklearn. """
import numpy as np
from gensim import matutils
from gensim.models.ldamodel import LdaModel
from sklearn import linear_model
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
# show that max_depth affects floating point precision of predict_proba in RandomForest
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=10,
                           random_state=42)