Skip to content

Instantly share code, notes, and snippets.

View ddofer's full-sized avatar

Dan Ofer ddofer

View GitHub Profile
@syhw
syhw / dropout_simple_models.py
Created July 17, 2014 14:55
Trying dropout with simple off-the-selves scikit-learn models. Not really working.
from sklearn.datasets import fetch_20newsgroups, load_digits
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cross_validation import train_test_split
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn import metrics
newsgroups_train = fetch_20newsgroups(subset='train')
vectorizer = TfidfVectorizer(encoding='latin-1', max_features=10000)
@syhw
syhw / dnn_compare_optims.py
Created July 21, 2014 09:07
comparing SGD vs SAG vs Adadelta vs Adagrad
"""
A deep neural network with or w/o dropout in one file.
"""
import numpy
import theano
import sys
import math
from theano import tensor as T
from theano import shared
@bsweger
bsweger / useful_pandas_snippets.md
Last active April 4, 2025 21:20
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)

@syhw
syhw / dnn.py
Last active October 19, 2024 08:20
A simple deep neural network with or w/o dropout in one file.
"""
A deep neural network with or w/o dropout in one file.
License: Do What The Fuck You Want to Public License http://www.wtfpl.net/
"""
import numpy, theano, sys, math
from theano import tensor as T
from theano import shared
from theano.tensor.shared_randomstreams import RandomStreams
@roycoding
roycoding / forest.md
Last active January 23, 2018 08:05
Beat the Becnhmark: Forest Cover Type Prediction

Beating the Forest Cover Type Prediction benchmark

Day 4 of the Beat 5 Kaggle Benchmarks in 5 Days challenge

For the Forest Cover Type Prediction competition on Kaggle, the goal is to predict the predominant type of trees in a given section of forest. The score is based on average classification accuracy for the 7 different tree cover classes.

To beat the all fir/spruce benchmark I obviously tried a random forest. Using the default settings of scikit-learn's RandomForestClassifier, I was able to beat the benchmark with an accuracy score of 0.72718 on the competition leaderboard. By using 100 estimators (versus the default of 10), I was able to raise that accuracy score up to 0.75455.

Random Forest Cover Types

Using pandas I loaded the train and test data sets into Python.

@larsmans
larsmans / supervised_tf.py
Created October 9, 2014 11:19
Supervised tf (tf-chi², tf-rf) for scikit-learn
import numpy as np
#from scipy.special import chdtrc
from scipy.sparse import spdiags
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelBinarizer
def _chisquare(f_obs, f_exp, reduce):
"""Replacement for scipy.stats.chisquare with custom reduction.
import seaborn as sns
from scipy.optimize import curve_fit
# Function for linear fit
def func(x, a, b):
return a + b * x
# Seaborn conveniently provides the data for
# Anscombe's quartet.
df = sns.load_dataset("anscombe")
@kastnerkyle
kastnerkyle / conv_deconv_vae.py
Last active October 19, 2024 08:20
Convolutional Variational Autoencoder, modified from Alec Radford at (https://gist.github.com/Newmu/a56d5446416f5ad2bbac)
# Alec Radford, Indico, Kyle Kastner
# License: MIT
"""
Convolutional VAE in a single file.
Bringing in code from IndicoDataSolutions and Alec Radford (NewMu)
Additionally converted to use default conv2d interface instead of explicit cuDNN
"""
import theano
import theano.tensor as T
from theano.compat.python2x import OrderedDict
@tokestermw
tokestermw / preprocess-twitter.py
Last active January 2, 2023 07:16
Python version of Ruby script to preprocess tweets for use in GloVe featurization http://nlp.stanford.edu/projects/glove/
"""
preprocess-twitter.py
python preprocess-twitter.py "Some random text with #hashtags, @mentions and http://t.co/kdjfkdjf (links). :)"
Script for preprocessing tweets by Romain Paulus
with small modifications by Jeffrey Pennington
with translation to Python by Motoki Wu
Translation of Ruby script to create features for GloVe vectors for Twitter data.
from lasagne.layers import Layer
class HighwayLayer(Layer):
def __init__(self, incoming, layer_class, gate_nonlinearity=None,
**kwargs):
super(HighwayLayer, self).__init__(incoming)
self.H_layer = layer_class(incoming, **kwargs)
if gate_nonlinearity: