@huyng
huyng / matplotlibrc
Created February 8, 2011 15:50
my default matplotlib settings
### MATPLOTLIBRC FORMAT
# This is a sample matplotlib configuration file - you can find a copy
# of it on your system in
# site-packages/matplotlib/mpl-data/matplotlibrc. If you edit it
# there, please note that it will be overridden in your next install.
# If you want to keep a permanent local copy that will not be
# over-written, place it in HOME/.matplotlib/matplotlibrc (unix/linux
# like systems) and C:\Documents and Settings\yourname\.matplotlib
# (win32 systems).
@ChrisWills
ChrisWills / .screenrc-main-example
Created November 3, 2011 17:50
A nice default screenrc
# GNU Screen - main configuration file
# All other .screenrc files will source this file to inherit settings.
# Author: Christian Wills - [email protected]
# Allow bold colors - necessary for some reason
attrcolor b ".I"
# Tell screen how to set colors. AB = background, AF=foreground
termcapinfo xterm 'Co#256:AB=\E[48;5;%dm:AF=\E[38;5;%dm'
@alexalemi
alexalemi / welford.py
Created March 21, 2012 19:29
Python Welford Algorithm
import math
class Welford(object):
    """ Implements Welford's algorithm for computing a running mean
    and standard deviation as described at:
        http://www.johndcook.com/standard_deviation.html
    can take single values or iterables
    Properties:
        mean - returns the mean
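
The welford.py preview above is cut off before the update step. As a minimal sketch of the technique its docstring names, the class below keeps a running count, mean, and sum of squared deviations, and derives the sample standard deviation from them. The names `RunningStats`, `update`, and `std` are hypothetical and chosen for illustration; they are not taken from the gist, and the `n - 1` denominator is an assumption about which standard deviation is wanted.

```python
import math


class RunningStats:
    """Running mean and standard deviation using Welford's update rule.

    Illustrative sketch only; not the API of the gist above.
    """

    def __init__(self):
        self.count = 0    # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold a single value into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean times the deviation from the new mean.
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation (n - 1 denominator); 0.0 below two values."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))
```

Each call to `update` does constant work and never stores the values themselves, which is why this form suits streaming data.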
@zhannes
zhannes / gist:3207394
Created July 30, 2012 14:33
Git rebase workflow
# First, fetch the latest refs for all branches and make sure we have the latest master
git checkout master
git fetch
# If the remote has new changes, bring our local branch up to date
git rebase origin/master
# could also be done as
@tonicebrian
tonicebrian / GBT_CaliforniaHousing.py
Created November 5, 2012 16:22
Gradient Boosting Trees using Python
# =============
# Introduction
# =============
# I've been doing some data mining lately, especially looking into `Gradient
# Boosting Trees <http://en.wikipedia.org/wiki/Gradient_boosting>`_ since it is
# claimed that this is one of the techniques with the best performance out of the
# box. In order to have a better understanding of the technique I've reproduced
# the example of section *10.14.1 California Housing* in the book `The Elements of Statistical Learning <http://www-stat.stanford.edu/~tibs/ElemStatLearn/>`_.
# Each point of this dataset represents the house value of a property with some
# attributes of that house. You can get the data and the description of those
@mattbaggott
mattbaggott / watercolorplot.R
Created November 6, 2012 07:17
Visually weighted regression / Watercolor plots by Felix Schönbrodt
# Copyright 2012 Felix Schönbrodt
# All rights reserved.
#
# FreeBSD License
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# 1. Redistributions of source code must retain the above copyright
@mattbaggott
mattbaggott / timetoevent.R
Created December 29, 2012 20:36
Example code for time-to-event analysis in R, as in whether repeated ad viewings lead to a sale
##
## Example code for time-to-event analysis in R
## [email protected]
## Dec 28, 2012
##
## joineR package: analyzing longitudinal data where the response
## from each person is a time-sequence of repeated measurements
## and we are interested in a possibly censored time-to-event outcome
##
## example: repeated ad viewings leading to a sale
@mattbaggott
mattbaggott / ggsurvival.R
Last active December 18, 2016 23:20
Functions to make ggplot KM survival / cumulative incidence plot from survfit() models ( library(survival) )
#
# Functions to make ggplot KM survivor curves made with survfit() in library(survival)
#
# code written by Ramon Saccilotto
# and included in his ggplot2 tutorial
# 2010-12-08
# define custom function to create a survival data.frame
createSurvivalFrame <- function(f.survfit){
# initialise frame variable
@mattbaggott
mattbaggott / predicting_customer_behav_1.R
Last active September 15, 2020 22:16
Uses the BTYD package and Pareto/NBD model to predict customer behavior in R. Slides are at: http://www.slideshare.net/mattbagg/baggott-predict-customerinrpart1#
#
# PREDICTING LONG TERM CUSTOMER VALUE WITH BTYD PACKAGE
# Pareto/NBD (negative binomial distribution) modeling of
# repeat-buying behavior in a noncontractual setting
#
# Matthew Baggott, [email protected]
#
# Accompanying slides at:
# http://www.slideshare.net/mattbagg/baggott-predict-customerinrpart1#
#
@inkhorn
inkhorn / enron corpus processing.r
Last active December 27, 2015 03:18
Enron Corpus Processing
library(stringr)
library(plyr)
library(tm)
library(tm.plugin.mail)
library(SnowballC)
library(topicmodels)
# At this point, the python script should have been run,
# creating about 126 thousand txt files. I was very much afraid
# to import that many txt files into the tm package in R (my computer only