@HarlanH
HarlanH / df.by.rows
Created March 2, 2010 20:01
a simplified version of aaply() from the R plyr package, much faster for simple cases
df.by.rows <- function(df, fn, agg.type=numeric, .progress=NULL)
{
# A simplified and more efficient version of aaply for data frames. Iterates over rows of the df, applies the
# fn to each single-row slice, and aggregates the **scalar** results.
#
# Args:
# df - data frame to split
# fn - function to apply to each row
# agg.type - a constructor, one of numeric, character
# .progress - set to "text" to get a progress bar
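The preview stops at the argument list. As a rough sketch of the row-by-row technique the comments describe (not the gist's actual body), the hypothetical `df_by_rows_sketch()` below preallocates the result with the `agg.type` constructor, calls `fn` on each one-row slice, and uses base R's `txtProgressBar()` when `.progress = "text"`. It assumes `fn` returns exactly one value per row.

```r
# Hypothetical sketch of a row-wise apply that collects one scalar per row.
# Not the gist's implementation; assumes fn(row) returns a length-1 value.
df_by_rows_sketch <- function(df, fn, agg.type = numeric, .progress = NULL) {
  n <- nrow(df)
  out <- agg.type(n)  # preallocate, e.g. numeric(n) or character(n)
  show_bar <- identical(.progress, "text") && n > 0
  if (show_bar) pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  for (i in seq_len(n)) {
    res <- fn(df[i, , drop = FALSE])  # single-row data frame slice
    if (length(res) != 1) stop("fn must return a single value for row ", i)
    out[i] <- res
    if (show_bar) utils::setTxtProgressBar(pb, i)
  }
  if (show_bar) close(pb)
  out
}

d <- data.frame(a = 1:3, b = c(10, 20, 30))
df_by_rows_sketch(d, function(row) row$a + row$b)                  # 11 22 33
df_by_rows_sketch(d, function(row) paste0("r", row$a), character)  # "r1" "r2" "r3"
```

Preallocating with `agg.type(n)` and assigning by index avoids growing the result vector inside the loop.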
HarlanH / ggplotanimations.R
Created March 11, 2010 18:58
demonstration of animating R/ggplot2 graphs with ImageMagick
# NOTE: requires ImageMagick to be installed and in the PATH! Should work on most Linux and Mac systems.
library(ggplot2)
d <- data.frame(x=rep(1:10,10), y=rnorm(100, 1:10))
files <- character()
for (i in 1:10)
{
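Only the start of the frame loop survives in the preview. A sketch of the same idea, under stated assumptions: each iteration saves one PNG with `ggsave()`, and the frames are then stitched into a GIF by ImageMagick's command-line tool. The helper names `render_frames()` and `magick_command()`, the file names, and the `-delay`/`-loop` settings are illustrative choices, not taken from the gist.

```r
# Hypothetical helpers for frame-by-frame animation; names and settings are illustrative.

# Build the ImageMagick command that joins PNG frames into a looping GIF.
# -delay is in hundredths of a second per frame; -loop 0 repeats forever.
# ImageMagick 7 installs the tool as `magick`; older versions as `convert`.
magick_command <- function(files, out = "animation.gif", delay = 50, prog = "convert") {
  sprintf("%s -delay %d -loop 0 %s %s",
          prog, as.integer(delay), paste(shQuote(files), collapse = " "), shQuote(out))
}

# Save one PNG per step, revealing the points of d (columns x, y) up to x == i.
render_frames <- function(d, steps = 1:10) {
  files <- character()
  for (i in steps) {
    p <- ggplot2::ggplot(d[d$x <= i, ], ggplot2::aes(x = x, y = y)) +
      ggplot2::geom_point() +
      ggplot2::xlim(range(d$x)) +  # fixed axes so the frames line up
      ggplot2::ylim(range(d$y))
    f <- sprintf("frame%03d.png", i)
    ggplot2::ggsave(f, p, width = 4, height = 3)
    files <- c(files, f)
  }
  files
}
```

With ggplot2 installed and ImageMagick on the PATH, something like `system(magick_command(render_frames(d)))` would write `animation.gif`; `nzchar(Sys.which("convert"))` is a quick check that the tool is available first.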
HarlanH / monotonic_smoothing.R
Created June 2, 2010 20:53
demonstration of monotonic smoothing with 2-D integral IVs
# demonstration of monotonic smoothing with 2-D integral IVs
# Harlan Harris
# [email protected]
library(ggplot2)
library(locfit)
library(monoProc)
df <-
structure(list(counts = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
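The preview cuts off inside the `structure()` call that holds the data, and the gist's own method relies on `locfit` and `monoProc`. As a swapped-in technique rather than the gist's approach, the sketch below uses only base R: it smooths with `loess()` and then projects the fitted curve onto the least-squares closest non-decreasing sequence with isotonic regression (`isoreg()`). It handles a single predictor only, and the name `monotone_smooth()` is hypothetical.

```r
# Hypothetical two-step monotone smoother (base R only, not monoProc's method):
# 1) fit an ordinary loess smooth, 2) force it to be non-decreasing with
#    least-squares isotonic regression on the smoothed values.
monotone_smooth <- function(x, y, span = 0.75) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  fit <- stats::loess(y ~ x, span = span)
  smooth <- stats::predict(fit)          # fitted values at the sorted x
  mono <- stats::isoreg(x, smooth)$yf    # yf follows the order of sorted x
  data.frame(x = x, y = y, smooth = smooth, monotone = mono)
}

set.seed(1)
x <- runif(60, 0, 10)
y <- log1p(x) + rnorm(60, sd = 0.2)
res <- monotone_smooth(x, y)
head(res)
```

Smoothing first and monotonizing second keeps the curve smooth wherever the loess fit is already increasing; where it dips, isotonic regression flattens the dip into a constant stretch.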
HarlanH / Notes.txt
Created June 3, 2010 02:05
Code for the NYC R Meetup on debugging R
If you see the bug, keep your damn mouth shut!
Example #1
Show data.
I've written pointless code.
traceback() shows where the problem lies, but not when or why
Explain "call stack" made of "frames", noting that many things that don't
syntactically look like function calls in R actually are.
Error doesn't seem to happen all the time, so adding browser() calls
might not be very helpful.
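Two of the points in these notes can be shown in a few lines of base R. Operators, indexing, and even `if` are ordinary function calls, which is why they show up as frames in `traceback()`. For an error that only happens some of the time, one option is to record the call stack at the moment the error is signalled instead of placing `browser()` calls in advance; base R's `options(error = recover)` and `options(error = dump.frames)` do this interactively. The `run_and_record()` helper, `debug_env` environment, and `flaky()` function below are hypothetical names used for illustration.

```r
# Syntax that doesn't look like a call is still a function call:
x <- c(10, 20, 30)
`[`(x, 2)                        # same as x[2], gives 20
`+`(1, 2)                        # same as 1 + 2, gives 3
`if`(x[1] > 5, "big", "small")   # same as if (x[1] > 5) "big" else "small"
x <- `[<-`(x, 2, value = 99)     # same as x[2] <- 99

# Hypothetical helper: capture the call stack when an error is signalled,
# then let the error continue so normal behaviour is unchanged.
debug_env <- new.env()
run_and_record <- function(f, ...) {
  withCallingHandlers(
    f(...),
    error = function(e) {
      assign("calls", sys.calls(), envir = debug_env)  # stack at the error
      assign("message", conditionMessage(e), envir = debug_env)
    }
  )
}

flaky <- function(p) if (runif(1) < p) stop("intermittent failure") else "ok"
try(run_and_record(flaky, p = 1), silent = TRUE)
debug_env$message  # "intermittent failure"
```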
# solution via Hadley Wickham and others at:
# http://www.mail-archive.com/[email protected]/msg85203.html
qq <- list(data.frame(c1=c('a', 'b', 'c')), data.frame(c1=c('a','d','e')), data.frame(c1=c('a','f','g')))
fold <- function(x, fun) {
if (length(x) == 1) return(x[[1]])  # a single element needs no combining
accumulator <- fun(x[[1]], x[[2]])
if (length(x) == 2) return(accumulator)
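The `fold()` above is a left fold, which base R provides as `Reduce()`: without an `init` it combines the first two elements, folds each remaining element into the running result, and returns the only element of a length-one list unchanged. A minimal sketch applying it to the `qq` list from the notes; `fold_left()` is a hypothetical hand-written equivalent for comparison.

```r
qq <- list(data.frame(c1 = c('a', 'b', 'c')),
           data.frame(c1 = c('a', 'd', 'e')),
           data.frame(c1 = c('a', 'f', 'g')))

# Inner join across all frames: only keys present in every frame survive.
common <- Reduce(merge, qq)
as.character(common$c1)  # "a"

# Outer join: keep every key that appears in any frame.
everything <- Reduce(function(a, b) merge(a, b, all = TRUE), qq)
nrow(everything)         # 7

# Hypothetical hand-written left fold with the same behaviour as Reduce(f, x).
fold_left <- function(x, fun) {
  acc <- x[[1]]                          # a single element needs no combining
  for (el in x[-1]) acc <- fun(acc, el)  # fold in the remaining elements
  acc
}
identical(fold_left(qq, merge), common)  # TRUE
```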
HarlanH / visualizing-categorizations.R
Created April 22, 2011 22:23
Visualizing Categorizations blog post code
# Demo of techniques to visualize the predictions made by a categorization model.
library(ROCR)
library(ggplot2)
load(url('http://dl.dropbox.com/u/7644953/classifier-visualization.Rdata'))
pred.df$actual.bin <- ifelse(pred.df$actual == 'yes', 1, 0)
pred.df <- pred.df[order(pred.df$predicted, decreasing=TRUE), ]
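Once the predictions are sorted in decreasing order, as on the last line above, cumulative sums of `actual.bin` give the points of an ROC curve directly. A minimal base-R sketch of that step, assuming `predicted` is a score where higher means "yes" and `actual.bin` is 0/1. Tied scores are not grouped, so the curve can differ slightly from what ROCR's `performance(prediction(...), "tpr", "fpr")` reports. The function names are hypothetical.

```r
# Hypothetical helpers: ROC points and trapezoidal AUC from scores and 0/1 labels.
roc_points <- function(predicted, actual.bin) {
  o <- order(predicted, decreasing = TRUE)  # highest scores first
  a <- actual.bin[o]
  data.frame(
    threshold = predicted[o],
    tpr = cumsum(a) / sum(a),          # share of actual positives captured so far
    fpr = cumsum(1 - a) / sum(1 - a)   # share of actual negatives flagged so far
  )
}

auc_trapezoid <- function(fpr, tpr) {
  x <- c(0, fpr)  # start the curve at the origin
  y <- c(0, tpr)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

roc <- roc_points(predicted = c(0.9, 0.8, 0.7, 0.6), actual.bin = c(1, 0, 1, 0))
auc_trapezoid(roc$fpr, roc$tpr)  # 0.75
```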
HarlanH / DSDC-Titles.R
Created September 26, 2011 12:44
Data Science DC Titles Visualization
# Data Science DC Titles Visualization
# Here's how this will work. In a main loop, a parameterized visualization function
# is called every N seconds. Each call fetches the source spreadsheet fresh and
# generates a visual.
# Aspects of this code are borrowed from Drew Conway:
# https://raw.github.com/drewconway/ZIA/master/R/better_word_cloud/better_word_cloud.R
library(plyr)
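The main loop described in the comments can be sketched as follows. `run_titles()`, the `fetch_data` argument, and the `n_cycles` cap are hypothetical; in the real script the fetch step would re-read the source spreadsheet and each visualization function would draw a title card.

```r
# Hypothetical main loop: cycle through named visualization functions, calling
# one every `interval` seconds on freshly fetched data.
run_titles <- function(viz_funs, fetch_data, interval = 10, n_cycles = 1) {
  shown <- character()
  for (cycle in seq_len(n_cycles)) {
    for (name in names(viz_funs)) {
      dat <- fetch_data()          # re-read the source on every call
      viz_funs[[name]](dat)        # draw one visual
      shown <- c(shown, name)
      Sys.sleep(interval)          # hold the visual on screen
    }
  }
  invisible(shown)
}

# Dry run with stand-in functions and no delay:
fake_fetch <- function() data.frame(title = c("Data Scientist", "Statistician"))
viz <- list(count = function(d) print(nrow(d)),
            first = function(d) print(d$title[1]))
run_titles(viz, fake_fetch, interval = 0, n_cycles = 2)
```

For an endless display, the outer `for` loop could become a `repeat` block.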
HarlanH / ISBNdb.R
Created January 27, 2012 14:39
R wrapper around the ISBNdb web service
library(XML)
library(RCurl)
# USAGE: ldply(c('9780387962406', '9780387961406', '0387981403'), function(x) ISBNdb(x, 'apikey'))
ISBNdb <- function(isbn, access_key,
isbn.api='http://isbndb.com/api/books.xml?access_key=%s&index1=isbn&value1=%s') {
isbn <- as.character(isbn)
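The wrapper's first step is filling the two `%s` slots of `isbn.api`, in order, with the access key and the ISBN. A minimal sketch of just that step, with no network call; the name `isbn_url()` and the hyphen-stripping are assumptions, not part of the gist. Because `sprintf()` is vectorized, several ISBNs yield several URLs.

```r
# Hypothetical helper: build ISBNdb request URLs without fetching them.
isbn_url <- function(isbn, access_key,
    isbn.api = 'http://isbndb.com/api/books.xml?access_key=%s&index1=isbn&value1=%s') {
  isbn <- gsub("[^0-9Xx]", "", as.character(isbn))  # drop hyphens/spaces (assumption)
  sprintf(isbn.api, utils::URLencode(access_key, reserved = TRUE), isbn)
}

isbn_url(c('978-0-387-96240-6', '0387981403'), 'apikey')
```

The gist loads XML and RCurl for the fetch-and-parse step; for example, `XML::xmlParse(RCurl::getURL(u), asText = TRUE)` would parse a response into an XML document.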
HarlanH / NA.j
Created February 25, 2012 18:57
playing with NA for Julia
type _NA
end
macro NA() # for easier typing!
_NA()
end
NumberData = Union(_NA, Number)
StringData = Union(_NA, String)
HarlanH / vswitch.R
Created June 6, 2012 14:28
vswitch
vswitch <- function(namedList, default=NA, selector) {
# Function adapted from Bill Dunlap that implements something along the lines of a vectorized switch statement.
# http://tolstoy.newcastle.edu.au/R/e8/devel/09/12/1122.html
#
# Args:
# namedList - e.g., list(times=df$a * df$b, plus=df$a + df$b)
# default - a value to assign to elements of selector that aren't matched in namedList
# selector - e.g., c('times', 'times', 'plus', 'exp', 'plus')
#
# Returns: a vector of values selected from namedList
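The preview ends before the body. One way to get the documented behaviour, sketched here as a hypothetical `vswitch_sketch()` rather than Bill Dunlap's code: start from a vector filled with `default`, then, for each name in `namedList`, copy that element's values into the positions where `selector` matches the name. It assumes every element of `namedList` has the same length as `selector` (or recycles to it).

```r
# Hypothetical sketch of a vectorized switch: element i of the result comes from
# namedList[[selector[i]]][i], or `default` when selector[i] has no match.
vswitch_sketch <- function(namedList, default = NA, selector) {
  n <- length(selector)
  out <- rep(default, length.out = n)
  for (nm in names(namedList)) {
    hit <- !is.na(selector) & selector == nm      # positions choosing this branch
    out[hit] <- rep_len(namedList[[nm]], n)[hit]  # take the matching values
  }
  out
}

df <- data.frame(a = 1:5, b = 2)
vswitch_sketch(list(times = df$a * df$b, plus = df$a + df$b),
               selector = c('times', 'times', 'plus', 'exp', 'plus'))
# 2 4 5 NA 7
```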