Created April 18, 2014 00:03
Ways to use a non-vectorized function on every row of a data.frame
# Sample data
X <- data.frame(
  x = c(1, 2, 3),
  y = c(4, 5, 6),
  etc = c("a", "b", "c")
)

# Arbitrary stand-in for a function that can't be vectorized
# (pretend pmax() doesn't exist)
max.fun <- function(a, b) { max(c(a, b)) }
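
# A quick sketch of why this matters, using base R only: called on whole
# columns, max.fun() collapses everything to a single value, while mapply()
# calls it once per row. The names `collapsed` and `per.row` are illustrative
# and not part of the original gist.

```r
# Self-contained copy of the sample data and stand-in function
X <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), etc = c("a", "b", "c"))
max.fun <- function(a, b) { max(c(a, b)) }

# Called on whole columns, the function returns one value for the whole table
collapsed <- max.fun(X$x, X$y)

# mapply() calls it once per row, pairing x[i] with y[i]
per.row <- mapply(max.fun, X$x, X$y)
```

# `collapsed` has length 1, while `per.row` has one element per row of X.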

# Using dplyr
# First, tell dplyr how to group rows.
# I'm using x, y here, but any set of columns that uniquely identifies
# rows would work.
library(dplyr)
Y <- group_by(X, x, y)

# This seems to do what you need without dropping any columns:
# with one row per group, max.fun() is called once per row
Y <- mutate(Y, result = max.fun(x, y))
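
# If no set of columns uniquely identifies rows, dplyr's rowwise() may be an
# alternative to group_by(): it treats every row as its own group. A minimal
# sketch, assuming a dplyr version that provides rowwise(); the name `W` is
# illustrative and not part of the original gist.

```r
library(dplyr)

# Self-contained copy of the sample data and stand-in function
X <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), etc = c("a", "b", "c"))
max.fun <- function(a, b) { max(c(a, b)) }

# rowwise() makes each row its own group, so mutate() calls max.fun()
# once per row; ungroup() drops the row-wise grouping afterwards
W <- ungroup(mutate(rowwise(X), result = max.fun(x, y)))
```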

# Another option is data.table. For a long time, if you wanted the
# fastest possible ops, data.table was it. Not sure if dplyr has
# surpassed it, but for 5 million rows, it's worth a shot.
# Key advantage: no copying of objects in memory.
library(data.table)
Z <- as.data.table(X)

# No need to assign the result: := adds the column to Z by reference
Z[, result := max.fun(x, y), by = list(x, y)]

# Check it
Z
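
# The by = list(x, y) call above relies on (x, y) being unique per row.
# A hedged sketch of one way around that: group by a row index instead, so
# each row is its own group even when values repeat. The name `Z2` is
# illustrative and not part of the original gist, and this has not been
# benchmarked against the keyed version.

```r
library(data.table)

# Self-contained copy of the sample data and stand-in function
Z2 <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), etc = c("a", "b", "c"))
max.fun <- function(a, b) { max(c(a, b)) }

# Grouping by the row number calls max.fun() once per row;
# := still adds the column by reference
Z2[, result := max.fun(x, y), by = seq_len(nrow(Z2))]
```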