@geneorama
Created February 20, 2013 19:14
Data.table column conversion question

I want to convert the types of a group of columns programmatically. This is simple with a `data.frame`, but with a `data.table` it seems to require a confusing combination of `substitute`, `as.symbol`, and `eval`.

Am I doing this right?

Simple example:

## GENERATE DATA
library(data.table)
set.seed(1)
dt <- data.table(
    ID1 = c(rep("A", 5), rep("B",5)), 
    ID2 = c(rep("C", 5), rep("D",5)), 
    v1 = as.character(rpois(10, lambda=4)),
    v2 = as.character(rpois(10, lambda=7)),
    v3 = as.character(rpois(10, lambda=99)))
str(dt)

## Predetermined list of Factor and Numeric columns:
FactorColumns = c("ID2", "v1", "v2")
NumericColumns = c("v3")

Conversion with `data.frame`:
df = as.data.frame(dt)
for(col in FactorColumns) df[,col] = factor(df[,col])
for(col in NumericColumns) df[,col] = as.numeric(df[,col])
str(df)
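
For comparison, the two `data.frame` loops can probably be collapsed into list-style assignments with `lapply`. This is only a sketch on a small made-up frame; `df2`, `fcols`, and `ncols` are illustrative names, not part of the example above:

```r
# Hypothetical miniature of the data above, with every column stored as character
df2 <- data.frame(
    ID2 = c("C", "C", "D"),
    v1  = c("3", "5", "3"),
    v3  = c("101", "97", "88"),
    stringsAsFactors = FALSE)

fcols <- c("ID2", "v1")   # columns to turn into factors
ncols <- c("v3")          # columns to turn into numerics

# df2[cols] (single bracket, no comma) is a list of columns,
# so a list of converted columns can be assigned straight back
df2[fcols] <- lapply(df2[fcols], factor)
df2[ncols] <- lapply(df2[ncols], as.numeric)
str(df2)
```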

Conversion with `data.table`:
for (col in FactorColumns){
    e = substitute(X := as.factor(X), list(X = as.symbol(col)))
    dt[ , eval(e)]
}
for (col in NumericColumns){
    e = substitute(X := as.numeric(X), list(X = as.symbol(col)))
    dt[ , eval(e)]
}
str(dt)

Both work, but the `data.table` version (option #2) has an "ouch my head" effect.
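
Two alternatives that may avoid the `substitute`/`eval` dance, sketched under the assumption of a reasonably recent `data.table` release (the parenthesised `(cols) :=` form is not available in very old versions). The data is regenerated here so the block runs on its own; `dt2` and `dt3` are illustrative names, not part of the question's code:

```r
library(data.table)

# Same data as in the question, rebuilt so the columns start out as character
set.seed(1)
dt2 <- data.table(
    ID1 = c(rep("A", 5), rep("B", 5)),
    ID2 = c(rep("C", 5), rep("D", 5)),
    v1 = as.character(rpois(10, lambda = 4)),
    v2 = as.character(rpois(10, lambda = 7)),
    v3 = as.character(rpois(10, lambda = 99)))
dt3 <- copy(dt2)   # independent copy for the second variant

FactorColumns = c("ID2", "v1", "v2")
NumericColumns = c("v3")

## Variant 1: .SDcols picks the columns, lapply converts them,
## and (cols) := assigns the results back by reference
dt2[, (FactorColumns) := lapply(.SD, as.factor), .SDcols = FactorColumns]
dt2[, (NumericColumns) := lapply(.SD, as.numeric), .SDcols = NumericColumns]
str(dt2)

## Variant 2: keep the loop, but let set() take the column name as a string
for (col in FactorColumns) set(dt3, j = col, value = as.factor(dt3[[col]]))
for (col in NumericColumns) set(dt3, j = col, value = as.numeric(dt3[[col]]))
str(dt3)
```

Both variants modify the table in place, like the `eval` version above. Variant 2 keeps the shape of the original loops, which may be easier to read when each column needs different handling.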