I want to update a group of columns programmatically. This is simple using `data.frame`, but in `data.table` it requires a confusing combination of `substitute`, `as.symbol`, and `eval`.
Am I doing this right?
Simple example:
```r
## GENERATE DATA
library(data.table)
set.seed(1)
dt <- data.table(
  ID1 = c(rep("A", 5), rep("B", 5)),
  ID2 = c(rep("C", 5), rep("D", 5)),
  v1 = as.character(rpois(10, lambda = 4)),
  v2 = as.character(rpois(10, lambda = 7)),
  v3 = as.character(rpois(10, lambda = 99)))
str(dt)

## Predetermined list of factor and numeric columns:
FactorColumns = c("ID2", "v1", "v2")
NumericColumns = c("v3")
```
Conversion with `data.frame`:
```r
df = as.data.frame(dt)
for (col in FactorColumns) df[, col] = factor(df[, col])
for (col in NumericColumns) df[, col] = as.numeric(df[, col])
str(df)
```
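For comparison, base R can also convert each group of columns in a single assignment by treating the `data.frame` as a list of columns. This is a minimal sketch that rebuilds the same example data; `stringsAsFactors = FALSE` is added (an assumption, not in the original) so the columns start as character on R versions older than 4.0.

```r
# Rebuild the example data as a plain data.frame
set.seed(1)
df <- data.frame(
  ID1 = c(rep("A", 5), rep("B", 5)),
  ID2 = c(rep("C", 5), rep("D", 5)),
  v1 = as.character(rpois(10, lambda = 4)),
  v2 = as.character(rpois(10, lambda = 7)),
  v3 = as.character(rpois(10, lambda = 99)),
  stringsAsFactors = FALSE
)
FactorColumns <- c("ID2", "v1", "v2")
NumericColumns <- c("v3")

# df[cols] is a list-like subset; lapply returns a list of converted
# columns, which is assigned back to the same column names at once
df[FactorColumns] <- lapply(df[FactorColumns], factor)
df[NumericColumns] <- lapply(df[NumericColumns], as.numeric)
str(df)
```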
Conversion with `data.table`:
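```r
for (col in FactorColumns) {
  # Build the unevaluated call `col := as.factor(col)` for this column name
  e = substitute(X := as.factor(X), list(X = as.symbol(col)))
  dt[, eval(e)]
}
for (col in NumericColumns) {
  # Same trick, building `col := as.numeric(col)`
  e = substitute(X := as.numeric(X), list(X = as.symbol(col)))
  dt[, eval(e)]
}
str(dt)
```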
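A sketch of a way to avoid building calls by hand, assuming a reasonably recent `data.table`: put the column-name vector in parentheses on the left of `:=` (so it is read as a character vector of names rather than a column literally called `FactorColumns`), and restrict `.SD` to those columns with `.SDcols`.

```r
library(data.table)

# Rebuild the example data
set.seed(1)
dt <- data.table(
  ID1 = c(rep("A", 5), rep("B", 5)),
  ID2 = c(rep("C", 5), rep("D", 5)),
  v1 = as.character(rpois(10, lambda = 4)),
  v2 = as.character(rpois(10, lambda = 7)),
  v3 = as.character(rpois(10, lambda = 99)))
FactorColumns <- c("ID2", "v1", "v2")
NumericColumns <- c("v3")

# .SD holds only the .SDcols columns; lapply converts each one and the
# resulting list is assigned by reference to the columns named on the LHS
dt[, (FactorColumns) := lapply(.SD, as.factor), .SDcols = FactorColumns]
dt[, (NumericColumns) := lapply(.SD, as.numeric), .SDcols = NumericColumns]
str(dt)
```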
Both work, but the `data.table` version has an "ouch my head" effect.
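If a per-column loop is preferred, `data.table::set()` takes the column name as a plain string, so the loop can read much like the `data.frame` one. The `data.table` documentation describes `set()` as a low-overhead, loopable version of `:=`; it modifies `dt` by reference. This is a minimal sketch on the same example data.

```r
library(data.table)

# Rebuild the example data
set.seed(1)
dt <- data.table(
  ID1 = c(rep("A", 5), rep("B", 5)),
  ID2 = c(rep("C", 5), rep("D", 5)),
  v1 = as.character(rpois(10, lambda = 4)),
  v2 = as.character(rpois(10, lambda = 7)),
  v3 = as.character(rpois(10, lambda = 99)))
FactorColumns <- c("ID2", "v1", "v2")
NumericColumns <- c("v3")

# j accepts a column name string; dt[[col]] reads the current column
for (col in FactorColumns) set(dt, j = col, value = as.factor(dt[[col]]))
for (col in NumericColumns) set(dt, j = col, value = as.numeric(dt[[col]]))
str(dt)
```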