@shuckle16
Created April 11, 2021 00:17
glmnet simulation -- compare using pca on predictors vs not
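library(doMC)      # parallel backend used by cv.glmnet (registerDoMC)
library(glmnet)    # cross-validated lasso (cv.glmnet)
library(tictoc)    # timing (tic / toc)
library(tidyr)     # gather(); also re-exports %>%
library(ggplot2)   # density plot
library(PCAtools)  # pca()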
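# Simulate 10,000 gamma-distributed responses and a 10,000 x 1,000 predictor
# matrix: each column is y minus independent Gaussian noise (y recycles down columns).
set.seed(1)  # assumption: any fixed seed works; added only for reproducibility
y <- rgamma(10000, shape = 2, scale = 2000)
x <- y - matrix(rnorm(10000 * 1000, mean = 3000, sd = 100000), nrow = 10000)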
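# future.globals.maxSize only affects the future package, which this script
# does not use; kept from the original.
options(future.globals.maxSize = 891289600 * 4)
registerDoMC(cores = 5)  # 5 workers, one per CV fold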
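# Lasso on all 1,000 raw predictors, timed
tic()
mod <- cv.glmnet(x, y, parallel = TRUE, nfolds = 5, trace.it = TRUE)
toc()
preds <- predict(mod, newx = x)[, 1]  # in-sample predictions at the default s = "lambda.1se"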
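# PCA on the predictors. PCAtools::pca() expects variables in rows by default,
# so flag x (observations in rows) as transposed and use the per-observation
# scores in $rotated rather than the loadings.
x_pca <- pca(x, transposed = TRUE)
x_pcs <- data.matrix(x_pca$rotated[, 1:30])  # first 30 principal components
mod_pcomp <- cv.glmnet(x_pcs, y, parallel = TRUE, nfolds = 5, trace.it = TRUE)
preds_pcomp <- predict(mod_pcomp, newx = x_pcs)[, 1]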
# Overlay the densities of both prediction sets and the observed y
data.frame(preds, preds_pcomp, y) %>%
  gather(key = metric, value = value) %>%
  ggplot(aes(x = value, fill = metric)) +
  geom_density(alpha = 0.3)
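
# A numeric complement to the density plot: in-sample RMSE for each model.
# This is a sketch, not part of the original gist; `rmse` is a hypothetical
# helper defined here. In-sample error tends to favor the larger model, so
# treat the numbers as a rough check rather than a real comparison.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
# The guard only lets this snippet be sourced on its own without the fits above.
if (exists("preds") && exists("preds_pcomp")) {
  print(c(full = rmse(y, preds), pca = rmse(y, preds_pcomp)))
}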