@lwaldron
Last active January 14, 2019 10:39
simple DelayedMatrix benchmark showing access time of n rows growing as O(n^3)
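# Install the GitHub version if curatedTCGAData is missing or older than 1.5.6.
# packageVersion() compares versions numerically; comparing version strings
# would wrongly rank "1.10.0" below "1.5.6".
if (!requireNamespace("curatedTCGAData", quietly = TRUE) ||
    utils::packageVersion("curatedTCGAData") < "1.5.6") {
  BiocManager::install("waldronlab/curatedTCGAData")
}
stopifnot(BiocManager::version() >= "3.9")
library(curatedTCGAData) # requires >= 1.5.6 and bioc-devel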
mae <- curatedTCGAData("UCEC", "Methylation_methyl27", dry.run = FALSE) #~2 seconds from cache
dm <- assay(mae, 1)
# benchmark showing cubic increase in as.matrix() time with the number of rows selected
n <- c(100, 200, 400, 600, 800, 1000, 1250, 1500, 1750, seq(2000, 4000, 500), 5000, 6000, 8000, 10000)
# i evenly spaced rows across all rows
res1 <- sapply(n, function(i) system.time(as.matrix(dm[seq(1, nrow(dm), length.out=i), ])))
# i rows sampled at random from all rows
res2 <- sapply(n, function(i) system.time(as.matrix(dm[sample(1:nrow(dm), i), ])))
# the first i rows
res3 <- sapply(n, function(i) system.time(as.matrix(dm[1:i, ])))
plot(n, res1["elapsed", ], xlab="# of rows selected", ylab="elapsed time for as.matrix",
     type="l", ylim=c(0, 15))
lines(n, res2["elapsed", ], lty=2, lwd=2)
lines(n, res3["elapsed", ], lty=3, lwd=2)
legend("topleft", lty=1:3, lwd=c(1, 2, 2),
       legend=c("sequential sample", "random sample", "first n rows"))
## try to infer polynomial degree
getr2 <- function(deg, res, n){
  summary(lm(res["elapsed", ] ~ I(n^deg)))$r.squared
}
r2 <- data.frame(sequential=sapply(1:4, getr2, res1, n),
                 random=sapply(1:4, getr2, res2, n),
                 firstn=sapply(1:4, getr2, res3, n))
r2
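
The R² comparison above picks the best of four fixed degrees. A complementary check, not part of the original gist, is to regress log(elapsed) on log(n): if the time grows roughly as c * n^k, the slope estimates k directly. The sketch below is a minimal illustration under that assumption. `estimate_exponent`, `n_demo` and `fake_elapsed` are hypothetical names, and the timings are synthetic so the block runs without downloading any data.

```r
# Hypothetical helper (not in the original gist): estimate the exponent k in
# elapsed ~ c * n^k from the slope of a log-log linear fit.
estimate_exponent <- function(elapsed, n) {
  # system.time() can report 0 for very fast calls, and log(0) is -Inf,
  # so drop non-positive timings before fitting.
  keep <- elapsed > 0
  fit <- lm(log(elapsed[keep]) ~ log(n[keep]))
  unname(coef(fit)[2])
}

# Synthetic stand-in for res1["elapsed", ]: cubic growth with a little
# multiplicative noise. These are made-up numbers, not measured timings.
set.seed(1)
n_demo <- c(100, 200, 400, 600, 800, 1000, 1250, 1500, 1750,
            seq(2000, 4000, 500), 5000, 6000, 8000, 10000)
fake_elapsed <- 1e-11 * n_demo^3 * exp(rnorm(length(n_demo), sd = 0.05))

k_hat <- estimate_exponent(fake_elapsed, n_demo)
k_hat  # close to 3 for this synthetic cubic input
```

With the objects from the benchmark above, the same helper could be called as `estimate_exponent(res1["elapsed", ], n)`, and likewise for `res2` and `res3`. A slope near 3 would be consistent with the cubic growth described in the title, though fixed overhead at small n can pull the estimate down.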
@lwaldron

Previous version had some ACC crud I forgot to remove

@lwaldron

And changed to UCEC because it has only 118 columns (27578 rows)
