Last active
April 5, 2023 02:32
Save bobthecat/5024079 to your computer and use it in GitHub Desktop.
bigcorPar <- function(x, nblocks = 10, verbose = TRUE, ncore = "all", ...){
  library(ff, quietly = TRUE)
  library(doMC)  # also attaches foreach
  ## the 'multicore' package is defunct; detectCores() now lives in 'parallel'
  if (identical(ncore, "all")) ncore <- parallel::detectCores()
  registerDoMC(cores = ncore)

  NCOL <- ncol(x)

  ## test if ncol(x) %% nblocks gives remainder 0
  if (NCOL %% nblocks != 0){
    stop("Choose different 'nblocks' so that ncol(x) %% nblocks = 0!")
  }

  ## preallocate square matrix of dimension
  ## ncol(x) in 'ff' single format
  corMAT <- ff(vmode = "single", dim = c(NCOL, NCOL))

  ## split column numbers into 'nblocks' groups
  SPLIT <- split(1:NCOL, rep(1:nblocks, each = NCOL/nblocks))

  ## create all unique combinations of blocks
  COMBS <- expand.grid(1:length(SPLIT), 1:length(SPLIT))
  COMBS <- t(apply(COMBS, 1, sort))
  COMBS <- unique(COMBS)

  ## iterate through each block combination, calculate correlation matrix
  ## between blocks and store them in the preallocated matrix on both
  ## symmetric sides of the diagonal
  results <- foreach(i = 1:nrow(COMBS)) %dopar% {
    COMB <- COMBS[i, ]
    G1 <- SPLIT[[COMB[1]]]
    G2 <- SPLIT[[COMB[2]]]
    if (verbose) cat("Block", COMB[1], "with Block", COMB[2], "\n")
    flush.console()
    COR <- cor(x[, G1], x[, G2], ...)
    corMAT[G1, G2] <- COR
    corMAT[G2, G1] <- t(COR)
    COR <- NULL
  }

  gc()
  return(corMAT)
}
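The block-pairing logic can be checked on a small in-memory matrix. The sketch below is not part of the gist: it uses base R only, replaces the `ff` matrix with an ordinary matrix and `%dopar%` with a plain `for` loop, and the toy sizes (200 rows, 12 columns, 3 blocks) are arbitrary assumptions. It rebuilds the full correlation matrix block by block, so the result can be compared with `cor(x)`.

```r
set.seed(1)
x <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)  # toy data (assumed sizes)
nblocks <- 3
NCOL <- ncol(x)

# Same splitting and pairing steps as in bigcorPar
SPLIT <- split(seq_len(NCOL), rep(seq_len(nblocks), each = NCOL / nblocks))
COMBS <- expand.grid(seq_along(SPLIT), seq_along(SPLIT))
COMBS <- unique(t(apply(COMBS, 1, sort)))  # 3 blocks give 3 * 4 / 2 = 6 unique pairs

# Ordinary matrix standing in for the ff matrix
corMAT <- matrix(NA_real_, NCOL, NCOL)
for (i in seq_len(nrow(COMBS))) {
  G1 <- SPLIT[[COMBS[i, 1]]]
  G2 <- SPLIT[[COMBS[i, 2]]]
  COR <- cor(x[, G1], x[, G2])
  corMAT[G1, G2] <- COR      # block above (or on) the diagonal
  corMAT[G2, G1] <- t(COR)   # mirrored block below the diagonal
}

# Compare the blockwise result with the direct computation
print(all.equal(corMAT, cor(x)))
```

Because only the unique block pairs are visited, each off-diagonal block is computed once and mirrored, which roughly halves the number of `cor()` calls compared with visiting every ordered pair.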
Thanks, Bob, for the code. The only problem is that an ff matrix is limited to about 45,000 columns. In addition, converting the ff matrix to ffdf and then writing it to a file takes a long time. I made a fork that modifies the code to handle ~120,000 columns (in theory, an unlimited number) and prints the flattened correlations to a file. I ran the script separately for each of 24 chromosomes, and it took about 15 hours to complete using 128 CPUs and 4 GB of memory.
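The fork mentioned in this comment is not shown on the page. What follows is only a rough sketch of the general idea, under the assumption that "flattened" means one row per column pair: each block's correlations are appended to a text file instead of being stored in an `ff` matrix, so no full square matrix ever has to exist. The output path, the column names `col1`, `col2`, `r`, and the toy sizes are all hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)  # toy data (assumed sizes)
nblocks <- 3
NCOL <- ncol(x)

SPLIT <- split(seq_len(NCOL), rep(seq_len(nblocks), each = NCOL / nblocks))
COMBS <- expand.grid(seq_along(SPLIT), seq_along(SPLIT))
COMBS <- unique(t(apply(COMBS, 1, sort)))

out_file <- tempfile(fileext = ".tsv")  # hypothetical output location
writeLines("col1\tcol2\tr", out_file)   # header line

for (i in seq_len(nrow(COMBS))) {
  G1 <- SPLIT[[COMBS[i, 1]]]
  G2 <- SPLIT[[COMBS[i, 2]]]
  COR <- cor(x[, G1], x[, G2])
  # Flatten the block to long format: one row per (column, column) pair
  flat <- data.frame(col1 = G1[as.vector(row(COR))],
                     col2 = G2[as.vector(col(COR))],
                     r    = as.vector(COR))
  # Keep each unordered pair once and drop the diagonal
  flat <- flat[flat$col1 < flat$col2, ]
  write.table(flat, out_file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE, append = TRUE)
}

res <- read.delim(out_file)
print(nrow(res) == choose(NCOL, 2))  # one row per unordered column pair
```

In a parallel run, each worker would probably need its own output file (for example, one per block pair) to avoid interleaved writes; that part is left out of this sketch.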
Hi! Can you share your code? It would be very useful. Thanks in advance!