-
-
Save mikmart/85a5118285b1f8e4361399a1a3cb3034 to your computer and use it in GitHub Desktop.
set.seed(1) | |
n <- 1000 | |
matrices <- replicate(n, matrix(runif(24 ^ 2), nrow = 24)) | |
str(matrices) | |
#> num [1:24, 1:24, 1:1000] 0.266 0.372 0.573 0.908 0.202 ... | |
f <- function(x) { | |
sum_x <- sum(x) | |
if (sum_x == 0) | |
return(0) | |
p <- x / sum_x | |
p[p == 0] <- 1 | |
-sum(p * log2(p)) | |
} | |
g <- function(x, i) { | |
apply(x, i, sum) / sum(x) * 2 ^ apply(x, i, f) | |
} | |
result <- apply(matrices, 3, g, 2) | |
str(result) | |
#> num [1:24, 1:1000] 0.926 0.867 0.849 0.89 0.853 ... | |
#' Created on 2018-03-06 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0). |
I see... wow. Thanks! God exists!
I have applied it to real data and get some NaNs. I have track back the error and It was due to some columns with sum = 0. By applying runif you never get a 0 value, but our data are randomly generated by a Poisson distribution (to test aganist some null models) and sometimes get colSums = 0. These NaNs make final calculations to populate with NaNs. I have solved it with:
result <- apply(matrices, 3, g, 2)
result[is.na(result)] <- 0
And everything is alright.... and:
Previously to this late modification of the function to deal with p == 0, I tried just to replace in the raw data, every 0 with 1e-30. Comparing the results, they are exactly the same. But this way is mathematically correct.
Many thanks again, wow I'm really impressed. I have learned a lot.
JL (shasha as stackoverflow user)
Ah, yeah I see. Like you've done, just replacing the NA
s in result
with zeroes will produce the correct result; but still, it's probably cleaner to just deal with that in the f
function, too, so that we never divide by zero. If we add a check to see if sum(x) == 0
in f
and return 0
if so, we get the same result -- and don't have to do any clean up after!
I'm happy to help! It's been interesting refreshing my array manipulations, since I normally just work with data frames and lists.
Based on discussion in this StackOverflow question.