Skip to content

Instantly share code, notes, and snippets.

@timflutre
Last active May 20, 2024 17:33
Show Gist options
  • Save timflutre/43daacf2c8868f609489 to your computer and use it in GitHub Desktop.
Save timflutre/43daacf2c8868f609489 to your computer and use it in GitHub Desktop.
compare lme4 and rrBLUP to fit an "animal model" (LMM) in R
## Model: a specific kind of linear mixed model known as "animal model" by geneticists
## y = mu 1_N + X b + Z u + e = W a + Z u + e
## y is N x 1; X is N x P; Z is N x Q; W is N x (P+1)
## u ~ Norm_Q(0, sigma_u^2 A); e ~ Norm_N(0, sigma^2 I_N)
## Goal of this document: estimate the variance components sigma_u^2 and sigma^2
## 1) simulate some data
## 2) fit the model above using the package rrBLUP (v4.3, on CRAN)
## 3) fit the model above using lme4 (v1.7, on CRAN)
## Author: Timothée Flutre (INRA)
## ----pkg-----------------------------------------------------------------
seed <- 1859; set.seed(seed)
suppressPackageStartupMessages(library(MASS))
suppressPackageStartupMessages(library(Matrix))
suppressPackageStartupMessages(library(rrBLUP))
suppressPackageStartupMessages(library(lme4))
## ----simul_animals-------------------------------------------------------
N <- 1000 # chosen so that one has enough power
animal.ids <- sprintf(fmt=paste0("ind%0", floor(log10(N))+1, "i"), 1:N)
head(animal.ids)
## ----simul_global_mean---------------------------------------------------
mu <- 4 # arbitrary
## ----simul_other_fixed_effects-------------------------------------------
P <- 1
X <- matrix(data=rnorm(n=N), nrow=N, ncol=P)
b <- matrix(data=c(2), nrow=P, ncol=1) # arbitrary
## ----simul_all_fixed_effects---------------------------------------------
W <- cbind(rep(1, N), X)
a <- matrix(c(mu, b))
head(W)
head(a)
## ----simul_A-------------------------------------------------------------
Q <- N
nb.factors <- 5
nb.snps <- 1000
snp.ids <- sprintf(fmt=paste0("snp%0", floor(log10(nb.snps))+1, "i"),
1:nb.snps)
F <- matrix(data=rnorm(n=Q*nb.factors), nrow=Q, ncol=nb.factors)
M <- mvrnorm(n=nb.snps, mu=rep(1, Q), Sigma=F %*% t(F) + diag(Q))
M[M < 0.2] <- 0
M[M >= 0.2 & M <= 1.8] <- 1
M[M > 1.8] <- 2
A <- (1/nb.snps) * t(M) %*% M
print(A[1:5,1:5])
kappa(A)
image(A[,nrow(A):1], axes=FALSE)
## ----simul_random_effects------------------------------------------------
Z <- diag(Q)
sigmau2 <- 5 # arbitrary
G <- as.matrix(nearPD(sigmau2 * A)$mat)
u <- matrix(mvrnorm(n=1, mu=rep(0, Q), Sigma=G))
summary(u)
## ----simul_errors--------------------------------------------------------
lambda <- 3 # interesting to change it and see how inference goes
sigma2 <- sigmau2 / lambda
R <- sigma2 * diag(N)
e <- matrix(mvrnorm(n=1, mu=rep(0, N), Sigma=R))
## ----h2------------------------------------------------------------------
(h2 <- sigmau2 / (sigmau2 + sigma2))
## ----simul_phenotypes----------------------------------------------------
y <- W %*% a + Z %*% u + e
## ----simul_reformat------------------------------------------------------
dat <- data.frame(fix=W[,2],
animal=factor(animal.ids),
response=y[,1])
str(dat)
summary(dat)
## ----fit_rrBLUP----------------------------------------------------------
system.time(fit.rrBLUP <- mixed.solve(y=y,
Z=Z,
K=A,
X=W,
method="REML",
SE=TRUE,
return.Hinv=TRUE))
## time: user=3.209 sys=0.004 elapsed=3.214
fit.rrBLUP$Vu # 5.008
sigmau2 # 5
fit.rrBLUP$Ve # 1.665
sigma2 # 1.667
## very good!
## ----fit_lme4------------------------------------------------------------
formula <- response ~ 1 + fix + (1|animal)
parsedFormula <- lFormula(formula=formula, data=dat,
control=lmerControl(check.nobs.vs.nlev="ignore",
check.nobs.vs.nRE="ignore"))
relmat <- list(animal=A)
relfac <- relmat
flist <- parsedFormula$reTrms[["flist"]] # list of grouping factors
Ztlist <- parsedFormula$reTrms[["Ztlist"]] # list of transpose of the sparse model matrices
stopifnot(all(names(relmat) %in% names(flist)))
asgn <- attr(flist, "assign")
for(i in seq_along(relmat)) {
tn <- which(match(names(relmat)[i], names(flist)) == asgn)
if(length(tn) > 1)
stop("a relationship matrix must be associated",
" with only one random effects term", call.=FALSE)
relmat[[i]] <- Matrix(relmat[[i]], sparse=TRUE)
relfac[[i]] <- chol(relmat[[i]])
Ztlist[[i]] <- relfac[[i]] %*% Ztlist[[i]]
}
parsedFormula$reTrms[["Ztlist"]] <- Ztlist
parsedFormula$reTrms[["Zt"]] <- do.call(rBind, Ztlist)
devianceFunction <- do.call(mkLmerDevfun, parsedFormula)
optimizerOutput <- optimizeLmer(devianceFunction)
fit.lme4 <- mkMerMod(rho=environment(devianceFunction),
opt=optimizerOutput,
reTrms=parsedFormula$reTrms,
fr=parsedFormula$fr)
summary(fit.lme4)
vc <- as.data.frame(VarCorr(fit.lme4))
(sigma2.hat <- vc[vc$grp == "Residual", "vcov"]) # 1.666
sigma2 # 1.667
(sigmau2.hat <- vc[vc$grp == "animal", "vcov"]) # 5.008
sigmau2 # 5
## very good, same as rrBLUP
@wviechtb
Copy link

This is great! Thanks for posting this. One small comment: Both estimates for the 'animal' variance component are actually 4.807, so you may want to update those comments.

And I was intrigued by the problem, so I tried out fitting the same model with metafor -- yes, this isn't a meta-analysis, but the mechanics underneath the rma.mv() function allow you to trick the function into fitting the same model. Here is the code:

library(metafor)
dat$residual <- dat$animal
rownames(A) <- colnames(A) <- dat$animal
res <- rma.mv(response ~ fix, V=0, random = list(~ 1 | animal, ~ 1 | residual), R = list(animal = A), Rscale="none", data=dat, verbose=TRUE, control=list(sigma2.init=c(1,4)))
res

For the most part, this syntax should make some sense if you are familiar with lme(), the only peculiar things being V=0 (in meta-analyses, we typically have a known var-cov matrix for the observed outcomes, but not here, so we just set that part to 0) and the R and Rscale parts. Here we define that the matrix A is the known relationship matrix for the animal random effects and Rscale="none" is used so the A matrix does not get converted into a correlation matrix (which would be the default behavior of the function).

Note also that I am cheating a bit by setting the starting values for the two variance components closer to their final values, but the default starting values are off quite a bit and model fitting then takes even longer (improving the way the starting values are chosen is on my to-do list). Still, it's rather slow. However, the results are:

Multivariate Meta-Analysis Model (k = 1000; method: REML)

Variance Components: 

            estim    sqrt  nlvls  fixed    factor    R
sigma^2.1  4.8071  2.1925   1000     no    animal  yes
sigma^2.2  1.7090  1.3073   1000     no  residual   no

Test of Moderators (coefficient(s) 2): 
QM(df = 1) = 1614.6799, p-val < .0001

Model Results:

         estimate      se     zval    pval    ci.lb   ci.ub     
intrcpt    2.9182  1.8096   1.6127  0.1068  -0.6285  6.4650     
fix        1.9513  0.0486  40.1831  <.0001   1.8561  2.0464  ***


---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So things match up nicely as well here. This was more of an exercise for me (I wanted to check that rma.mv() works as expected), but I figured I'll post this here in case you find this useful.

@timflutre
Copy link
Author

You're welcome. However, I don't understand why you say "the 'animal' variance component is actually 4.807". This is not the case for me on my machine with R 3.2.1. Did you set the seed upfront as indicated in the script?

@wviechtb
Copy link

Yes, I did. And I just re-ran this (also on R 3.2.1) and again get:

> fit.rrBLUP$Vu            # 5.008
[1] 4.807161

and

> (sigmau2.hat <- vc[vc$grp == "animal", "vcov"])   # 5.008
[1] 4.807133

and again the same results with rma.mv().

@timflutre
Copy link
Author

Ok, thanks for checking, but I don't have any explanation for that. For a given run on a given machine, as long as all the tools agree, that's the most important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment