Skip to content

Instantly share code, notes, and snippets.

@thomvolker
Last active May 3, 2024 12:22
Show Gist options
  • Save thomvolker/8cc5527dc419b7a556c649ffe7297d59 to your computer and use it in GitHub Desktop.
Save thomvolker/8cc5527dc419b7a556c649ffe7297d59 to your computer and use it in GitHub Desktop.
Finding ellipsoidal probabilities under the multivariate normal model

Finding ellipsoidal probabilities under the multivariate normal model

Thom Benjamin Volker

Introduction

Many statistical models impose multivariate normality, for example, for example on the regression parameters in a linear regression model. A question that may arise is how to find the probability that a random vector falls into some convex region $\mathcal{A}$ under this model. That is, we seek to find $$P(\mathbf{x}\in \mathcal{A}) = \int_{\mathcal{A}} f(\mathbf{x}) d\mathbf{x},$$ where $f(\mathbf{x})$ is the density of the multivariate normal distribution. That is, $$f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^p |\mathbf{\Sigma}|}}\exp \Bigg\{-\frac{1}{2} (\mathbf{x}- \mathbf{\mu})^\top \mathbf{\Sigma}^{-1} (\mathbf{x}- \mathbf{\mu}) \Bigg\},$$ where $p$ denotes the number of dimensions, $\mathbf{\mu}$ is the mean vector, and $\mathbf{\Sigma}$ is the covariance matrix, and $|\cdot|$ denotes the determinant.

Solving $P(x \in \mathcal{A})$ when $\mathcal{A}$ is a hyperrectangle

If the region $\mathcal{A}$ is rectangular, the problem is easily solved using of the shelf tools, such as the pmvnorm function in the mvtnorm package in R. In such circumstances, we seek to find the solution of the integral $$P(\mathbf{x}\in \mathcal{A}) = \frac{1}{\sqrt{(2\pi)^p |\mathbf{\Sigma}|}} \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_p}^{b_p} \exp \Bigg\{ -\frac{1}{2} (\mathbf{x}- \mathbf{\mu})^\top \mathbf{\Sigma}^{-1} (\mathbf{x}- \mathbf{\mu}) \Bigg\} d\mathbf{x},$$ where $a_i$ and $b_i$ are the lower and upper bounds of the $i$th dimension of the region $\mathcal{A}$. Genz (1992) shows that this integral can be transformed into the iterated integral $$P(\mathbf{x}\in \mathcal{A}) = (e_1 - d_1) \int_0^1 (e_2 - d_2) \dots \int_0^1 (e_m - d_m) \int_0^1 d \mathbf{w},$$ with $$d_i = \mathbf{\Phi}((a_i - \sum^{i-1}_j c_{i,j} \mathbf{\Phi}^{-1}(d_j + w_j(e_j-d_j)))/c_{i,i})$$ and $$e_i = \mathbf{\Phi}((b_i - \sum^{i-1}_j c_{i,j} \mathbf{\Phi}^{-1}(d_j + w_j(e_j-d_j)))/c_{i,i}).$$ That is, we transform the problem into a series of univariate integrals that can be easily solved using standard univariate integration tools as, for example, pnorm() in R.

The following code implements the approach in R. Note, however, that the current implementations use some additional tricks to increase the accuracy of the program (or reduce the computation time), by changing the order of the variables.

set.seed(123)

# Parameters
P <- 5
rho <- 0.5
S <- rho + (1-rho) * diag(P)
mu <- 1:P

# Integration limits
a <- sample(-(1:P)) - mu
b <- 3 * sample(1:P) - mu

# Test results
mvtnorm::pmvnorm(lower = a, upper = b, mean = 0, sigma = S)
[1] 0.8409671
attr(,"error")
[1] 1.364544e-05
attr(,"msg")
[1] "Normal Completion"
# Convergence limits
maxerr <- 1e-4
Nmax <- 10000

# Cholesky decomposition
C <- t(chol(S))

# Initialize variables
intsum <- 0
varsum <- 0
N      <- 1
d1 <- pnorm(a[1]/C[1,1])
e1 <- pnorm(b[1]/C[1,1])
f1 <- e1 - d1

# Loop until convergence
conv <- FALSE
out <- numeric(Nmax)
while(!conv) {
  
  w <- runif(P-1)
  di <- d1
  ei <- e1
  fi <- ei - di
  for (i in 2:P) {
    if (i == 2) yi <- 0
    yi[i-1] <- qnorm(di + w[i-1] * (ei - di))
    di <- pnorm((a[i] - sum(C[i, 1:(i-1)]*yi))/C[i,i])
    ei <- pnorm((b[i] - sum(C[i, 1:(i-1)]*yi))/C[i,i])
    fi <- (ei - di) * fi
  }
  intsum <- intsum + fi
  varsum <- varsum + fi^2
  N <- N+1
  err <- 1 * ((varsum/N - (intsum/N)^2)/(N))^.5
  out[N-1] <- intsum/N
  if (N == Nmax | err < maxerr) conv <- TRUE
}
N
[1] 10000
err
[1] 0.001287477
intsum/N
[1] 0.8424054
plot(out[out != 0], type = "l", lwd = 2, col = "blue")
abline(h = mvtnorm::pmvnorm(lower = a, upper = b, mean = 0, sigma = S), lty=2)

However, if the integration is not over a hyperrectangle, the current approach cannot be used, and we need to resolve to other techniques.

Solving $P(x \in \mathcal{A})$ when $\mathcal{A}$ is a hyperellipse

TO DO

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment