Thom Benjamin Volker
In this notebook, we explore properties of divergence-based two-sample
testing, using the methods employed in the densityratio package
(Volker et al., 2023). We consider three settings: two groups are drawn
from the same distribution, two groups are drawn from distributions with
different means, but with the same variance-covariance matrix, and two
groups are drawn from distributions with the same means but different
covariances. Subsequently, we compare the performance of the
divergence-based estimators and evaluate type I and type II error rates
under the different scenarios. Additionally, we compare the performance
of the divergence-based estimators with the performance of the
Hotelling’s 
In setting one, we generate two groups of
library(densityratio)
set.seed(123)
nsim <- 500
P <- 5
N <- 200
M1 <- rep(0, P)
M2 <- rep(0.1, P)
V1 <- diag(P)
V3 <- 0.3 + 0.7 * diag(P)
gen_data <- function(N, P, Mnu, Mde, Vnu, Vde) {
  munu <- matrix(rep(Mnu, each = N), N)
  mude <- matrix(rep(Mde, each = N), N)
  nu <- (rnorm(N*P) |> matrix(N)) %*% chol(Vnu) + munu
  de <- (rnorm(N*P) |> matrix(N)) %*% chol(Vde) + mude
  list(nu = nu, de = de)
}Subsequently, we write a function that tests whether the groups of
samples are drawn from the same underlying distribution using different
tests, and report the 
two_sample_test <- function(nu, de) {
  hotelling <- DescTools::HotellingsT2Test(nu, de)$p.value[1]
  kolmogorov <- fasano.franceschini.test::fasano.franceschini.test(nu, de, verbose = FALSE)$p.value
  ulsif_est    <- summary(
    ulsif(nu, de, nsigma = 10, nlambda = 5, ncenters = 100, progressbar = FALSE), 
    test = TRUE,
    progressbar = FALSE)$p_value
  kliep_est    <- summary(
    kliep(nu, de, nsigma = 10, ncenters = 100, progressbar = FALSE), 
    test = TRUE, 
    progressbar = FALSE)$p_value
  kmm_est      <- summary(
    kmm(nu, de, nsigma = 10, ncenters = 100, progressbar = FALSE), 
    test = TRUE, 
    progressbar = FALSE)$p_value
  spectral_est <- summary(
    spectral(nu, de, nsigma = 10, ncenters = 100, progressbar = FALSE), 
    test = TRUE,
    progressbar = FALSE)$p_value
  naive_est    <- summary(naive(nu, de), test = TRUE, progressbar = FALSE)$p_value
  
  c(
    hotelling = hotelling,
    kolmogorov = kolmogorov,
    ulsif = ulsif_est,
    kliep = kliep_est,
    kmm = kmm_est,
    spectral = spectral_est,
    naive = naive_est
  )
}We first generate two groups of samples from the same standard normal
distribution, to evaluate type I error rates of each method. We repeat
this procedure 500 times, and report the proportion of 
cl <- parallel::makeCluster(18)
invisible(parallel::clusterCall(cl, fun = \() {
  library(densityratio)
  pbapply::pboptions(type = "none")
}))
parallel::clusterExport(cl, c("N", "P", "M1", "V1", "gen_data", "two_sample_test"))
out1 <- pbapply::pbreplicate(
  nsim, {
    dat <- gen_data(N, P, M1, M1, V1, V1)
    two_sample_test(dat$nu, dat$de)
  },
  cl = cl
)
parallel::stopCluster(cl)
rowMeans(out1 < 0.05)         hotelling kolmogorov.p-value              ulsif              kliep 
             0.040              0.050              0.052              0.058 
               kmm           spectral              naive 
             0.056              0.052              0.042 
We can observe that, while there is some variation in the type I error
rates, all seem reasonably close to the nominal 
We then shift to the second setting, where we generate two groups of samples from normal distributions with different means.
cl <- parallel::makeCluster(18)
invisible(parallel::clusterCall(cl, fun = \() {
  library(densityratio)
  pbapply::pboptions(type = "none")
}))
parallel::clusterExport(cl, c("N", "P", "M1", "M2", "V1", "gen_data", "two_sample_test"))
out2 <- pbapply::pbreplicate(
  nsim, {
    dat <- gen_data(N, P, M1, M2, V1, V1)
    two_sample_test(dat$nu, dat$de)
  },
  cl = cl
)
parallel::stopCluster(cl)
rowMeans(out2 < 0.05)         hotelling kolmogorov.p-value              ulsif              kliep 
             0.368              0.216              0.238              0.180 
               kmm           spectral              naive 
             0.142              0.148              0.132 
Here, we notice that Hotelling’s 
cl <- parallel::makeCluster(18)
invisible(parallel::clusterCall(cl, fun = \() {
  library(densityratio)
  pbapply::pboptions(type = "none")
}))
parallel::clusterExport(cl, c("N", "P", "M1", "V1", "V3", "gen_data", "two_sample_test"))
out3 <- pbapply::pbreplicate(
  nsim, {
    dat <- gen_data(N, P, M1, M1, V1, V3)
    two_sample_test(dat$nu, dat$de)
  },
  cl = cl
)
parallel::stopCluster(cl)
rowMeans(out3 < 0.05)         hotelling kolmogorov.p-value              ulsif              kliep 
             0.070              0.750              0.550              0.950 
               kmm           spectral              naive 
             1.000              0.955              0.540 
Finally, in the regime where the two groups have different covariance
matrices, we see that the performance of Hotelling’s kliep, kmm and
spectral) really excel, and even outperform the Fasano-Franceschini
test. Hence, the divergence-based tests can be very useful in the case
where two groups are expected to differ in higher-order moments.
Volker, T. B. (2025). Divergence-based testing using density ratio estimation techniques. https://gist.github.com/thomvolker/58197e535ec458752bccbb5b611046ce