Thom Benjamin Volker
The density ratio estimation problem is to estimate the ratio of two probability density
functions
To do this, we first generate some simple data. We draw
set.seed(12)
n <- 100
nu <- rnorm(n, 1, 1)
de <- rnorm(n, 0, 2)We now have the following two densities:
curve(dnorm(x, 1, 1), -5, 5, col = "blue", lwd = 2)
curve(dnorm(x, 0, 2), -5, 5, col = "red", lwd = 2, add = TRUE)
legend("topleft", legend = c("nu", "de"), col = c("blue", "red"), lwd = 2)We define a kernel model for the density ratio as
where
nce <- 100
ce <- seq(min(c(nu, de)), max(c(nu, de)), length.out = nce)
Dnu <- densityratio:::distance(as.matrix(nu), as.matrix(ce), F)
Dde <- densityratio:::distance(as.matrix(de), as.matrix(ce), F)
Knu <- densityratio:::kernel_gaussian(Dnu, 1)
Kde <- densityratio:::kernel_gaussian(Dde, 1)Accordingly, we have the following true density ratio function:
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2)
legend("topleft", legend = "True density ratio", lwd = 2)Note that the density ratio is not itself a density, in fact, the integral does not necessarily converge.
We define the Bregman divergence as
where
is a constant independent of the density ratio model
We can estimate this quantity with the following empirical approximation:
R, this can be
written as follows:
BR <- function(alpha, Knu, nnu, Kde, nde, lambda = 0, fx, gx) {
rde <- Kde %*% alpha
rnu <- Knu %*% alpha
mean(gx(rde) * rde - fx(rde)) - mean(gx(rnu)) + lambda/2 * sum(alpha^2)
}If we set ulsif() function in the densityratio package, which includes an additional regularization term
Estimating the parameters of this model can be done using the optim() function in R. We first define the objective function as follows,
f_pe <- function(x) 1/2 * (x - 1)^2
g_pe <- function(x) x - 1
BR(runif(nce), Knu, n, Kde, n, fx = f_pe, gx = g_pe)
#> [1] 84.67895We can now estimate the parameters of the model using the optim() function. We first
estimate the parameters without regularization, and then with regularization.
out <- optim(rep(0, nce), BR, Knu = Knu, nnu = n, Kde = Kde,
nde = n, lambda = 0, fx = f_pe, gx = g_pe,
method = "BFGS")
out_reg <- optim(rep(0, nce), BR, Knu = Knu, nnu = n, Kde = Kde,
nde = n, lambda = 1, fx = f_pe, gx = g_pe,
method = "BFGS",
control = list(maxit = 10000, reltol = 1e-16))We can now plot the estimated density ratio function and compare it to the true density ratio function.
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2, ylim = c(0, 4))
points(nu, Knu %*% out$par, col = "blue")
points(nu, Knu %*% out_reg$par, col = "red")
legend("topleft", legend = c("True density ratio", "Estimated density ratio", "Estimated density ratio (regularized)"), lwd = 2, col = c("black", "blue", "red"))We can also compare the estimated density ratio function to the one estimated by the
ulsif() function in the densityratio package.
library(densityratio)
u <- ulsif(nu, de, centers = ce, sigma = 1, lambda = 1, intercept = FALSE)
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2, ylim = c(0, 4))
points(nu, Knu %*% out$par, col = "blue")
points(nu, Knu %*% out_reg$par, col = "red")
points(nu, predict(u, nu), col = "green")
legend("topleft", legend = c("True density ratio", "Estimated density ratio", "Estimated density ratio (regularized)", "Estimated density ratio (ulsif)"), lwd = 2, col = c("black", "blue", "red", "green"))And we see that the solution is exactly the same.
all.equal(c(Knu %*% out_reg$par), c(predict(u, nu)))
#> [1] TRUENow, if we change the function kliep() function in the densityratio-package. Note that we now
have to use a different optimization routine, as negative density ratio values cannot be allowed
(this crashes the function due to the L-BFGS-B method, which allows
for box constraints.
f_kl <- function(x) x * log(x) - x
g_kl <- function(x) log(x)
BR(runif(nce), Knu, n, Kde, n, fx = f_kl, gx = g_kl)
#> [1] 11.1285out_KL <- optim(rep(1e-12, nce), BR, Knu = Knu, nnu = n, Kde = Kde,
nde = n, lambda = 0, fx = f_kl, gx = g_kl,
method = "L-BFGS-B",
control = list(maxit = 10000, reltol = 1e-16),
lower = 0)
#> Warning in optim(rep(1e-12, nce), BR, Knu = Knu, nnu = n, Kde = Kde, nde = n, :
#> method L-BFGS-B uses 'factr' (and 'pgtol') instead of 'reltol' and 'abstol'
k <- kliep(nu, de, centers = ce, sigma = 1, cv = TRUE,
maxit = 100000, epsilon = 10^{1:-8})
#> Warning in check.nfold(cv, nfold, sigma, nnu): Only a single 'sigma' value is specified, while cross-validation serves to optimize 'sigma'.curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2, ylim = c(0, 4))
points(nu, Knu %*% out_KL$par, col = "purple")
points(nu, predict(k, nu), col = "orange")Now we see that the solutions differ, which is likely due to the fact that the kliep()
function uses a somewhat different optimization routine. That is, kliep() uses a
coordinate descent algorithm with some additional constraints. Particularly, kliep()
uses a gradient descent algorithm in which negative parameters are set to zero in each
iteration, and the density ratio model is normalized such that
Created on 2024-04-09 with reprex v2.0.2





A nice aspect is that we can do exactly the same thing with optimized libraries, like
torch. For an example, see below:Created on 2025-10-12 with reprex v2.1.1