Thom Benjamin Volker
The density ratio estimation problem is to estimate the ratio of two probability density functions, $r(x) = p_{\mathrm{nu}}(x) / p_{\mathrm{de}}(x)$, from samples drawn from the two distributions. To do this, we first generate some simple data. We draw $n = 100$ observations from the numerator distribution (normal with mean 1 and standard deviation 1) and $n = 100$ observations from the denominator distribution (normal with mean 0 and standard deviation 2):
set.seed(12)
n <- 100
nu <- rnorm(n, 1, 1)
de <- rnorm(n, 0, 2)
We now have the following two densities:
curve(dnorm(x, 1, 1), -5, 5, col = "blue", lwd = 2)
curve(dnorm(x, 0, 2), -5, 5, col = "red", lwd = 2, add = TRUE)
legend("topleft", legend = c("nu", "de"), col = c("blue", "red"), lwd = 2)
We define a kernel model for the density ratio as
$$\hat{r}(x) = \sum_{j=1}^{b} \alpha_j K(x, c_j),$$
where $K(x, c_j) = \exp\!\left(-\frac{(x - c_j)^2}{2\sigma^2}\right)$ is a Gaussian kernel with bandwidth $\sigma = 1$, and the centers $c_1, \dots, c_b$ are $b = 100$ points placed evenly over the range of the observed data:
nce <- 100
ce <- seq(min(c(nu, de)), max(c(nu, de)), length.out = nce)
# Distances between the samples and the kernel centers
Dnu <- densityratio:::distance(as.matrix(nu), as.matrix(ce), F)
Dde <- densityratio:::distance(as.matrix(de), as.matrix(ce), F)
# Gaussian kernel matrices evaluated in these distances, with bandwidth 1
Knu <- densityratio:::kernel_gaussian(Dnu, 1)
Kde <- densityratio:::kernel_gaussian(Dde, 1)
Accordingly, we have the following true density ratio function:
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2)
legend("topleft", legend = "True density ratio", lwd = 2)
Note that the density ratio is not itself a density; in fact, its integral does not necessarily converge.
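To see why, consider (as a hypothetical counterexample, not the densities used above) a numerator with heavier tails than the denominator, say $p_{\mathrm{nu}} = N(0, 2^2)$ and $p_{\mathrm{de}} = N(0, 1)$. Then
$$r(x) = \frac{\frac{1}{2\sqrt{2\pi}}\exp\left(-\frac{x^2}{8}\right)}{\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)} = \frac{1}{2}\exp\left(\frac{3x^2}{8}\right) \to \infty \quad \text{as } |x| \to \infty,$$
so $\int r(x)\,\mathrm{d}x$ diverges. In our example the numerator has the lighter tails, so the integral happens to be finite, but it still need not equal one.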
We define the Bregman divergence as
$$\mathrm{BR}_f(r \,\|\, \hat r) = \int p_{\mathrm{de}}(x)\left[f(r(x)) - f(\hat r(x)) - f'(\hat r(x))\left(r(x) - \hat r(x)\right)\right] \mathrm{d}x,$$
which, using $p_{\mathrm{nu}}(x) = r(x)\,p_{\mathrm{de}}(x)$, can be rewritten as
$$\mathrm{BR}_f(r \,\|\, \hat r) = \int p_{\mathrm{de}}(x)\left[f'(\hat r(x))\,\hat r(x) - f(\hat r(x))\right] \mathrm{d}x - \int p_{\mathrm{nu}}(x)\, f'(\hat r(x))\, \mathrm{d}x + C,$$
where $C = \int p_{\mathrm{de}}(x)\, f(r(x))\, \mathrm{d}x$ is a constant independent of the density ratio model. We can estimate this quantity with the following empirical approximation:
$$\widehat{\mathrm{BR}}_f(\hat r) = \frac{1}{n_{\mathrm{de}}}\sum_{i=1}^{n_{\mathrm{de}}}\left[f'(\hat r(x^{\mathrm{de}}_i))\,\hat r(x^{\mathrm{de}}_i) - f(\hat r(x^{\mathrm{de}}_i))\right] - \frac{1}{n_{\mathrm{nu}}}\sum_{j=1}^{n_{\mathrm{nu}}} f'(\hat r(x^{\mathrm{nu}}_j)).$$
In R, this can be written as follows:
BR <- function(alpha, Knu, nnu, Kde, nde, lambda = 0, fx, gx) {
  # Density ratio model evaluated in the denominator and numerator samples
  rde <- Kde %*% alpha
  rnu <- Knu %*% alpha
  # Empirical Bregman divergence (fx = f, gx = f') plus a ridge penalty on alpha
  mean(gx(rde) * rde - fx(rde)) - mean(gx(rnu)) + lambda/2 * sum(alpha^2)
}
If we set $f(t) = \tfrac{1}{2}(t - 1)^2$ (the Pearson divergence), we obtain the objective of the ulsif() function in the densityratio package, which includes an additional regularization term $\tfrac{\lambda}{2}\sum_k \alpha_k^2$. Estimating the parameters of this model can be done using the optim() function in R. We first define $f$ and its derivative, and evaluate the objective for a random parameter vector:
# Pearson divergence: f(t) = (t - 1)^2 / 2 and its derivative f'(t) = t - 1
f_pe <- function(x) 1/2 * (x - 1)^2
g_pe <- function(x) x - 1
BR(runif(nce), Knu, n, Kde, n, fx = f_pe, gx = g_pe)
#> [1] 84.67895
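Written out, plugging the Pearson divergence into the empirical Bregman objective gives, up to an additive constant that does not depend on $\alpha$, a regularized least-squares problem: since $f'(t)\,t - f(t) = (t - 1)\,t - \tfrac{1}{2}(t - 1)^2 = \tfrac{1}{2}t^2 - \tfrac{1}{2}$ and $f'(t) = t - 1$, we have
$$\widehat{\mathrm{BR}}_{\mathrm{PE}}(\hat r) = \frac{1}{2 n_{\mathrm{de}}} \sum_{i=1}^{n_{\mathrm{de}}} \hat r(x^{\mathrm{de}}_i)^2 - \frac{1}{n_{\mathrm{nu}}} \sum_{j=1}^{n_{\mathrm{nu}}} \hat r(x^{\mathrm{nu}}_j) + \frac{\lambda}{2}\sum_{k} \alpha_k^2 + \text{constant}.$$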
We can now estimate the parameters of the model using the optim()
function. We first
estimate the parameters without regularization, and then with regularization.
out <- optim(rep(0, nce), BR, Knu = Knu, nnu = n, Kde = Kde,
nde = n, lambda = 0, fx = f_pe, gx = g_pe,
method = "BFGS")
out_reg <- optim(rep(0, nce), BR, Knu = Knu, nnu = n, Kde = Kde,
nde = n, lambda = 1, fx = f_pe, gx = g_pe,
method = "BFGS",
control = list(maxit = 10000, reltol = 1e-16))
We can now plot the estimated density ratio function and compare it to the true density ratio function.
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2, ylim = c(0, 4))
points(nu, Knu %*% out$par, col = "blue")
points(nu, Knu %*% out_reg$par, col = "red")
legend("topleft", legend = c("True density ratio", "Estimated density ratio", "Estimated density ratio (regularized)"), lwd = 2, col = c("black", "blue", "red"))
We can also compare the estimated density ratio function to the one estimated by the
ulsif()
function in the densityratio
package.
library(densityratio)
u <- ulsif(nu, de, centers = ce, sigma = 1, lambda = 1, intercept = FALSE)
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2, ylim = c(0, 4))
points(nu, Knu %*% out$par, col = "blue")
points(nu, Knu %*% out_reg$par, col = "red")
points(nu, predict(u, nu), col = "green")
legend("topleft", legend = c("True density ratio", "Estimated density ratio", "Estimated density ratio (regularized)", "Estimated density ratio (ulsif)"), lwd = 2, col = c("black", "blue", "red", "green"))
And we see that the solution is exactly the same.
all.equal(c(Knu %*% out_reg$par), c(predict(u, nu)))
#> [1] TRUE
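Because the Pearson objective above is quadratic in $\alpha$, its minimizer is also available in closed form. As a quick sketch (the internals of ulsif() may well differ, and H, h, and alpha_cf are just names introduced here for illustration), we can compute this closed-form solution directly and check that it agrees with the BFGS solution:
# Closed-form minimizer of the regularized quadratic objective
# 1/2 * t(alpha) %*% H %*% alpha - t(h) %*% alpha + lambda/2 * sum(alpha^2),
# with H = crossprod(Kde) / n, h = colMeans(Knu), and lambda = 1 as above
H <- crossprod(Kde) / n
h <- colMeans(Knu)
alpha_cf <- solve(H + diag(1, nce), h)
# Should agree with the BFGS solution up to optimization tolerance
all.equal(c(Knu %*% alpha_cf), c(Knu %*% out_reg$par), tolerance = 1e-4)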
Now, if we change the function $f(t)$ to $f(t) = t \log t - t$, we obtain the objective that underlies the kliep() function in the densityratio package. Note that we now have to use a different optimization routine, as negative density ratio values cannot be allowed (these crash the objective due to the logarithm). We therefore use the L-BFGS-B method, which allows for box constraints.
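Written out in the same way as before, the Kullback-Leibler case simplifies because $f'(t)\,t - f(t) = t\log t - (t\log t - t) = t$ and $f'(t) = \log t$, so that
$$\widehat{\mathrm{BR}}_{\mathrm{KL}}(\hat r) = \frac{1}{n_{\mathrm{de}}} \sum_{i=1}^{n_{\mathrm{de}}} \hat r(x^{\mathrm{de}}_i) - \frac{1}{n_{\mathrm{nu}}} \sum_{j=1}^{n_{\mathrm{nu}}} \log \hat r(x^{\mathrm{nu}}_j),$$
which makes explicit that the objective is only defined for positive density ratio values.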
# Kullback-Leibler divergence: f(t) = t*log(t) - t and its derivative f'(t) = log(t)
f_kl <- function(x) x * log(x) - x
g_kl <- function(x) log(x)
BR(runif(nce), Knu, n, Kde, n, fx = f_kl, gx = g_kl)
#> [1] 11.1285
out_KL <- optim(rep(1e-12, nce), BR, Knu = Knu, nnu = n, Kde = Kde,
nde = n, lambda = 0, fx = f_kl, gx = g_kl,
method = "L-BFGS-B",
control = list(maxit = 10000, reltol = 1e-16),
lower = 0)
#> Warning in optim(rep(1e-12, nce), BR, Knu = Knu, nnu = n, Kde = Kde, nde = n, :
#> method L-BFGS-B uses 'factr' (and 'pgtol') instead of 'reltol' and 'abstol'
k <- kliep(nu, de, centers = ce, sigma = 1, cv = TRUE,
maxit = 100000, epsilon = 10^{1:-8})
#> Warning in check.nfold(cv, nfold, sigma, nnu): Only a single 'sigma' value is specified, while cross-validation serves to optimize 'sigma'.
curve(dnorm(x, 1, 1)/dnorm(x, 0, 2), -5, 5, lwd = 2, ylim = c(0, 4))
points(nu, Knu %*% out_KL$par, col = "purple")
points(nu, predict(k, nu), col = "orange")
Now we see that the solutions differ, which is likely due to the fact that the kliep() function uses a somewhat different optimization routine. That is, kliep() uses a projected gradient algorithm with some additional constraints: in each iteration, negative parameters are set to zero, and the density ratio model is normalized such that
$$\frac{1}{n_{\mathrm{de}}} \sum_{i=1}^{n_{\mathrm{de}}} \hat r(x^{\mathrm{de}}_i) = 1.$$
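As an illustration, a minimal sketch of such a projected gradient update could look as follows. This is only an assumption about the general scheme, not the actual kliep() implementation; the function name kliep_sketch, the step size eps, and the iteration count are made up for this example.
# Sketch of a KLIEP-style projected gradient ascent on the mean log density
# ratio over the numerator samples, subject to alpha >= 0 and
# mean(Kde %*% alpha) == 1 (not the actual kliep() implementation)
kliep_sketch <- function(Knu, Kde, eps = 0.01, maxit = 1000) {
  alpha <- rep(1 / ncol(Knu), ncol(Knu))
  for (i in seq_len(maxit)) {
    # Gradient of mean(log(Knu %*% alpha)) with respect to alpha
    grad <- colMeans(Knu / c(Knu %*% alpha))
    alpha <- alpha + eps * grad
    # Negative parameters are set to zero ...
    alpha <- pmax(alpha, 0)
    # ... and the model is rescaled to satisfy the normalization constraint
    alpha <- alpha / mean(Kde %*% alpha)
  }
  alpha
}
alpha_pg <- kliep_sketch(Knu, Kde)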
Created on 2024-04-09 with reprex v2.0.2