Where should the Gaussian centers come from in density ratio estimation?

Sugiyama, Suzuki & Kanamori (2012) advise using (a subset of) the numerator samples as centers in density ratio estimation. However, this advice may depend heavily on the use case: the density ratio can be estimated accurately in a region only if there are centers located in that region. In the example below, we have two datasets sampled from two distributions with different supports, $x1 \sim U(0.5, 1.5)$ and $x2 \sim U(0, 2)$. If the centers are chosen from the sample with the smallest support, the regularization seems to have a stronger effect.
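The dependence on center locations can be illustrated with a minimal base-R sketch. It assumes a Gaussian kernel of the form $\exp(-d^2 / (2\sigma^2))$ and a bandwidth of similar magnitude to the ones selected in the output further down; the helper `gaussian_kernel()` is hypothetical and not part of the `densityratio` package.

```r
# Hypothetical helper: Gaussian kernel between evaluation points and centers.
# Assumption: the kernel has the form exp(-d^2 / (2 * sigma^2)).
gaussian_kernel <- function(x, centers, sigma) {
  d2 <- outer(x, centers, function(a, b) (a - b)^2)
  exp(-d2 / (2 * sigma^2))
}

set.seed(1)
centers <- runif(1000, 0.5, 1.5)  # centers confined to [0.5, 1.5]
sigma   <- 0.025                  # assumed bandwidth, for illustration only

K_inside  <- gaussian_kernel(1.0, centers, sigma)  # point surrounded by centers
K_outside <- gaussian_kernel(0.1, centers, sigma)  # point far from every center

max(K_inside)   # at least one kernel is strongly active
max(K_outside)  # every kernel is numerically zero
```

At a point far from all centers every kernel evaluates to essentially zero, so a weighted sum of kernels cannot adapt to the data there, whatever the weights are.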

set.seed(1)

# x1 has the narrow support [0.5, 1.5]; x2 has the wide support [0, 2]
x1 <- runif(1000, 0.5, 1.5)
x2 <- runif(1000, 0, 2)

# Centers from x1 (narrow support): fit1 has x1 as numerator, fit2 has x2 as numerator
fit1 <- densityratio::ulsif(x1, x2, centers = x1, nlambda = 10, parallel = TRUE, nthreads = 18)
fit2 <- densityratio::ulsif(x2, x1, centers = x1, nlambda = 10, parallel = TRUE, nthreads = 18)

fit1
#> 
#> Call:
#> densityratio::ulsif(df_numerator = x1, df_denominator = x2, nlambda = 10,     centers = x1, parallel = TRUE, nthreads = 18)
#> 
#> Kernel Information:
#>   Kernel type: Gaussian with L2 norm distances
#>   Number of kernels: 1000
#>   sigma: num [1:10] 0.0252 0.078 0.1343 0.1948 0.2588 ...
#>   lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#> 
#> Optimal sigma: 0.02521476
#> Optimal lambda: 0.1
#> Optimal kernel weights (loocv): num [1:1001] 0.07441 0.02149 0.02503 0.03088 0.00648 ...
#> 
fit2
#> 
#> Call:
#> densityratio::ulsif(df_numerator = x2, df_denominator = x1, nlambda = 10,     centers = x1, parallel = TRUE, nthreads = 18)
#> 
#> Kernel Information:
#>   Kernel type: Gaussian with L2 norm distances
#>   Number of kernels: 1000
#>   sigma: num [1:10] 0.0535 0.1588 0.2625 0.3659 0.4676 ...
#>   lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#> 
#> Optimal sigma: 0.05350963
#> Optimal lambda: 0.4641589
#> Optimal kernel weights (loocv): num [1:1001] 1.04422 -0.00134 -0.0057 -0.00423 -0.00718 ...
#> 

# Centers from x2 (wide support): fit3 has x1 as numerator, fit4 has x2 as numerator
fit3 <- densityratio::ulsif(x1, x2, centers = x2, nlambda = 10, parallel = TRUE, nthreads = 18)
fit4 <- densityratio::ulsif(x2, x1, centers = x2, nlambda = 10, parallel = TRUE, nthreads = 18)

# Log density ratio evaluated at the x2 locations: black = fit1, blue = fit2
plot(x2, predict(fit1, x2) |> log(), ylim = c(-8, 8))
points(x2, predict(fit2, x2) |> log(), col = "blue")

Here, the density ratio is shrunken towards $1$ in the regions without centers when $x2$ is the numerator sample, and towards approximately $1/12$ ($-2.5$ on the log scale) when $x2$ is the denominator sample.
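These fallback values can be compared with the printed kernel weights. The sketch below assumes that the first of the 1001 reported weights is an intercept term (the output lists 1000 kernels but 1001 weights; the position of the intercept is an assumption, not something the output states). Far from every center the Gaussian kernels vanish, so under that assumption the prediction reduces to this single weight.

```r
# First reported weight of fit1 and fit2, copied from the printed output above.
# Assumption: this weight is the intercept of the fitted model.
w0 <- c(fit1 = 0.07441, fit2 = 1.04422)

log(w0)  # fallback values on the log scale
1 / w0   # the same values expressed as "one over" the ratio
```

On the log scale this gives about $-2.6$ for `fit1` and about $0.04$ for `fit2`, in line with the shrinkage targets described above.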

fit3
#> 
#> Call:
#> densityratio::ulsif(df_numerator = x1, df_denominator = x2, nlambda = 10,     centers = x2, parallel = TRUE, nthreads = 18)
#> 
#> Kernel Information:
#>   Kernel type: Gaussian with L2 norm distances
#>   Number of kernels: 1000
#>   sigma: num [1:10] 0.0535 0.1588 0.2625 0.3659 0.4676 ...
#>   lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#> 
#> Optimal sigma: 0.05350963
#> Optimal lambda: 0.4641589
#> Optimal kernel weights (loocv): num [1:1001] 0.19758 0.01927 0.01444 0.01008 -0.00281 ...
#> 
fit4
#> 
#> Call:
#> densityratio::ulsif(df_numerator = x2, df_denominator = x1, nlambda = 10,     centers = x2, parallel = TRUE, nthreads = 18)
#> 
#> Kernel Information:
#>   Kernel type: Gaussian with L2 norm distances
#>   Number of kernels: 1000
#>   sigma: num [1:10] 0.0507 0.1569 0.2705 0.3948 0.5306 ...
#>   lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#> 
#> Optimal sigma: 0.05066591
#> Optimal lambda: 0.1
#> Optimal kernel weights (loocv): num [1:1001] 4.1144 -0.0628 -0.0392 -0.0361 0.6342 ...
#> 

# Log density ratio evaluated at the x2 locations: black = fit3, blue = fit4
plot(x2, predict(fit3, x2) |> log(), ylim = c(-8, 8))
#> Warning in log(predict(fit3, x2)): NaNs produced
points(x2, predict(fit4, x2) |> log(), col = "blue")
#> Warning in log(predict(fit4, x2)): NaNs produced

Created on 2024-03-07 with reprex v2.0.2
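The `NaNs produced` warnings in the plotting code above mean that some predicted density ratios are negative, since `log()` returns `NaN` only for negative input. A minimal sketch of how to count such predictions and take logs without triggering the warning follows; the vector `pred` holds hypothetical values and stands in for something like `predict(fit3, x2)`.

```r
# Hypothetical predictions, standing in for e.g. predict(fit3, x2)
pred <- c(2.1, 0.4, -0.003, 0.02)

# How many predictions are negative?
n_negative <- sum(pred < 0)

# Take logs of the positive predictions only; leave the rest as NA
positive <- pred > 0
log_pred <- rep(NA_real_, length(pred))
log_pred[positive] <- log(pred[positive])

n_negative
log_pred
```

Points with `NA` are simply omitted by `plot()`, so the figure is unchanged; the count makes explicit how many predictions fall below zero.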

At the same time, obtaining the centers from the sample with the largest support yields a better estimate of the density ratio in the regions where the sample with the smallest support has no observations. That is, in the regions below $0.5$ and above $1.5$, the density ratio is estimated to be approximately $50$ when $x2$ is the numerator sample and approximately $1/50$ when it is the denominator sample. More importantly, the estimates appear to be balanced: the estimate of $r(x) = p_{x1}(x)/p_{x2}(x)$ is approximately the reciprocal of the estimate of $1/r(x) = p_{x2}(x)/p_{x1}(x)$, so the choice of which sample goes in the numerator and which in the denominator does not seem to affect the outcome much.
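What "balanced" means here can be made concrete with a small sketch. The values are illustrative only and are not output of the fits: inside $[0.5, 1.5]$ they are the true ratios of the two uniform densities ($1 / 0.5 = 2$ and its reciprocal), and outside they are the approximate estimates of $1/50$ and $50$ mentioned above.

```r
# Illustrative values only (not output of the fits above)
r12 <- c(1/50, 2, 2, 1/50)  # ratio p_x1/p_x2 at four locations
r21 <- c(50, 0.5, 0.5, 50)  # ratio p_x2/p_x1 at the same locations

# For balanced estimates the two log-ratios cancel at every location
imbalance <- log(r12) + log(r21)
max(abs(imbalance))
```

With the actual fits, the same check could be run on `predict(fit3, x2)` and `predict(fit4, x2)`, restricted to the locations where both predictions are positive.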

N.B. Creating the same figures with predictions at the numerator samples instead would yield the same curves, but cut off at $0.5$ and $1.5$ on the x-axis (for the fits with $x1$ in the numerator).

thomvolker/centers_dre.md
