Sugiyama, Suzuki & Kanamori (2012) advise using (a subset of) the numerator samples as centers in density ratio estimation. However, this advice may depend strongly on the use case: the density ratio can be estimated accurately in a region only if there are centers located in that region. In the figures below, we have two datasets sampled from two distributions with different supports. If the centers are chosen from the sample with the smallest support, the regularization seems to have a stronger effect.
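To make the point about center locations concrete, the following base-R sketch checks how strongly a Gaussian kernel responds at a few evaluation points, depending on where the centers come from. The kernel parametrization `exp(-d^2 / (2 * sigma^2))`, the helper `gauss_kernel()`, and the bandwidth `sigma = 0.05` are assumptions made for illustration; they are not taken from the `densityratio` internals.

```r
# Gaussian kernel; the exp(-d^2 / (2 * sigma^2)) parametrization is an
# assumption and may differ from the one densityratio uses internally.
gauss_kernel <- function(x, centers, sigma) {
  d2 <- outer(x, centers, function(a, b) (a - b)^2)
  exp(-d2 / (2 * sigma^2))
}

set.seed(1)
centers_narrow <- runif(1000, 0.5, 1.5)  # centers from the narrow-support sample
centers_wide   <- runif(1000, 0, 2)      # centers from the wide-support sample

x_eval <- c(0.1, 1.0, 1.9)  # outside, inside, outside the narrow support
sigma  <- 0.05              # hypothetical bandwidth, for illustration only

# Largest kernel activation over all centers at each evaluation point
max_narrow <- apply(gauss_kernel(x_eval, centers_narrow, sigma), 1, max)
max_wide   <- apply(gauss_kernel(x_eval, centers_wide, sigma), 1, max)

round(rbind(narrow = max_narrow, wide = max_wide), 3)
```

With centers from the narrow sample, every kernel is essentially zero at 0.1 and 1.9, so the kernel part of the model has nothing to work with there. With centers from the wide sample, some kernel is active at every evaluation point.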
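set.seed(1)
# x1 has the smaller support [0.5, 1.5]; x2 has the larger support [0, 2]
x1 <- runif(1000, 0.5, 1.5)
x2 <- runif(1000, 0, 2)
# Centers taken from the sample with the smaller support (x1)
fit1 <- densityratio::ulsif(x1, x2, centers = x1, nlambda = 10, parallel = TRUE, nthreads = 18)
fit2 <- densityratio::ulsif(x2, x1, centers = x1, nlambda = 10, parallel = TRUE, nthreads = 18)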
fit1
#>
#> Call:
#> densityratio::ulsif(df_numerator = x1, df_denominator = x2, nlambda = 10, centers = x1, parallel = TRUE, nthreads = 18)
#>
#> Kernel Information:
#> Kernel type: Gaussian with L2 norm distances
#> Number of kernels: 1000
#> sigma: num [1:10] 0.0252 0.078 0.1343 0.1948 0.2588 ...
#> lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#>
#> Optimal sigma: 0.02521476
#> Optimal lambda: 0.1
#> Optimal kernel weights (loocv): num [1:1001] 0.07441 0.02149 0.02503 0.03088 0.00648 ...
#>
fit2
#>
#> Call:
#> densityratio::ulsif(df_numerator = x2, df_denominator = x1, nlambda = 10, centers = x1, parallel = TRUE, nthreads = 18)
#>
#> Kernel Information:
#> Kernel type: Gaussian with L2 norm distances
#> Number of kernels: 1000
#> sigma: num [1:10] 0.0535 0.1588 0.2625 0.3659 0.4676 ...
#> lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#>
#> Optimal sigma: 0.05350963
#> Optimal lambda: 0.4641589
#> Optimal kernel weights (loocv): num [1:1001] 1.04422 -0.00134 -0.0057 -0.00423 -0.00718 ...
#>
# Centers taken from the sample with the larger support (x2)
fit3 <- densityratio::ulsif(x1, x2, centers = x2, nlambda = 10, parallel = TRUE, nthreads = 18)
fit4 <- densityratio::ulsif(x2, x1, centers = x2, nlambda = 10, parallel = TRUE, nthreads = 18)
plot(x2, predict(fit1, x2) |> log(), ylim = c(-8, 8))
points(x2, predict(fit2, x2) |> log(), col = "blue")
Here, the density ratio is shrunken towards
fit3
#>
#> Call:
#> densityratio::ulsif(df_numerator = x1, df_denominator = x2, nlambda = 10, centers = x2, parallel = TRUE, nthreads = 18)
#>
#> Kernel Information:
#> Kernel type: Gaussian with L2 norm distances
#> Number of kernels: 1000
#> sigma: num [1:10] 0.0535 0.1588 0.2625 0.3659 0.4676 ...
#> lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#>
#> Optimal sigma: 0.05350963
#> Optimal lambda: 0.4641589
#> Optimal kernel weights (loocv): num [1:1001] 0.19758 0.01927 0.01444 0.01008 -0.00281 ...
#>
fit4
#>
#> Call:
#> densityratio::ulsif(df_numerator = x2, df_denominator = x1, nlambda = 10, centers = x2, parallel = TRUE, nthreads = 18)
#>
#> Kernel Information:
#> Kernel type: Gaussian with L2 norm distances
#> Number of kernels: 1000
#> sigma: num [1:10] 0.0507 0.1569 0.2705 0.3948 0.5306 ...
#> lambda: num [1:10] 1000 215.44 46.42 10 2.15 ...
#>
#> Optimal sigma: 0.05066591
#> Optimal lambda: 0.1
#> Optimal kernel weights (loocv): num [1:1001] 4.1144 -0.0628 -0.0392 -0.0361 0.6342 ...
#>
plot(x2, predict(fit3, x2) |> log(), ylim = c(-8, 8))
#> Warning in log(predict(fit3, x2)): NaNs produced
points(x2, predict(fit4, x2) |> log(), col = "blue")
#> Warning in log(predict(fit4, x2)): NaNs produced
Created on 2024-03-07 with reprex v2.0.2
At the same time, obtaining the centers from the data with the largest support yields
a better estimate of the density ratio in the regions where the data with the smallest
support has no observations, that is, in the regions below 0.5 and above 1.5.
N.B. Creating the same figures with the predictions evaluated at the numerator samples would yield the same figures, but cut off at 0.5 and 1.5 on the x-axis.
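As a reference for reading the figures, the true log density ratio is available in closed form in this example, because both samples are drawn from uniform distributions. The sketch below evaluates it on a small grid; the helper names `p1`, `p2`, `log_r12` and `log_r21` are introduced here for illustration only.

```r
# True densities of the two samples used in the example
p1 <- function(x) dunif(x, 0.5, 1.5)  # narrow support (x1)
p2 <- function(x) dunif(x, 0, 2)      # wide support (x2)

x_grid <- seq(0, 2, by = 0.25)

# log(p1 / p2): log(2) inside [0.5, 1.5], -Inf outside
log_r12 <- log(p1(x_grid) / p2(x_grid))
# log(p2 / p1): log(0.5) inside [0.5, 1.5], Inf outside (division by zero)
log_r21 <- log(p2(x_grid) / p1(x_grid))

data.frame(x = x_grid, log_r12 = log_r12, log_r21 = log_r21)
```

A finite kernel expansion cannot reproduce the infinite values outside [0.5, 1.5], so any estimate in those regions reflects the regularization and the choice of centers rather than the true ratio.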