Density ratio estimation can be used to construct weights for analyses. That is, if the sample of interest does not correspond to the target population in some way, one can reweight the sample of interest so that it more closely resembles the target population. One way to do that is by estimating the density ratio of the target population to the sample of interest and using the estimated ratios as weights.
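As a minimal sketch of this idea, assuming two simulated normal samples (our own choice, not data from the original), the density ratio can be estimated by probabilistic classification. A logistic regression separates target observations from sample observations, and the predicted odds, corrected for the group sizes, give the ratio p_target(x) / p_sample(x). This is only one of several density ratio estimators and not necessarily the one used elsewhere in this document.

```r
set.seed(1)
n_sample <- 5000
n_target <- 5000
# Sample of interest (denominator) and target population (numerator)
sample_x <- rnorm(n_sample, mean = 0, sd = 1)
target_x <- rnorm(n_target, mean = 1, sd = 1)

# Label target observations 1 and sample observations 0
dat <- data.frame(
  x = c(sample_x, target_x),
  target = rep(c(0, 1), times = c(n_sample, n_target))
)
fit <- glm(target ~ x, family = binomial, data = dat)

# Density ratio p_target(x) / p_sample(x) = odds * (n_sample / n_target)
p <- predict(fit, newdata = data.frame(x = sample_x), type = "response")
w <- p / (1 - p) * (n_sample / n_target)

# The unweighted mean is close to 0; the weighted mean moves towards the target mean of 1
mean(sample_x)
weighted.mean(sample_x, w)
```

In this toy setup the logistic model is correctly specified, because the log ratio of two normal densities with equal variances is linear in x; with real data the classifier would need to be more flexible.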
In this gist, we replicate two simulations by Liu et al. (2013) and show how the R-package densityratio can be used for change-point detection in time-series data. To do so, we first load the required packages and set up the future environment for parallel processing.
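library(densityratio)
library(furrr)
#> Loading required package: future
# Use multiple background R sessions for parallel processing
plan(multisession)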
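To illustrate the underlying idea without relying on any particular package function, the sketch below uses our own hypothetical base-R helpers gauss_kernel(), ulsif_fit(), pe_divergence() and cp_score(). These are not the densityratio package's API. They implement a simplified uLSIF estimator and score each time point by the symmetrized Pearson divergence between the window before and the window after it. Liu et al. (2013) use the relative variant (RuLSIF), subsequences of the series, and cross-validated tuning parameters; here we assume plain uLSIF on single observations with a fixed bandwidth and penalty.

```r
# Gaussian kernel matrix between points x and kernel centers (univariate data)
gauss_kernel <- function(x, centers, sigma) {
  exp(-outer(x, centers, "-")^2 / (2 * sigma^2))
}

# Simplified uLSIF: returns a function evaluating r(x) = p_nu(x) / p_de(x)
ulsif_fit <- function(x_nu, x_de, centers = x_nu, sigma = 1, lambda = 0.1) {
  H <- crossprod(gauss_kernel(x_de, centers, sigma)) / length(x_de)
  h <- colMeans(gauss_kernel(x_nu, centers, sigma))
  alpha <- solve(H + lambda * diag(length(centers)), h)
  alpha <- pmax(alpha, 0)  # negative coefficients are set to zero
  function(x) as.vector(gauss_kernel(x, centers, sigma) %*% alpha)
}

# Pearson divergence estimate PE(P_nu || P_de) based on the fitted ratio
pe_divergence <- function(x_nu, x_de, ...) {
  r_hat <- ulsif_fit(x_nu, x_de, ...)
  mean(r_hat(x_nu)) - 0.5 * mean(r_hat(x_de)^2) - 0.5
}

# Change-point score at time t: symmetrized divergence between the `window`
# observations before t and the `window` observations from t onwards
cp_score <- function(x, window = 50, ...) {
  t_idx <- (window + 1):(length(x) - window + 1)
  score <- sapply(t_idx, function(t) {
    past <- x[(t - window):(t - 1)]
    future <- x[t:(t + window - 1)]
    pe_divergence(future, past, ...) + pe_divergence(past, future, ...)
  })
  data.frame(t = t_idx, score = score)
}

set.seed(123)
# Simulated series with a mean shift at t = 101
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))
res <- cp_score(x, window = 50)
res$t[which.max(res$score)]
```

The windows are scored independently, so the sapply() call could be replaced by furrr::future_map_dbl() to spread the work over the sessions set up with plan().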
Thom Benjamin Volker
The singular value decomposition of a
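# binomial differential privacy
obs <- rbinom(100, 1, 0.5)
mean(obs)
#> [1] 0.51
# Draw n samples from a Laplace(0, b) distribution with b = sensitivity / epsilon,
# using the inverse-CDF method
rlaplace <- function(n, sensitivity, epsilon) {
  u <- runif(n, -0.5, 0.5)
  b <- sensitivity / epsilon
  -b * sign(u) * log(1 - 2 * abs(u))
}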
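A usage sketch of the Laplace mechanism follows. Noise with scale sensitivity / epsilon is added to the mean of the binary observations. The value epsilon = 1 and the sensitivity of 1/n are illustrative assumptions, not settings from the original. The sensitivity follows if neighbouring data sets differ in one changed record, because the mean of n binary values then moves by at most 1/n.

```r
# Draw n samples from a Laplace(0, b) distribution with b = sensitivity / epsilon,
# using the inverse-CDF method
rlaplace <- function(n, sensitivity, epsilon) {
  u <- runif(n, -0.5, 0.5)
  b <- sensitivity / epsilon
  -b * sign(u) * log(1 - 2 * abs(u))
}

set.seed(123)
obs <- rbinom(100, 1, 0.5)
# Changing one binary record changes the mean by at most 1 / n
sensitivity <- 1 / length(obs)
private_mean <- mean(obs) + rlaplace(1, sensitivity, epsilon = 1)
private_mean

# Sanity check: Laplace(0, b) has mean 0 and variance 2 * b^2 (here b = 1)
draws <- rlaplace(1e5, sensitivity = 1, epsilon = 1)
c(mean = mean(draws), variance = var(draws))
```

Smaller values of epsilon give a larger noise scale and hence stronger privacy at the cost of a less accurate released mean.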
When performing density ratio estimation, the regularization parameter
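# Confidence interval overlap between an interval from the observed data
# (obs_l, obs_u) and one from the synthetic data (syn_l, syn_u): the average of
# the overlap length relative to each interval's width. It equals 1 for identical
# intervals and becomes negative when the intervals do not overlap.
ci_overlap <- function(obs_l, obs_u, syn_l, syn_u) {
  obs_ol <- (min(obs_u, syn_u) - max(obs_l, syn_l)) / (obs_u - obs_l)
  syn_ol <- (min(obs_u, syn_u) - max(obs_l, syn_l)) / (syn_u - syn_l)
  (obs_ol + syn_ol) / 2
}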
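A few checks of ci_overlap() on hand-picked intervals, followed by a usage sketch on 95% confidence intervals for a mean from t.test(). The second sample is merely a stand-in for synthetic data, simulated for illustration.

```r
ci_overlap <- function(obs_l, obs_u, syn_l, syn_u) {
  obs_ol <- (min(obs_u, syn_u) - max(obs_l, syn_l)) / (obs_u - obs_l)
  syn_ol <- (min(obs_u, syn_u) - max(obs_l, syn_l)) / (syn_u - syn_l)
  (obs_ol + syn_ol) / 2
}

ci_overlap(0, 2, 1, 3)  # half-overlapping intervals of equal width: 0.5
ci_overlap(0, 2, 0, 2)  # identical intervals: 1
ci_overlap(0, 1, 2, 3)  # disjoint intervals give a negative value: -1

# Overlap of 95% confidence intervals for a mean
set.seed(123)
obs_data <- rnorm(200, mean = 0)
syn_data <- rnorm(200, mean = 0.1)  # stand-in for a synthetic data set
obs_ci <- t.test(obs_data)$conf.int
syn_ci <- t.test(syn_data)$conf.int
ci_overlap(obs_ci[1], obs_ci[2], syn_ci[1], syn_ci[2])
```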
set.seed(123)
Sugiyama, Suzuki & Kanamori (2012) advise using (a subset of) the numerator samples as kernel centers in density ratio estimation. However, whether this advice is appropriate may depend strongly on the use case: the density ratio can only be estimated accurately in a region if there are centers located in that region. In the figures below, we have two datasets sampled from two distributions with different support. If the centers are chosen from the population with the smaller support, the regularization seems to have a stronger effect.
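To explore the choice of centers, the sketch below uses a simplified, hypothetical base-R uLSIF (not the densityratio package's implementation) that takes the centers as an explicit argument. It compares numerator and denominator centers under a weak and a strong penalty. The two normal distributions, the bandwidth and the penalty values are our own assumptions, not the settings behind the figures.

```r
# Gaussian kernel matrix between points x and kernel centers (univariate data)
gauss_kernel <- function(x, centers, sigma) {
  exp(-outer(x, centers, "-")^2 / (2 * sigma^2))
}

# Simplified uLSIF with the kernel centers as an explicit argument
ulsif_fit <- function(x_nu, x_de, centers, sigma = 0.5, lambda = 0.1) {
  H <- crossprod(gauss_kernel(x_de, centers, sigma)) / length(x_de)
  h <- colMeans(gauss_kernel(x_nu, centers, sigma))
  alpha <- pmax(solve(H + lambda * diag(length(centers)), h), 0)
  function(x) as.vector(gauss_kernel(x, centers, sigma) %*% alpha)
}

set.seed(123)
x_nu <- rnorm(200, mean = 0, sd = 0.5)  # numerator: smaller support
x_de <- rnorm(200, mean = 0, sd = 2)    # denominator: larger support
grid <- seq(-6, 6, by = 0.5)
true_ratio <- dnorm(grid, 0, 0.5) / dnorm(grid, 0, 2)

# Numerator versus denominator centers, each with a weak and a strong penalty
fits <- list(
  nu_weak   = ulsif_fit(x_nu, x_de, centers = x_nu, lambda = 0.01),
  nu_strong = ulsif_fit(x_nu, x_de, centers = x_nu, lambda = 1),
  de_weak   = ulsif_fit(x_nu, x_de, centers = x_de, lambda = 0.01),
  de_strong = ulsif_fit(x_nu, x_de, centers = x_de, lambda = 1)
)
est <- sapply(fits, function(r) r(grid))
round(cbind(x = grid, true = true_ratio, est), 3)
```

With numerator centers the estimate is forced towards zero wherever no centers lie nearby, regardless of the true ratio. In this example that is harmless, because the true ratio is also close to zero in the tails.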
Izbicki, Lee & Schafer (2014) show that the density ratio can be computed through an eigendecomposition of the kernel Gram matrix. However, for a data set of size
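A minimal sketch of this spectral-series idea, as we understand it, is given below. It is our own base-R code, not the authors' or the densityratio package's implementation, and the bandwidth sigma and the number of eigenfunctions n_basis are arbitrary choices. The eigenvectors of the Gram matrix of the denominator sample are extended to new points with the Nyström method. The resulting functions are orthonormal with respect to the denominator sample, so the coefficient of each basis function is simply its average over the numerator sample.

```r
spectral_ratio <- function(x_nu, x_de, sigma = 1, n_basis = 10) {
  n <- length(x_de)
  kern <- function(a, b) exp(-outer(a, b, "-")^2 / (2 * sigma^2))
  # Eigendecomposition of the n x n Gram matrix of the denominator sample
  eig <- eigen(kern(x_de, x_de), symmetric = TRUE)
  lambda <- eig$values[seq_len(n_basis)]
  V <- eig$vectors[, seq_len(n_basis), drop = FALSE]
  # Nystrom extension, scaled to be orthonormal on the denominator sample
  psi <- function(x) sqrt(n) * kern(x, x_de) %*% V %*% diag(1 / lambda, n_basis)
  # Expansion coefficients: averages of the eigenfunctions over the numerator sample
  beta <- colMeans(psi(x_nu))
  list(psi = psi, ratio = function(x) as.vector(psi(x) %*% beta))
}

set.seed(123)
x_nu <- rnorm(300, mean = 0, sd = 1)
x_de <- rnorm(300, mean = 0, sd = 2)
fit <- spectral_ratio(x_nu, x_de)
grid <- c(-3, -1.5, 0, 1.5, 3)
# Compare with the true ratio dnorm(x, 0, 1) / dnorm(x, 0, 2)
round(cbind(x = grid, estimate = fit$ratio(grid),
            true = dnorm(grid, 0, 1) / dnorm(grid, 0, 2)), 3)
```

The eigendecomposition of the full n × n Gram matrix scales cubically in n, which is what makes this approach expensive for large samples.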
library(ggplot2)
Multiple regression estimates each coefficient conditional on the other variables in the model.
One way to show this is with the following script, which estimates the coefficients iteratively.
Specifically, we regress the residuals, computed from all other variables and their previously estimated
coefficients, on the remaining variable. To do this, we initialize a response vector
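As a hypothetical sketch of this idea on simulated data (not necessarily the script referred to above), the loop below cycles over the predictors, including an intercept column. At each step it regresses the partial residuals on one predictor at a time and converges to the ordinary least-squares coefficients from lm(). The procedure is essentially backfitting, or Gauss-Seidel iteration on the normal equations.

```r
set.seed(123)
n <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)  # correlated predictors
y <- 1 + 2 * x1 - x2 + rnorm(n)

X <- cbind(intercept = 1, x1 = x1, x2 = x2)
b <- rep(0, ncol(X))  # start all coefficients at zero

for (iter in 1:500) {
  for (j in seq_len(ncol(X))) {
    # Partial residuals: remove the current fit of all other variables
    r <- y - X[, -j, drop = FALSE] %*% b[-j]
    # Regression through the origin of the partial residuals on variable j
    b[j] <- sum(X[, j] * r) / sum(X[, j]^2)
  }
}

coef_lm <- coef(lm(y ~ x1 + x2))
cbind(iterative = b, lm = coef_lm)
```

Because x1 and x2 are correlated, the first pass does not yet give the multiple-regression coefficients. Only after repeatedly conditioning on the updated estimates of the other variables do the iterations settle on the lm() solution.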
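The density ratio estimation problem is to estimate the ratio of two probability density
functions, $r(x) = p_{\text{nu}}(x) / p_{\text{de}}(x)$, with numerator density $p_{\text{nu}}$ and denominator density $p_{\text{de}}$.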
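One common estimator is unconstrained least-squares importance fitting (uLSIF; Kanamori, Hido & Sugiyama, 2009). It shows where kernel centers and the regularization parameter enter. The notation below is ours: $n_{\text{nu}}$ and $n_{\text{de}}$ samples from the numerator and denominator densities, $b$ kernel centers $c_1, \dots, c_b$, bandwidth $\sigma$, and penalty $\lambda$. The ratio $r(x) = p_{\text{nu}}(x) / p_{\text{de}}(x)$ is modelled as a linear combination of Gaussian kernels and fitted by minimizing the squared error $\tfrac12 \int (\hat r(x) - r(x))^2 p_{\text{de}}(x)\,dx$, which up to a constant has the empirical version below:

$$
\hat r(x) = \sum_{l=1}^{b} \alpha_l K(x, c_l), \qquad K(x, c) = \exp\!\left(-\frac{\lVert x - c \rVert^2}{2\sigma^2}\right),
$$

$$
\hat H_{ll'} = \frac{1}{n_{\text{de}}} \sum_{i=1}^{n_{\text{de}}} K(x_i^{\text{de}}, c_l)\, K(x_i^{\text{de}}, c_{l'}), \qquad
\hat h_l = \frac{1}{n_{\text{nu}}} \sum_{j=1}^{n_{\text{nu}}} K(x_j^{\text{nu}}, c_l),
$$

$$
\hat\alpha = \arg\min_{\alpha} \; \tfrac12\, \alpha^\top \hat H \alpha - \hat h^\top \alpha + \tfrac{\lambda}{2}\, \alpha^\top \alpha = (\hat H + \lambda I_b)^{-1} \hat h .
$$

Negative entries of $\hat\alpha$ are set to zero afterwards. Since $\hat H$ only involves denominator samples near the centers, a larger $\lambda$ shrinks $\hat\alpha$ more strongly towards $\hat h / \lambda$ when few denominator samples fall near the chosen centers.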