Thom Volker (thomvolker)


Density ratio estimation through Bregman divergence optimization

Thom Benjamin Volker


The density ratio estimation problem is to estimate the ratio of two probability density functions $p_{\text{nu}}(x)$ and $p_{\text{de}}(x)$ from samples $\{x^{\text{nu}}_i\}_{i=1}^n$ and $\{x^{\text{de}}_j\}_{j=1}^m$ drawn from $p_{\text{nu}}(x)$ and $p_{\text{de}}(x)$, respectively. Density ratio estimation is important in many machine learning applications, such as domain adaptation and covariate shift correction.
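For instance, minimizing the squared-loss member of the Bregman family yields unconstrained least-squares importance fitting (uLSIF), which admits a closed-form solution. A minimal sketch in R, assuming univariate data; the function name `ulsif_sketch`, the bandwidth `sigma`, and the penalty `lambda` are illustrative choices, not part of the gist:

```r
# Illustrative uLSIF: model r(x) = sum_l alpha_l K(x, c_l) with Gaussian
# kernels, and minimize the regularized squared loss in closed form.
ulsif_sketch <- function(x_nu, x_de, sigma = 1, lambda = 1e-3) {
  centers <- x_nu                              # Gaussian centers at numerator samples
  K <- function(x) exp(-outer(x, centers, "-")^2 / (2 * sigma^2))
  H <- crossprod(K(x_de)) / length(x_de)       # (1/m) * Phi_de' Phi_de
  h <- colMeans(K(x_nu))                       # (1/n) * colsums of Phi_nu
  alpha <- solve(H + lambda * diag(length(h)), h)
  function(x) pmax(drop(K(x) %*% alpha), 0)    # clip negative ratios at zero
}
set.seed(1)
r_hat <- ulsif_sketch(rnorm(100, mean = 1), rnorm(100, mean = 0))
r_hat(c(0, 1))    # evaluate the estimated ratio at two points
```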

An iterative simple linear regression procedure

Multiple regression estimates each coefficient conditional on the other coefficients in the model. One way to show this is with the following script, which estimates the coefficients in an iterative manner. Specifically, we repeatedly perform simple linear regression on the residuals that remain after accounting for the previously estimated effects of all other variables. To do this, we initialize a residual vector $r_\text{old}$ that equals the observed outcome variable, and we initialize a vector of regression coefficients $b^{(0)}$ to a zero vector of length $P$. Then, starting with the first variable $X_1$, we regress $r_\text{old}$ on $X_1$ to obtain the slope $\delta = \text{Cov}(X_1, r_\text{old}) / \text{Var}(X_1)$, we update the coefficient as $b_1^{(t)} = b_1^{(t-1)} + \delta$, and we update the residual as $r_\text{new} = r_\text{old} - X_1 \delta$. For the next variable, we apply the same steps, and we cycle over the variables until the coefficients no longer change.
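A minimal R sketch of this procedure on simulated data (the count of 100 sweeps is an arbitrary choice that suffices here):

```r
# Iteratively regress the current residual on one predictor at a time and
# accumulate the coefficient updates (backfitting / Gauss-Seidel).
set.seed(1)
n <- 200; P <- 3
X <- scale(matrix(rnorm(n * P), n, P))
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)

b <- rep(0, P)    # b^(0): zero vector of length P
r <- y            # r_old: initialize the residual at the outcome
for (t in 1:100) {                           # sweep over the variables
  for (p in 1:P) {
    delta <- cov(X[, p], r) / var(X[, p])    # slope of residual on X_p
    b[p] <- b[p] + delta                     # b_p^(t) = b_p^(t-1) + delta
    r <- r - X[, p] * delta                  # r_new = r_old - X_p * delta
  }
}
round(cbind(iterative = b, lm = coef(lm(y ~ X))[-1]), 4)
```

After enough sweeps, the iterative coefficients coincide with the ordinary multiple-regression estimates.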

@thomvolker
thomvolker / spectral_dre_svd.md
Created March 30, 2024 13:47
Spectral density ratio estimation with singular value decomposition

Spectral density ratio estimation

Izbicki, Lee & Schafer (2014) show that the density ratio can be computed through an eigendecomposition of the kernel Gram matrix. However, for a data set of size $n$, the kernel Gram matrix has dimensions $n \times n$, and an eigendecomposition has complexity $\mathcal{O}(n^3)$. Fortunately, it is possible to approximate the solution with a subset of the kernel Gram matrix. That is, we can perform an eigendecomposition of a subset of $k \leq n$ rows and columns of the kernel Gram matrix to approximate the original solution.
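This subsampling idea is essentially a Nyström approximation. A hedged sketch, with illustrative names and tuning values:

```r
# Nystrom-style sketch: approximate the leading eigenpairs of the n x n kernel
# Gram matrix from a random subset of k <= n points.
set.seed(1)
n <- 500; k <- 50; r <- 10
x <- rnorm(n)
gauss <- function(a, b, sigma = 1) exp(-outer(a, b, "-")^2 / (2 * sigma^2))
idx <- sample(n, k)
K_nk <- gauss(x, x[idx])               # n x k block of the Gram matrix
K_kk <- gauss(x[idx], x[idx])          # k x k block
e <- eigen(K_kk, symmetric = TRUE)     # O(k^3) instead of O(n^3)
# Nystrom extension: lift the top r eigenvectors to all n points
U_r <- K_nk %*% e$vectors[, 1:r] %*% diag(1 / e$values[1:r])
lambda_r <- (n / k) * e$values[1:r]    # rescaled eigenvalue estimates
```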

library(ggplot2)

Where should the Gaussian centers come from in density ratio estimation?

Sugiyama, Suzuki & Kanamori (2012) advise using (a subset of) the numerator samples as Gaussian centers in density ratio estimation. However, this advice may very much depend on the use case. Namely, the density ratio can be estimated accurately in some region only if there are centers located in that region. In the figures below, we have two datasets sampled from two distributions with different support. If the centers are chosen from the population with the smallest support, the regularization seems to have a stronger effect.
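A small sketch of why center placement matters: a Gaussian basis carries virtually no mass far from its centers, so the ratio model is pinned near zero there regardless of the data. All names and values below are illustrative:

```r
# The ratio model is a sum of Gaussian bumps around the centers, so it has
# essentially no flexibility in regions without centers.
set.seed(1)
centers_nu <- rnorm(100, 0, 1)   # numerator sample: narrow support
centers_de <- rnorm(100, 0, 3)   # denominator sample: wide support
basis_mass <- function(x, centers, sigma = 0.5)
  rowSums(exp(-outer(x, centers, "-")^2 / (2 * sigma^2)))
x_far <- c(-6, 6)                # region outside the numerator support
basis_mass(x_far, centers_nu)    # essentially zero: the ratio is pinned here
basis_mass(x_far, centers_de)    # nonzero: wide centers still cover this region
```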

@thomvolker
thomvolker / ci_overlap.md
Last active March 7, 2024 13:14
ci_overlap

What is the expected confidence interval overlap under a correct synthesis model?

ci_overlap <- function(obs_l, obs_u, syn_l, syn_u) {
  # width of the intersection of the two intervals (negative when disjoint)
  overlap <- min(obs_u, syn_u) - max(obs_l, syn_l)
  obs_ol <- overlap / (obs_u - obs_l)  # overlap as a fraction of the observed CI
  syn_ol <- overlap / (syn_u - syn_l)  # overlap as a fraction of the synthetic CI
  (obs_ol + syn_ol) / 2                # average the two fractions
}

set.seed(123)
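The preview is truncated here. A hypothetical simulation along these lines (not the gist's actual code, and a simplification in that it redraws the synthetic data from the known population rather than a fitted synthesis model) approximates the expected overlap for a sample mean:

```r
# Hypothetical sketch (restating ci_overlap so the snippet is self-contained):
# simulate the overlap of observed and synthetic 95% CIs for a mean when the
# synthetic data come from the same population as the observed data.
ci_overlap <- function(obs_l, obs_u, syn_l, syn_u) {
  overlap <- min(obs_u, syn_u) - max(obs_l, syn_l)
  (overlap / (obs_u - obs_l) + overlap / (syn_u - syn_l)) / 2
}
set.seed(123)
overlaps <- replicate(1000, {
  obs <- rnorm(100)
  syn <- rnorm(100)                  # "correct synthesis": same population
  ci_obs <- t.test(obs)$conf.int
  ci_syn <- t.test(syn)$conf.int
  ci_overlap(ci_obs[1], ci_obs[2], ci_syn[1], ci_syn[2])
})
mean(overlaps)   # average overlap across replications
```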
@thomvolker
thomvolker / dr-intercept.md
Last active January 11, 2024 09:51
Density ratio estimation with and without an intercept

Density ratio estimation with and without an intercept

When performing density ratio estimation, the regularization parameter $\lambda$ causes the estimated kernel weights to be shrunken towards zero. However, a ratio is least complex when it equals one, rather than zero. Hence, the regularization yields a bias that implies more mass in the denominator samples compared to the numerator samples. By adding an intercept, we can mitigate this bias, but not completely remove it. This is the case because the intercept is also regularized, and hence shrunken towards zero.
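A hedged sketch of the setup, fitting a kernel least-squares ratio model with and without an intercept column (all names and tuning values are illustrative; as described above, the intercept is penalized along with the other weights):

```r
# Illustrative kernel least-squares ratio fit; with an intercept, the model is
# r(x) = alpha_0 + sum_j alpha_j K(x, c_j).
set.seed(1)
x_nu <- rnorm(100); x_de <- rnorm(100)     # identical populations: true ratio = 1
sigma <- 1; lambda <- 0.1
K <- function(x, c) exp(-outer(x, c, "-")^2 / (2 * sigma^2))
fit_mean_ratio <- function(intercept) {
  Phi_nu <- K(x_nu, x_nu); Phi_de <- K(x_de, x_nu)
  if (intercept) { Phi_nu <- cbind(1, Phi_nu); Phi_de <- cbind(1, Phi_de) }
  H <- crossprod(Phi_de) / nrow(Phi_de)    # denominator second-moment matrix
  h <- colMeans(Phi_nu)                    # numerator mean of the basis functions
  alpha <- solve(H + lambda * diag(ncol(H)), h)
  mean(Phi_de %*% alpha)                   # average estimated ratio over x_de
}
res <- c(without = fit_mean_ratio(FALSE), with = fit_mean_ratio(TRUE))
res
```

In runs like this, the average estimated ratio with an intercept tends to sit closer to one, consistent with the mitigated but not eliminated shrinkage bias.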

@thomvolker
thomvolker / binomial_dp.md
Created December 17, 2023 11:18
Differential privacy of a proportion
# binomial differential privacy

obs <- rbinom(100, 1, 0.5)
mean(obs)
#> [1] 0.51

# Inverse-CDF sampler for Laplace(0, b) noise with scale b = sensitivity / epsilon
rlaplace <- function(n, sensitivity, epsilon) {
  u <- runif(n, -0.5, 0.5)
  b <- sensitivity / epsilon
  -b * sign(u) * log(1 - 2 * abs(u))
}
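With a Laplace sampler in place, releasing a differentially private proportion might look as follows (a sketch restating `rlaplace` so it is self-contained, and assuming the standard sensitivity of $1/n$ for the mean of $n$ binary values):

```r
# Release a differentially private proportion: the mean of n binary values has
# sensitivity 1/n, so Laplace(0, (1/n)/epsilon) noise suffices.
rlaplace <- function(n, sensitivity, epsilon) {
  u <- runif(n, -0.5, 0.5)
  b <- sensitivity / epsilon
  -b * sign(u) * log(1 - 2 * abs(u))   # inverse-CDF Laplace sample
}
set.seed(1)
obs <- rbinom(100, 1, 0.5)
epsilon <- 1
dp_mean <- mean(obs) + rlaplace(1, sensitivity = 1 / length(obs), epsilon)
dp_mean   # the observed proportion plus a small amount of privacy noise
```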
@thomvolker
thomvolker / svd.md
Last active April 19, 2024 15:36
Simple implementation of singular value decomposition

Singular value decomposition

Thom Benjamin Volker

The singular value decomposition of an $n \times p$ matrix $\boldsymbol{X}$ is a factorization of the form $$\boldsymbol{X} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\top,$$ where $\boldsymbol{U}$ is an $n \times p$ semi-orthogonal matrix containing the left singular vectors, $\boldsymbol{\Sigma}$ is a $p \times p$ diagonal matrix with non-negative real numbers on the diagonal such that $\sigma_{1,1} \geq \sigma_{2,2} \geq \dots \geq \sigma_{p,p} \geq 0$, and $\boldsymbol{V}$ is a $p \times p$ orthogonal matrix containing the right singular vectors.
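A compact R sketch of one way to compute the decomposition, via the eigendecomposition of $\boldsymbol{X}^\top\boldsymbol{X}$ (assuming full column rank; `simple_svd` is an illustrative name):

```r
# Recover the thin SVD of X from the eigendecomposition of X'X:
# X'X = V diag(d^2) V', so d = sqrt(eigenvalues) and U = X V diag(1/d).
simple_svd <- function(X) {
  e <- eigen(crossprod(X), symmetric = TRUE)
  V <- e$vectors
  d <- sqrt(pmax(e$values, 0))   # singular values, in descending order
  U <- X %*% V %*% diag(1 / d)   # left singular vectors (needs d > 0)
  list(d = d, U = U, V = V)
}
set.seed(1)
X <- matrix(rnorm(20), 5, 4)
fit <- simple_svd(X)
max(abs(X - fit$U %*% diag(fit$d) %*% t(fit$V)))   # reconstruction error
```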

@thomvolker
thomvolker / cpd_ulsif.md
Last active October 30, 2023 22:21
# Change-point detection in time-series using the `R`-package `densityratio`

Introduction

In this gist, we replicate two simulations by Liu et al. (2013), and show how the R-package densityratio can be used for change-point detection in time-series data. To do so, we first load the required packages, and set the `future` plan to enable parallel processing.

library(furrr)
#> Loading required package: future
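As a rough illustration of the sliding-window idea (this is not the `densityratio` API; it substitutes a naive KDE-based ratio for the package's estimators):

```r
# Naive sliding-window change-point scoring: compare the distribution before
# and after each candidate point with a crude KDE-based density ratio.
set.seed(1)
y <- c(rnorm(150, 0), rnorm(150, 3))   # time series with a mean shift after t = 150
w <- 50                                # window size
ts_range <- range(y)
kde_at <- function(sample, points) {   # KDE of `sample`, evaluated at `points`
  d <- density(sample, from = ts_range[1], to = ts_range[2], n = 512)
  approx(d$x, d$y, xout = points)$y
}
score <- sapply(seq(w + 1, length(y) - w + 1), function(t) {
  before <- y[(t - w):(t - 1)]
  after  <- y[t:(t + w - 1)]
  r_hat  <- kde_at(after, after) / pmax(kde_at(before, after), 1e-6)
  mean((r_hat - 1)^2)                  # Pearson-divergence-style score
})
cp <- which.max(score) + w             # estimated change point
```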
@thomvolker
thomvolker / densityratioweighting.md
Created September 18, 2023 15:36
Density ratio estimation for weighted analyses

How density ratio estimation can be used to reweight a sample

Density ratio estimation can be used to construct weights for analyses. That is, if the sample of interest does not correspond to the target population in some way, one can reweight the sample of interest such that it is closer to the target population. One way to do this is by estimating the density ratio $$r(x) = \frac{p_{\text{nu}}(x)}{p_{\text{de}}(x)}$$ between the two samples, and reweighting the analyses of the target sample $x_{\text{de}}$ with the estimated weights $\hat{r}(x)$. If the analysis of interest is least-squares regression, the solution to the inference problem is given by the optimization problem $$(\mathbf{X}^\top \mathbf{W} \mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{W} \mathbf{y},$$
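The weighted least-squares solution can be checked directly against `lm()` with its `weights` argument; the weights `w` below are a random stand-in for estimated ratios $\hat{r}(x_{\text{de}})$:

```r
# Weighted least squares: solve (X'WX) beta = X'Wy directly and via lm().
set.seed(1)
n <- 100
X <- cbind(1, rnorm(n))                # design matrix with an intercept column
y <- drop(X %*% c(2, 3)) + rnorm(n)
w <- runif(n, 0.5, 2)                  # hypothetical density-ratio weights
W <- diag(w)
beta_direct <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)
beta_lm <- coef(lm(y ~ X[, 2], weights = w))
```

Both routes return the same coefficients, since `lm()` with `weights` minimizes the same weighted sum of squares.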