@contrasting
Last active August 11, 2023 16:10
Matrix form derivation of omitted variable bias

Consider the linear regression model:

$$y = X \beta + e$$

Choosing $\beta$ to minimise the sum of squared errors (SSE) gives the least squares estimator (derivation omitted):

$$\hat{\beta} = (X'X)^{-1}X'y$$
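Here is a minimal pure-Python sketch of this estimator. It assumes `X` is a list of rows that already includes a column of ones for the intercept. Instead of inverting $X'X$ explicitly, it solves the normal equations $(X'X)\hat{\beta} = X'y$ by Gaussian elimination. This gives the same $\hat{\beta}$ and is the usual numerically safer choice. All function names are illustrative and do not come from any library.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Assumes len(a[0]) == len(b).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the columns of X are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X: Matrix, y: Vector) -> Vector:
    """beta_hat = (X'X)^{-1} X'y, computed by solving (X'X) b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data generated exactly by y = 1 + 2x (first column of X is the intercept).
X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0]]
y = [1.0 + 2.0 * row[1] for row in X]
beta_hat = ols(X, y)
print(beta_hat)  # close to [1.0, 2.0]
```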

Now suppose the true model also contains a variable $Z$ that is omitted from the regression and therefore sits in the error:

$$y = X \beta + Z \gamma + \epsilon$$

so in effect $e = Z \gamma + \epsilon$.

Regressing $y$ on $X$ alone (leaving $Z$ in the error) and substituting the true model for $y$ gives:

$$\hat{\beta} = (X'X)^{-1}X'(X \beta + Z \gamma + \epsilon)$$

$$\hat{\beta} = \beta + (X'X)^{-1}X'Z \gamma + (X'X)^{-1}X'\epsilon$$

And so, provided $E(\epsilon|X, Z) = 0$, taking expectations conditional on both $X$ and $Z$ (the bias term depends on $Z$) gives

$$E(\hat{\beta}|X, Z) = \beta + (X'X)^{-1}X'Z \gamma$$
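A Monte Carlo sketch can check this result in the simplest case: one regressor $x$ plus an intercept, and one omitted $z$. In this case the slope element of $(X'X)^{-1}X'Z$ is just the sample slope of $z$ on $x$. The sketch holds $x$ and $z$ fixed and redraws only $\epsilon$, which is what conditioning on $X$ and $Z$ means. The data-generating process and parameter values are hypothetical.

```python
import random


def slope(x, v):
    """Slope from regressing v on a constant and x: sum((x-xbar)(v-vbar)) / sum((x-xbar)^2)."""
    xbar = sum(x) / len(x)
    vbar = sum(v) / len(v)
    sxv = sum((xi - xbar) * (vi - vbar) for xi, vi in zip(x, v))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxv / sxx


rng = random.Random(0)
n, reps = 200, 1000
alpha, beta, gamma = 1.0, 2.0, 1.5  # hypothetical true parameters

# x and z are drawn once and then held fixed: we condition on X and Z.
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + rng.gauss(0, 1) for xi in x]  # z correlated with x by construction

delta_hat = slope(x, z)  # slope element of (X'X)^{-1} X'Z
predicted = beta + delta_hat * gamma  # beta + (X'X)^{-1} X'Z gamma, slope element

estimates = []
for _ in range(reps):
    eps = [rng.gauss(0, 1) for _ in range(n)]  # fresh errors with E(eps | x, z) = 0
    y = [alpha + beta * xi + gamma * zi + ei for xi, zi, ei in zip(x, z, eps)]
    estimates.append(slope(x, y))  # short regression: z left in the error

mean_estimate = sum(estimates) / reps
print(f"true beta={beta}, predicted E(beta_hat|X,Z)={predicted:.3f}, MC mean={mean_estimate:.3f}")
```

The Monte Carlo mean should sit close to the predicted value and well above the true $\beta$.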

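So the omitted variable bias (OVB) is $(X'X)^{-1}X'Z\gamma$. It vanishes when $\gamma = 0$ ($Z$ has no effect on $y$) or when $X'Z = 0$. If $X$ includes an intercept, the slope coefficients are unbiased whenever $Z$ is uncorrelated in the sample with the other columns of $X$. With a single regressor plus an intercept, the slope bias reduces to $\gamma \, Cov(X,Z) / Var(X)$ (sample moments). Since $Var(X) > 0$, its sign is the sign of $\gamma$ times the sign of $Cov(X,Z)$.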
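The zero-bias case can be checked numerically. The sketch below again uses one regressor plus an intercept, with closed-form coefficients for the two-regressor ("long") fit. In any sample, the short-regression slope equals the long-regression slope plus $\hat{\delta}\hat{\gamma}$, where $\hat{\delta}$ is the slope of $z$ on $x$; this is the sample counterpart of $(X'X)^{-1}X'Z\gamma$. Residualising $z$ on $x$ makes the two exactly uncorrelated, and the short and long slopes then coincide. The data and coefficients are made up, and the helper names are hypothetical.

```python
import random


def centered_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


def short_and_long(x, z, y):
    """Coefficient on x in y ~ 1 + x (short) and y ~ 1 + x + z (long), plus gamma_hat from the long fit."""
    sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
    sxy, szy = centered_sum(x, y), centered_sum(z, y)
    det = sxx * szz - sxz ** 2
    beta_long = (szz * sxy - sxz * szy) / det
    gamma_long = (sxx * szy - sxz * sxy) / det
    return sxy / sxx, beta_long, gamma_long


rng = random.Random(1)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
z_corr = [0.8 * xi + rng.gauss(0, 1) for xi in x]
# Residualise z_corr on x, so z_orth has exactly zero sample covariance with x.
d = centered_sum(x, z_corr) / centered_sum(x, x)
z_orth = [zi - d * xi for xi, zi in zip(x, z_corr)]

results = {}
for label, z in [("correlated", z_corr), ("orthogonal", z_orth)]:
    # Hypothetical true model: y = 1 + 2x + 1.5z + eps.
    y = [1.0 + 2.0 * xi + 1.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    b_short, b_long, g_long = short_and_long(x, z, y)
    delta_hat = centered_sum(x, z) / centered_sum(x, x)  # slope of z on x
    results[label] = (b_short, b_long, g_long, delta_hat)
    print(f"{label}: short={b_short:.3f} long={b_long:.3f} delta_hat*gamma_hat={delta_hat * g_long:.3f}")
```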

| Bias in $\hat{\beta}$ | $Cov(X,Z) > 0$ | $Cov(X,Z) < 0$ |
|---|---|---|
| $\gamma > 0$ | Overestimate | Underestimate |
| $\gamma < 0$ | Underestimate | Overestimate |
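The table reduces to the sign of $\gamma \cdot Cov(X,Z)$. The tiny sketch below encodes it; the function name `ovb_direction` is hypothetical.

```python
def ovb_direction(gamma: float, cov_xz: float) -> str:
    """Direction of bias in the short-regression slope: sign(gamma) * sign(Cov(x, z))."""
    product = gamma * cov_xz
    if product > 0:
        return "Overestimate"
    if product < 0:
        return "Underestimate"
    return "No bias"  # gamma == 0 or Cov(x, z) == 0


# Rebuild the table row by row: gamma > 0, then gamma < 0.
for gamma in (1.0, -1.0):
    print(gamma, [ovb_direction(gamma, c) for c in (1.0, -1.0)])
```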

The canonical example is $\text{wage}_i = \alpha + \beta \, \text{education}_i + \gamma \, \text{ability}_i + \epsilon_i$. Ability is unobserved and positively related to both wage ($\gamma > 0$) and education ($Cov(\text{education}, \text{ability}) > 0$). Omitting it therefore leads to an overestimate of $\beta$.

Intuitively, ability raises wages but also leads to more education. If ability is omitted, the education coefficient picks up part of the effect of ability, so it is likely to overstate the impact of education on wage.
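Here is a simulation sketch of the wage example. All parameter values are hypothetical: a true return to education of 1.0, an ability effect of 3.0, and education rising with ability. In this made-up process, $Cov(\text{education}, \text{ability}) = 2$ and $Var(\text{education}) = 2^2 + 2^2 = 8$. The short-regression slope should therefore come out near $1 + 3 \cdot 2 / 8 = 1.75$, while controlling for ability should recover roughly 1.0.

```python
import random

rng = random.Random(42)
n = 5000
beta_educ, gamma_abil = 1.0, 3.0  # hypothetical true effects

abil = [rng.gauss(0, 1) for _ in range(n)]
educ = [12 + 2 * a + rng.gauss(0, 2) for a in abil]  # Cov(educ, abil) = 2 > 0
wage = [5 + beta_educ * e + gamma_abil * a + rng.gauss(0, 5) for e, a in zip(educ, abil)]


def s(a, b):
    """Centred cross-product sum: sum((a - abar) * (b - bbar))."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


# Short regression: wage on education only (ability omitted).
beta_short = s(educ, wage) / s(educ, educ)

# Long regression: wage on education and ability (closed-form two-regressor OLS).
det = s(educ, educ) * s(abil, abil) - s(educ, abil) ** 2
beta_long = (s(abil, abil) * s(educ, wage) - s(educ, abil) * s(abil, wage)) / det

print(f"true beta={beta_educ}, short={beta_short:.2f}, long={beta_long:.2f}")
```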

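Another example: being female leads to lower wages, but also to working in lower-paying jobs. If job type is omitted, the coefficient on being female also absorbs the effect of the lower-paying jobs. It therefore overstates the part of the gender pay gap that is directly due to being female.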
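The same mechanism can be sketched with a binary regressor. All values are hypothetical: a direct female wage effect of $-2$, a job-pay effect of $4$ ($\gamma > 0$), and women's jobs paying 1 unit less on average ($Cov < 0$). Regressing wage on a 0/1 dummy alone gives the raw difference in mean wages. That raw gap should be near $-2 + 4 \cdot (-1) = -6$, while adding the job variable should bring the female coefficient back near $-2$.

```python
import random

rng = random.Random(7)
n = 5000
direct_gap, job_effect = -2.0, 4.0  # hypothetical direct effect of being female, effect of job pay level

female = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
job = [-1.0 * f + rng.gauss(0, 1) for f in female]  # lower-paying jobs for women: Cov(female, job) < 0
wage = [20 + direct_gap * f + job_effect * j + rng.gauss(0, 3) for f, j in zip(female, job)]

# Short regression on the dummy alone = raw difference in mean wages.
n_f = sum(female)
mean_f = sum(w for w, f in zip(wage, female) if f) / n_f
mean_m = sum(w for w, f in zip(wage, female) if not f) / (n - n_f)
raw_gap = mean_f - mean_m


def s(a, b):
    """Centred cross-product sum: sum((a - abar) * (b - bbar))."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


# Long regression: wage on the dummy and job pay level (closed-form two-regressor OLS).
det = s(female, female) * s(job, job) - s(female, job) ** 2
beta_long = (s(job, job) * s(female, wage) - s(female, job) * s(job, wage)) / det

print(f"raw gap={raw_gap:.2f}, coefficient with job controlled={beta_long:.2f}")
```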
