@mingjiphd
Last active January 27, 2026 18:46
Causal Analysis using R Instrument Variables
This R script is a step-by-step tutorial on instrumental variable (IV) analysis in R. It gives a brief overview of instrumental variables: what IVs are, the two critical criteria for a valid IV, two-stage least squares (2SLS), implementation in R using the ivreg function in the AER package, and how to interpret the ivreg output.
For a step by step video demonstration, please visit: https://youtu.be/nVDgLE2ILn4
### Causal Analysis using R Instrumental Variables
## Instrumental variables (IVs) are special variables used in statistics and econometrics
## to estimate causal effects when ordinary regression methods would be biased due to
## endogeneity, meaning the explanatory variable is correlated with the error term (for example, through unobserved factors).
## Two Key Criteria for a Valid IV
## Relevance: the instrument is correlated with the problematic (endogenous) explanatory variable.
## Exogeneity: the instrument is not correlated with the error term and affects the outcome only
## through the explanatory variable (the exclusion restriction).
## Instrumental variables (IV) address bias when explanatory variables correlate with error terms,
## causing endogeneity.
## A valid instrument is strongly correlated with the endogenous variable (relevance) but uncorrelated
## with the error term and affects the outcome only through the endogenous variable
## (exclusion restriction).
## IV estimation typically uses two-stage least squares: first predicting the endogenous
## variable with the instrument, then estimating the effect on the outcome.
## IVs can correct bias from omitted variables, measurement error, and simultaneity (reverse causality).
## Strong instruments improve the precision of the estimate; weak instruments can give biased estimates
## with large standard errors.
## A valid IV mimics a randomized experiment in observational data, which allows causal effects to be inferred.
## Common instruments come from natural or policy variations unrelated directly to the outcome
## except through the treatment.
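## A minimal sketch of the two stages described in the overview, using base R only.
## The data are a hypothetical simulation; every name ending in _demo is illustrative
## and is not part of the original tutorial.
set.seed(1)
n_demo <- 1000
z_demo <- rbinom(n_demo, 1, 0.5)                       # instrument
u_demo <- rnorm(n_demo)                                # unobserved confounder
x_demo <- 0.8 * z_demo + 0.5 * u_demo + rnorm(n_demo)  # endogenous regressor
y_demo <- 2 * x_demo + u_demo + rnorm(n_demo)          # outcome, true effect = 2
# Stage 1: regress the endogenous variable on the instrument
stage1_demo <- lm(x_demo ~ z_demo)
x_hat_demo <- fitted(stage1_demo)
# Stage 2: regress the outcome on the stage-1 fitted values
stage2_demo <- lm(y_demo ~ x_hat_demo)
beta_2sls_demo <- unname(coef(stage2_demo)["x_hat_demo"])
# With a single instrument, the 2SLS slope equals cov(Z, Y) / cov(Z, X)
beta_ratio_demo <- cov(z_demo, y_demo) / cov(z_demo, x_demo)
print(c(two_stage = beta_2sls_demo, cov_ratio = beta_ratio_demo))
## Note: the standard errors reported by the stage-2 lm() fit are not valid for IV inference,
## which is one reason to use a dedicated 2SLS routine rather than two manual regressions.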
### Generate a simulated data set for IV analysis
set.seed(11152025) # For reproducibility
n <- 200 # Sample size
# Instrumental variable (Z), randomly assigned
Z <- rbinom(n, 1, 0.5)
# Unobserved confounder (U)
U <- rnorm(n)
# Endogenous explanatory variable X depends on Z and U
X <- 0.8 * Z + 0.5 * U + rnorm(n)
# Outcome variable Y depends on X and U
Y <- 2 * X + U + rnorm(n)
# Create a data frame
iv_data <- data.frame(Y = Y, X = X, Z = Z)
# View first few rows
head(iv_data)
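## A rough check of the relevance criterion: the F statistic from the first-stage regression of X on Z.
## first_stage_f() is a hypothetical helper written for this sketch; it is not part of AER or of
## the original script. A common rule of thumb treats a first-stage F below about 10 as a sign
## of a weak instrument. The exclusion restriction cannot be checked this way.
first_stage_f <- function(x, z) {
  fit <- lm(x ~ z)                           # first-stage regression
  unname(summary(fit)$fstatistic["value"])   # overall F statistic of that regression
}
# Apply the helper to the simulated data if it is available in the workspace
if (exists("iv_data")) {
  print(first_stage_f(iv_data$X, iv_data$Z))
}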
### Perform an Instrumental Variable Analysis
# Load the AER package, installing it first if it is not already installed
if (!require(AER)) {
  install.packages("AER")
  library(AER)
}
# Assuming 'iv_data' is the dataset with Y, X, and Z
# Perform IV regression with Y as outcome, X as endogenous regressor, and Z as instrument
# using two-stage least squares (2SLS) estimation
iv_model <- ivreg(Y ~ X | Z, data = iv_data)
# Display summary of IV regression results
summary(iv_model)
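## For contrast, a rough calculation of what ordinary least squares (OLS) would converge to
## under this simulation design. This sketch assumes the data-generating coefficients used
## earlier (0.8, 0.5, 2, unit-variance noise); the *_demo names are illustrative only.
var_z_demo <- 0.5 * (1 - 0.5)                      # Var(Z) for a Bernoulli(0.5) instrument
var_x_demo <- 0.8^2 * var_z_demo + 0.5^2 * 1 + 1   # Var(X) = 0.64 Var(Z) + 0.25 Var(U) + Var(noise)
cov_x_err_demo <- 0.5 * 1                          # Cov(X, error of Y): U enters X with weight 0.5
ols_bias_demo <- cov_x_err_demo / var_x_demo       # asymptotic OLS bias
ols_limit_demo <- 2 + ols_bias_demo                # value OLS converges to, versus the true effect 2
print(round(c(bias = ols_bias_demo, ols_limit = ols_limit_demo), 3))
# If the simulated data are in the workspace, the sample OLS slope can be compared directly
if (exists("iv_data")) {
  print(coef(lm(Y ~ X, data = iv_data))["X"])
}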
## The coefficient on X is the estimated effect of X on Y after accounting for endogeneity
## using Z as the instrument. This coefficient is 1.784 with a standard error of 0.226,
## highly significant (p < 0.001).
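## A quick sanity check on the reported estimate, using only the two numbers quoted above
## and a normal approximation (a sketch; the *_reported names are illustrative).
est_reported <- 1.784
se_reported <- 0.226
ci_reported <- est_reported + c(-1, 1) * qnorm(0.975) * se_reported
print(round(ci_reported, 3))
## The approximate 95% interval runs from about 1.34 to 2.23, so it covers the true effect
## of 2 that was used to simulate Y.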