Last active
January 27, 2026 18:43
Causal Analysis using R The Difference in Difference Method
| This R script provides a step-by-step walkthrough of applying the Difference-in-Differences (DiD) method for causal analysis in R. It explains the purpose, key assumptions, and interpretation of the DiD approach using the did package. | |
| A step by step video demonstration can be found at: https://youtu.be/BNBv0pE5owc |
| ### Causal Analysis using R: The Difference-in-Differences Method | |
| ##DiD estimates causal effects by comparing changes over time between treatment and control groups. | |
| ##It uses before-and-after differences in outcomes to isolate the treatment effect. | |
| ##The critical assumption is parallel trends: absent treatment, both groups would have followed | |
| ##similar outcome paths. | |
| ##Suitable for observational data where randomization is not possible. | |
| ##Controls for unobserved time-invariant differences between groups, but is sensitive to violations of the parallel trends assumption. | |
| ##Widely applied for policy evaluation and program impact analysis. | |
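In the two-group, two-period case, the comparison described in the comments above reduces to simple arithmetic. The sketch below uses made-up group means (hypothetical numbers, not taken from this script or from any study) to show how the control group's change is subtracted from the treated group's change.

```r
# Hypothetical group means (illustrative numbers only)
means <- matrix(
  c(10, 12,   # control: pre, post
    11, 17),  # treated: pre, post
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("control", "treated"), period = c("pre", "post"))
)

# Before-and-after change within each group
change_control <- means["control", "post"] - means["control", "pre"]
change_treated <- means["treated", "post"] - means["treated", "pre"]

# DiD: the treated group's change net of the control group's change
did_estimate <- change_treated - change_control
print(did_estimate)
```

Under parallel trends, the control group's change stands in for what the treated group's change would have been without treatment, so the remainder is attributed to the treatment.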
| # Install and load the did package if not already installed | |
| if (!requireNamespace("did", quietly = TRUE)) { | |
| install.packages("did") | |
| } | |
| library(did) | |
| # Example: Create a panel dataset with multiple groups and time periods | |
| set.seed(11092025) | |
| n <- 100 # individuals per group | |
| time_periods <- 4 # multiple time periods | |
| groups <- 2 # treatment and control groups | |
| id <- rep(1:(n * groups), time_periods) # unique id per individual: 1..n control, (n+1)..2n treated | |
| group <- rep(rep(0:1, each = n), time_periods) # control = 0, treated = 1 (treated starting after period 2) | |
| period <- rep(1:time_periods, each = n * groups) | |
| # Create treatment timing variable: treated group gets treatment starting period 3 | |
| G <- ifelse(group == 1, 3, 0) | |
| # Simulate outcome with group, time, and treatment effects | |
| Y <- 5 + 2*group + 1.5*period + 4*(period >= G & G > 0) + rnorm(n * groups * time_periods) | |
| data_did <- data.frame(id = id, period = period, group = group, G = G, Y = Y) | |
| head(data_did) | |
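Before estimating anything, it can help to confirm that simulated data behave as intended. The following is a minimal sketch, assuming a data-generating process like the one above but with one distinct id per individual across both groups (an assumption of this sketch). The names `data_check`, `post`, `cell_means`, and `did_manual` are illustrative. It checks the panel structure and computes a pooled 2x2 DiD by hand, which should land near the simulated effect of 4.

```r
set.seed(11092025)
n <- 100; time_periods <- 4; groups <- 2
id     <- rep(1:(n * groups), time_periods)   # one id per individual (assumption)
group  <- rep(rep(0:1, each = n), time_periods)
period <- rep(1:time_periods, each = n * groups)
G      <- ifelse(group == 1, 3, 0)
Y      <- 5 + 2*group + 1.5*period + 4*(period >= G & G > 0) +
  rnorm(n * groups * time_periods)
data_check <- data.frame(id, period, group, G, Y)

# Panel checks: no duplicated id-period pairs, each id observed in all periods,
# and treatment timing constant within id
g_per_id <- tapply(data_check$G, data_check$id, function(g) length(unique(g)))
stopifnot(
  !anyDuplicated(data_check[c("id", "period")]),
  all(table(data_check$id) == time_periods),
  all(g_per_id == 1)
)

# Manual 2x2 DiD: pool periods 1-2 as "pre" and periods 3-4 as "post"
data_check$post <- as.integer(data_check$period >= 3)
cell_means <- tapply(data_check$Y,
                     list(group = data_check$group, post = data_check$post),
                     mean)
did_manual <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])
print(cell_means)
print(did_manual)
```

The common time trend (1.5 per period) raises both groups' post-period means by the same amount, so it cancels in the double difference.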
| # Run DiD estimation using att_gt() (group-time average treatment effects) | |
| #"att" stands for "average treatment effect on the treated"; "gt" refers to group and time. | |
| #Each row of the att_gt() output is an estimate of ATT(g, t): the average effect of the | |
| #treatment at time period t for group g, where a group is defined by the period in which | |
| #its units are first treated. Rows with t earlier than g are pre-treatment estimates. | |
| att_gt_res <- att_gt( | |
| yname = "Y", | |
| tname = "period", | |
| idname = "id", | |
| gname = "G", | |
| data = data_did, | |
| control_group = "notyettreated", # uses not-yet treated as control for staggered adoption | |
| est_method = "dr", # doubly robust estimation | |
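| panel = FALSE # treat the data as repeated cross-sections; set to TRUE only if each id is a unique unit observed in every period | |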
| ) | |
| summary(att_gt_res) | |
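As a rough cross-check on the att_gt() output, the same two-group comparison can be written as a regression with a group-by-post interaction. This is a swapped-in, simpler technique (ordinary least squares via base R's lm()), not what att_gt() does internally. The data are re-simulated so the block runs on its own, and `sim`, `post`, `fit`, and `did_lm` are illustrative names.

```r
set.seed(11092025)
n <- 100; time_periods <- 4

# Balanced panel: ids 1..n are control, (n+1)..2n are treated from period 3 on
sim <- data.frame(
  id     = rep(1:(2 * n), time_periods),
  group  = rep(rep(0:1, each = n), time_periods),
  period = rep(1:time_periods, each = 2 * n)
)
sim$post <- as.integer(sim$period >= 3)
sim$Y <- 5 + 2*sim$group + 1.5*sim$period + 4*sim$group*sim$post +
  rnorm(nrow(sim))

# Period dummies absorb the common time trend; the coefficient on
# group:post is the DiD estimate of the treatment effect
fit <- lm(Y ~ group + factor(period) + group:post, data = sim)
did_lm <- unname(coef(fit)["group:post"])
print(did_lm)
```

With a single treated group and a constant effect, this coefficient should be close to the average of the post-treatment ATT(g, t) estimates. With staggered adoption and heterogeneous effects the two approaches can diverge.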
| ##For sensitivity analysis of the parallel trends assumption, see the HonestDiD package: https://cran.r-project.org/web/packages/HonestDiD/index.html | |
| # Aggregate results by type: dynamic effects over time since treatment | |
| # "dynamic" aggregation computes the average treatment effects | |
| # relative to the length of exposure to treatment (event time). | |
| agg_dynamic <- aggte(att_gt_res, type = "dynamic") | |
| summary(agg_dynamic) | |
| # Aggregate results by group | |
| # aggregates the detailed group-time average treatment effects (ATTs) | |
| # estimated by att_gt() by groups defined by their treatment timing. | |
| agg_group <- aggte(att_gt_res, type = "group") | |
| summary(agg_group) | |
| # Aggregate results by calendar time | |
| # Calendar time is real-world clock time, such as year, month, or day. | |
| # Event time is measured relative to a specific event or intervention, such as 0 periods after | |
| # the intervention, 1 period after the intervention, etc. | |
| agg_calendar <- aggte(att_gt_res, type = "calendar") | |
| summary(agg_calendar) | |
| ##The overall ATT estimates are the same across the three aggregations in this example because there is only one treated group. | |
| ##They may differ if there are heterogeneous treatment effects across groups and time, | |
| ##or different group sizes, etc. |
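The distinction between calendar time and event time matters most under staggered adoption, which the single treated group in this script does not show. The sketch below assumes two hypothetical cohorts first treated in periods 3 and 4 (not part of the script's data) and tabulates event time as the calendar period minus the first treatment period.

```r
# Hypothetical cohorts first treated in periods 3 and 4 (illustrative only)
timing <- expand.grid(period = 1:4, G = c(3, 4))
timing$event_time <- timing$period - timing$G   # negative = before treatment
timing$treated <- timing$event_time >= 0

# Rows = cohort (first treatment period), columns = calendar period,
# cells = event time for that cohort in that period
event_table <- xtabs(event_time ~ G + period, data = timing)
print(event_table)
```

In calendar period 4, the first cohort is at event time 1 while the second is at event time 0, so `type = "calendar"` and `type = "dynamic"` average over different sets of ATT(g, t) once there is more than one treated cohort.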