Last active
July 9, 2024 20:25
-
-
Save ngreifer/4dcee32b9ff0800723ca9e556d6c5cbb to your computer and use it in GitHub Desktop.
Implements the subclass splitting algorithm described by Imbens & Rubin (2015, Sec 13.5)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Implements the subclass splitting algorithm described by Imbens & Rubin (2015, Sec 13.5) | |
# Arguments: | |
# - ps: a vector of (linearized) propensity scores | |
# - z: a vector of treatment status (2 values, doesn't have to be 0/1) | |
# - tmax: the threshold of the t-statistic used to determine whether imbalance remains and | |
# s plit should be formed. High values make splits less likely. | |
# - minn: the minimum number of units of each treatment group allowed in each subclass | |
# - focal: the treatment group where the subclass-wise median ps is computed; leave | |
# NULL to use the full sample | |
# | |
# Output: a factor variable containing subclass membership | |
subclass_split <- function(ps, z, tmax = 1, minn = 3, focal = NULL) { | |
k <- 1 | |
s <- rep(k, length(z)) | |
zz <- z[1] | |
improved <- TRUE | |
while (improved) { | |
improved <- FALSE | |
for (i in 1:k) { | |
split <- TRUE | |
m <- if (is.null(focal)) median(ps[s==i]) else median(ps[s==i & z == focal]) | |
#Check subclass size | |
if (!all(c(sum(s == i & z == zz & ps < m), | |
sum(s == i & z == zz & ps >= m), | |
sum(s == i & z != zz & ps < m), | |
sum(s == i & z != zz & ps >= m)) >= minn)) | |
split <- FALSE | |
#Check independence | |
if (split) { | |
l0 = mean(ps[z == zz & s == i]) | |
l1 = mean(ps[z != zz & s == i]) | |
v = (1/(sum(s == i) - 2))*(sum((ps[z == zz & s == i] - l0)^2) + sum((ps[z!= zz & s == i] - l1)^2)) | |
t = abs(l1 - l0)/sqrt(v*(1/sum(z == zz & s == i) + 1/sum(z != zz & s == i))) | |
if (t <= tmax) split <- FALSE | |
} | |
if (split) { | |
s[s==i & ps >= m] <- k | |
k <- k + 1 | |
improved <- TRUE | |
} | |
} | |
} | |
s <- factor(s) | |
levels(s) <- 1:(k - 1) | |
return(s) | |
} |
Good find, thank you.
One more: Line 54 should say levels(s) <- 1:(k-1)
I actually use this from time to time. Works really nicely; perhaps worth a short paper evaluating the performance.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks! I believe Line 33 has a typo, change to: sum(s == i & z != zz & ps >= m)) >= minn))