Skip to content

Instantly share code, notes, and snippets.

View ngreifer's full-sized avatar

Noah Greifer ngreifer

View GitHub Profile
@ngreifer
ngreifer / vcovSUEST.R
Created July 9, 2024 20:38
Computes the asymptotic HC0 covariance matrix for multiple models fit to the same data, with cross-model covariances included. These are equivalent to the covariance computed when stacking models using M-estimation.
# Computes joint HC0 covariance matrix of several models fit to the same data.
# `fits` should be a list of model fits (e.g., output of a call to lm or glm, etc.)
# To include models fit to subsets of data, fit models to whole dataset with weights
# close to 0 for units to be excluded. Relies on `sandwich` functionality. Returns
# a symmetric matrix with no dimnames. Individual model covariances are on the block
# diagonals; between-model covariances are on the off-diagonals. See
# https://github.com/kylebutts/vcovSUR for a more mature implementation. See Mize
# et al. (2019) <https://doi.org/10.1177/0081175019852763> for theory and
# application.
@ngreifer
ngreifer / constrained_sample.R
Last active May 29, 2021 23:36
Implements sampling with balance constraints using mixed integer programming
constrained_sample <- function(X, ns = .5*nrow(X), tols = .01, targets = colMeans(X), time = 2*60, solver = "glpk") {
#Arguments
#X - dataset (matrix) from which sample is to be drawn
#ns - maximum size of the resulting sample
#tols - maximum distance between resulting sample means and the targets
#targets - target means for sample means to pursue
#time - number of seconds before aborting optimizer
#solver - which solver to use; "glpk" or "gurobi" (gurobi is better)
#
#Output: a vector of indices of X to retain in the sample
@ngreifer
ngreifer / subclass_split.R
Last active July 9, 2024 20:25
Implements the subclass splitting algorithm described by Imbens & Rubin (2015, Sec 13.5)
# Implements the subclass splitting algorithm described by Imbens & Rubin (2015, Sec 13.5)
# Arguments:
# - ps: a vector of (linearized) propensity scores
# - z: a vector of treatment status (2 values, doesn't have to be 0/1)
# - tmax: the threshold of the t-statistic used to determine whether imbalance remains and
# s plit should be formed. High values make splits less likely.
# - minn: the minimum number of units of each treatment group allowed in each subclass
# - focal: the treatment group where the subclass-wise median ps is computed; leave
# NULL to use the full sample
#