Entropy-Balanced IPAW: Efficient implementation of ridge-penalised entropy balancing applied to inverse probability of attrition weights (IPAW), ensuring finite, non-negative weights for longitudinal studies.
import numpy as np
import pandas as pd
import cvxpy as cp
from typing import Sequence, Union


def entropy_balance_ipaw(
    df: pd.DataFrame,
    *,
    baseline_covariates: Union[Sequence[str], None] = None,
    base_weight_col: str = "ipaw_true",
    session_col: str = "session",
    baseline_session: int = 1,
    ridge: float = 1e-3,  # L₂ penalty on imbalance
    out_col: str = "ipaw_ebal",
    solver: str = "ECOS",
) -> pd.DataFrame:
    """
    Ridge-penalised entropy balancing of existing IPAW weights.

    Guarantees finite, non-negative weights and never returns NaN.

    Objective (per session s ≠ baseline):
        minimise    Σ_i KL(w_i || w0_i) + ridge · || Zᵀ w − μ₀ Σ w0 ||²
        subject to  Σ_i w_i = Σ_i w0_i
                    w_i ≥ 1e-8 · mean(w0)   (numeric lower bound)
    """
    if baseline_covariates is None:
        baseline_covariates = ("age", "sex")

    df = df.copy()

    # Baseline (session == baseline_session) covariate means μ₀.
    mu0 = (
        df.loc[df[session_col] == baseline_session, list(baseline_covariates)]
        .mean()
        .to_numpy()
    )

    # Collect the new weights keyed by the original row index so the result
    # aligns correctly even when df does not have a default RangeIndex.
    new_w = pd.Series(np.nan, index=df.index, dtype=float)

    for s, g in df.groupby(session_col, sort=True):
        idx = g.index
        w0 = g[base_weight_col].to_numpy()

        # Keep baseline weights unchanged.
        if s == baseline_session:
            new_w.loc[idx] = w0
            continue

        Z = g[list(baseline_covariates)].to_numpy(float)
        n = len(w0)

        # Numeric lower bound prevents under-flow when w0 is very small.
        lb = 1e-8 * w0.mean()

        w = cp.Variable(n, nonneg=True)
        imbalance = Z.T @ w - mu0 * w0.sum()
        obj = cp.Minimize(
            cp.sum(cp.rel_entr(w, w0)) + ridge * cp.sum_squares(imbalance)
        )
        constraints = [cp.sum(w) == w0.sum(), w >= lb]
        prob = cp.Problem(obj, constraints)
        prob.solve(solver=solver, verbose=False)

        # ── graceful fall-backs ────────────────────────────────────────
        if prob.status not in ("optimal", "optimal_inaccurate") or w.value is None:
            new_w.loc[idx] = w0
            continue

        # Cache the ridge solution so a failed re-solve below cannot
        # overwrite it with None.
        w_ridge = w.value.copy()

        # OPTIONAL: rerun with exact equality when the ridge solution
        # already hits the balance targets up to machine precision.
        if imbalance.value is not None and np.linalg.norm(imbalance.value) < 1e-10:
            # Swap the total-mass constraint for exact covariate balance.
            constraints[0] = Z.T @ w == mu0 * w0.sum()
            prob_eq = cp.Problem(cp.Minimize(cp.sum(cp.rel_entr(w, w0))), constraints)
            prob_eq.solve(solver=solver, verbose=False)
            if (
                prob_eq.status in ("optimal", "optimal_inaccurate")
                and w.value is not None
            ):
                new_w.loc[idx] = w.value
                continue

        new_w.loc[idx] = w_ridge

    df[out_col] = new_w
    return df
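A minimal usage sketch follows; the toy data frame, values, and seed are invented for illustration and only reuse the gist's default column names (session, age, sex, ipaw_true).

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_subjects, sessions = 200, (1, 2, 3)
toy = pd.DataFrame({
    "session": np.repeat(sessions, n_subjects),
    "age": np.tile(rng.normal(45, 10, n_subjects), len(sessions)),
    "sex": np.tile(rng.integers(0, 2, n_subjects), len(sessions)),
})
# Pretend these weights came from a previously fitted attrition model.
toy["ipaw_true"] = np.where(toy["session"] == 1, 1.0, rng.uniform(0.8, 3.0, len(toy)))

balanced = entropy_balance_ipaw(
    toy,
    baseline_covariates=("age", "sex"),  # the same confounder set used for ipaw_true
    base_weight_col="ipaw_true",
    ridge=1e-3,
)
print(balanced.groupby("session")["ipaw_ebal"].describe())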
Entropy-Balanced IPAW: Causal Considerations
This function, entropy_balance_ipaw, refines an existing set of Inverse Probability of Attrition Weights (IPAW) using ridge-penalized entropy balancing. While entropy balancing can improve covariate balance and produce numerically stable weights, its impact on the causal guarantees of IPAW (specifically, blocking backdoor paths between attrition and an outcome) depends crucially on how it is applied.

How IPAW Blocks Backdoor Paths

Standard IPAW aims to create a pseudo-population where confounders are balanced across different levels of "treatment" (i.e., remaining in the study versus attriting). The original weights ($w^{(0)}$, typically base_weight_col in this function) are derived from an attrition model, usually of the form

$$w^{(0)}_i = \frac{1}{\Pr(R_i = 1 \mid L_i)}.$$

Here, $L$ represents the complete set of confounders that create backdoor paths between attrition and the outcome of interest. If $L$ is correctly and fully specified in this initial attrition model, and positivity holds, then weighting by $w^{(0)}$ ensures conditional exchangeability: the potential outcomes $Y(*)$ are independent of attrition $R$ given $L$. This blocks the backdoor paths.
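The gist takes these weights as given through base_weight_col. For concreteness, here is a minimal, hypothetical sketch of how such weights are often obtained for a single follow-up session, assuming a logistic attrition model fitted with scikit-learn; the helper name, arguments, and columns are illustrative and not part of the gist.

import pandas as pd
from sklearn.linear_model import LogisticRegression

def estimate_ipaw_one_session(baseline: pd.DataFrame,
                              retained: pd.Series,
                              confounders=("age", "sex")) -> pd.Series:
    """Hypothetical helper: w0 = 1 / P(R = 1 | L) for subjects still observed."""
    X = baseline[list(confounders)].to_numpy(float)
    r = retained.astype(int).to_numpy()            # R = 1 if the subject remained
    p = LogisticRegression(max_iter=1000).fit(X, r).predict_proba(X)[:, 1]
    w0 = pd.Series(1.0 / p, index=baseline.index, name="ipaw_true")
    return w0[retained]                            # weights only for observed rows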
Entropy Balancing and Causal Guarantees
The entropy_balance_ipaw function adjusts $w^{(0)}$ to new weights $w^{(*)}$ by minimizing the KL divergence from $w^{(0)}$ while enforcing (penalized) balance on a specified set of baseline_covariates.
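Written out, the per-session problem solved by the function is

$$
\min_{w \ge 0}\; \sum_i w_i \log\frac{w_i}{w^{(0)}_i} \;+\; \lambda \Bigl\| Z^{\top} w - \mu_0 \sum_i w^{(0)}_i \Bigr\|_2^{2}
\quad \text{s.t.} \quad \sum_i w_i = \sum_i w^{(0)}_i,
$$

where $Z$ stacks the baseline_covariates for that session, $\mu_0$ is the vector of baseline covariate means, and $\lambda$ is the ridge argument; a small lower bound $w_i \ge 10^{-8}\,\bar{w}^{(0)}$ keeps the solution numerically away from zero.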
1. When Causal Protection is Preserved ("Guaranteed Safe")
The crucial insight is that if the entropy balancing procedure constrains or models exactly the same set of confounders $L$ (or a strict superset of $L$) that were used to generate the original $w^{(0)}$ weights, the backdoor path blocking property is preserved. In practice, this means baseline_covariates should be identical to $L$: the baseline_covariates parameter in entropy_balance_ipaw must include all variables that were part of the original confounder set used to derive base_weight_col.

2. How Causal Protection Can Be Accidentally Compromised
Bias can be inadvertently re-introduced if care is not taken:

✦ If baseline_covariates is only a subset of the original confounder set $L$, balance is enforced on that subset alone; the omitted confounders can become imbalanced again, partially re-opening the backdoor paths they were meant to block.
✦ If aggressive re-weighting occurs for baseline_covariates, the weighted distribution of variables outside that set can shift substantially, even if they were well behaved under the original weights.

3. Best Practices for Causal Safety with entropy_balance_ipaw
To leverage the benefits of entropy balancing (like improved empirical balance and numerical stability) without compromising causal inference:
✦ Make sure the attrition model behind base_weight_col includes all plausible pre-attrition confounders, i.e. the full set $L$.
✦ Check the balance achieved by base_weight_col on all covariates in $L$ before re-balancing.
✦ In entropy_balance_ipaw, set baseline_covariates to be identical to the full set used to derive base_weight_col. Do not omit covariates from this set, even if they already appeared balanced by the original weights.
✦ A moderate ridge parameter helps prevent extreme weights, especially with smaller sample sizes or when perfect balance is hard to achieve. This is generally safer than forcing exact balance on baseline_covariates. A sketch of a covariate balance check follows this list.
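As a concrete version of the checks above, here is an illustrative helper (assumed names, not part of the gist) that compares weighted covariate means per session with the unweighted baseline means; calling it with weight_col="ipaw_true" audits the original weights, and with weight_col="ipaw_ebal" the balanced ones.

import numpy as np
import pandas as pd

def weighted_balance_table(df: pd.DataFrame,
                           covariates=("age", "sex"),
                           weight_col: str = "ipaw_ebal",
                           session_col: str = "session",
                           baseline_session: int = 1) -> pd.DataFrame:
    """Weighted covariate means per session next to the unweighted baseline target."""
    target = df.loc[df[session_col] == baseline_session, list(covariates)].mean()
    rows = {
        s: {c: np.average(g[c], weights=g[weight_col]) for c in covariates}
        for s, g in df.groupby(session_col)
    }
    out = pd.DataFrame(rows).T
    out.loc["baseline target"] = target
    return out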
Bottom Line for entropy_balance_ipaw
If baseline_covariates in this function is set to the complete set of confounders $L$ (the set used to derive base_weight_col), the resulting ipaw_ebal weights preserve the backdoor path blocking properties of the original IPAW. The function then serves to potentially improve model-robustness and variance control for the estimation of causal effects. If baseline_covariates represents only a subset of $L$, that protection can be compromised. This function offers a powerful way to refine weights, but its application in a causal inference context requires careful consideration of the confounder set $L$.
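One simple way to enforce this in code (a pattern sketch with assumed names; fit_attrition_model is a hypothetical placeholder for whatever produced base_weight_col) is to define the confounder set $L$ once and pass the same list to both steps:

CONFOUNDERS_L = ["age", "sex"]          # the complete pre-attrition confounder set L

# df["ipaw_true"] = fit_attrition_model(df, confounders=CONFOUNDERS_L)  # hypothetical step
df_balanced = entropy_balance_ipaw(
    df,
    baseline_covariates=CONFOUNDERS_L,  # identical to L by construction, never a subset
    base_weight_col="ipaw_true",
)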