Referee Reports for "Enhanced CPS: A Validated Dataset Combining Survey and Administrative Data for Policy Analysis"
This paper presents a methodology for creating an enhanced dataset that combines the Current Population Survey (CPS) with IRS administrative data. While the contribution is valuable for the microsimulation community, several aspects require clarification and improvement.
Validation Against Tax Policy Benchmarks: The paper claims the dataset is suitable for tax policy analysis but provides limited validation against known tax policy benchmarks. Please include:
- Comparison of effective tax rates by income decile against CBO or TPC estimates
- Validation of major tax expenditure totals (mortgage interest, charitable deductions)
- Analysis of how well the dataset captures tax filing behavior at different income levels
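To make the first comparison concrete, here is a minimal sketch of computing weighted effective tax rates by income decile, which could then be set against CBO or TPC figures. All inputs (incomes, the liability schedule, weights) are synthetic stand-ins, not the paper's microdata:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical microdata: gross income, a stylized liability schedule, weights.
income = rng.lognormal(10.5, 1.0, 10_000)
tax = np.clip(0.25 * income - 5_000, 0, None)
weight = rng.uniform(0.5, 2.0, 10_000)

def weighted_quantile(x, w, q):
    """Quantiles of x under sampling weights w."""
    order = np.argsort(x)
    cw = (np.cumsum(w[order]) - 0.5 * w[order]) / w.sum()
    return np.interp(q, cw, x[order])

# Weighted decile cut points, then the effective tax rate within each decile.
cuts = weighted_quantile(income, weight, np.linspace(0.1, 0.9, 9))
decile = np.digitize(income, cuts)
etr = np.array([
    (tax[decile == d] @ weight[decile == d])
    / (income[decile == d] @ weight[decile == d])
    for d in range(10)
])
print(np.round(etr, 3))  # these would be compared against the benchmark series
```

The same decile machinery would serve for the tax expenditure and filing-behavior checks.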
High-Income Representation: The treatment of high-income taxpayers needs more detail:
- How does the imputation handle the PUF's topcoding?
- What is the impact on revenue estimates for policies affecting high earners?
- Include sensitivity analysis for different imputation approaches at the top
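As one illustration of the requested sensitivity analysis, the sketch below imputes incomes above a hypothetical topcode with Pareto draws under several tail indexes and shows how the revenue from a stylized surtax moves with that choice. Every number here (topcode, tail indexes, surtax) is illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
TOPCODE = 500_000.0

# Hypothetical topcoded sample: the true tail is replaced by the topcode value.
incomes = rng.lognormal(12, 1.2, 50_000)
topcoded = np.minimum(incomes, TOPCODE)
n_top = int((topcoded == TOPCODE).sum())

def impute_pareto(alpha):
    """Replace topcoded values with Pareto draws above the topcode."""
    draws = TOPCODE * (1 + np.random.default_rng(2).pareto(alpha, n_top))
    out = topcoded.copy()
    out[out == TOPCODE] = draws
    return out

# Revenue from a hypothetical 5% surtax above $1m under different tail indexes.
revenues = {}
for alpha in (1.5, 2.0, 3.0):
    inc = impute_pareto(alpha)
    revenues[alpha] = 0.05 * np.clip(inc - 1_000_000, 0, None).sum()
    print(f"alpha={alpha}: surtax revenue = {revenues[alpha] / 1e6:.1f}m")
```

A heavier assumed tail (smaller alpha) mechanically raises revenue estimates for policies aimed at top incomes, which is why reporting this sensitivity matters.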
State Tax Modeling: The paper mentions state identifiers but doesn't validate state-level tax calculations:
- Provide validation against state revenue statistics
- Discuss limitations for state tax policy analysis
- Compare state income tax liabilities against administrative totals
- Table 1 should include more details on each model's treatment of capital gains
- The SALT calculation methodology needs more detail on the interaction with AMT
- Clarify how the dataset handles non-filers and their imputed tax liabilities
The authors present an ambitious data fusion approach combining survey and administrative sources. However, the statistical methodology raises several concerns that must be addressed before publication.
Common Support Assumption: The imputation relies on only seven variables available in both datasets. This is concerning:
- Provide diagnostics showing overlap in the covariate distributions
- Discuss potential bias from limited common support
- Include robustness checks using alternative predictor sets
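A minimal common-support diagnostic of the kind requested could report standardized mean differences and percentile-range coverage for each shared predictor. The sketch below uses synthetic stand-ins for the CPS and PUF wage distributions; the 0.25 flag threshold is a common rule of thumb, not a hard standard:

```python
import numpy as np

def standardized_difference(x_a, x_b):
    """Standardized mean difference of a covariate across two datasets;
    values above ~0.25 in absolute terms are often flagged."""
    pooled_sd = np.sqrt(0.5 * (x_a.var(ddof=1) + x_b.var(ddof=1)))
    return (x_a.mean() - x_b.mean()) / pooled_sd

rng = np.random.default_rng(0)
cps_wages = rng.lognormal(10.4, 1.0, 5_000)  # hypothetical CPS wages
puf_wages = rng.lognormal(10.6, 1.1, 5_000)  # hypothetical PUF wages

smd = standardized_difference(np.log(cps_wages + 1), np.log(puf_wages + 1))

# Share of the CPS sample falling inside the PUF's 1st-99th percentile range.
lo, hi = np.percentile(puf_wages, [1, 99])
coverage = np.mean((cps_wages >= lo) & (cps_wages <= hi))
print(f"log-wage SMD: {smd:.3f}, CPS coverage of PUF support: {coverage:.1%}")
```

A table of these two statistics for all seven common variables would go a long way toward addressing this concern.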
Quantile Regression Forest Validation: The QRF methodology needs more rigorous validation:
- Provide out-of-sample prediction accuracy metrics
- Compare QRF performance to simpler methods (hot-deck, regression)
- Show that the joint distribution is preserved, not just marginals
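Out-of-sample accuracy for quantile predictions is commonly summarized with pinball (quantile) loss and prediction-interval coverage. The sketch below computes both on synthetic held-out data, with the "predictions" standing in for whatever the QRF produces:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average quantile (pinball) loss for predictions of the tau-th quantile."""
    err = y - q_pred
    return np.mean(np.maximum(tau * err, (tau - 1) * err))

def interval_coverage(y, q_lo, q_hi):
    """Share of held-out observations falling inside the predicted interval."""
    return np.mean((y >= q_lo) & (y <= q_hi))

# Stand-in for QRF output on a held-out fold: constant 10th/90th percentile
# predictions taken from the sample itself (a real run would use model output).
rng = np.random.default_rng(0)
y = rng.normal(50_000, 20_000, 2_000)
q10 = np.full_like(y, np.quantile(y, 0.1))
q90 = np.full_like(y, np.quantile(y, 0.9))

cov = interval_coverage(y, q10, q90)  # near 0.8 for a well-calibrated 80% interval
loss10 = pinball_loss(y, q10, 0.1)
print(f"80% interval coverage: {cov:.3f}, pinball loss at tau=0.1: {loss10:,.0f}")
```

Reporting these metrics for QRF alongside hot-deck and regression baselines would directly address the comparison request; preservation of the joint distribution additionally needs multivariate checks (e.g., correlations among imputed variables).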
Weight Calibration Diagnostics: The reweighting methodology needs additional diagnostics:
- Distribution of weight adjustment factors
- Effective sample size after reweighting
- Stability of estimates across different starting weights
- Clarify why 5% dropout was chosen, and show sensitivity to this parameter
- Justify the convergence criterion (a 0.001% change), which appears arbitrary
- Discuss potential mode effects between CPS interviews and tax filing
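Two of the requested diagnostics, the adjustment-factor distribution and the effective sample size, are cheap to compute. A sketch using the Kish effective sample size on synthetic base weights and calibration factors:

```python
import numpy as np

def effective_sample_size(w):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return w.sum() ** 2 / (w ** 2).sum()

rng = np.random.default_rng(0)
base = rng.uniform(500, 3_000, 60_000)        # hypothetical CPS base weights
adjust = rng.lognormal(0.0, 0.6, 60_000)      # hypothetical calibration factors
calibrated = base * adjust

print(f"ESS before: {effective_sample_size(base):,.0f}")
print(f"ESS after:  {effective_sample_size(calibrated):,.0f}")
print("adjustment factor percentiles:",
      np.round(np.percentile(adjust, [1, 50, 99]), 2))
```

A large drop in effective sample size after calibration would signal that a few records carry much of the weight, which matters for the variance of downstream estimates.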
This paper makes an important contribution by enabling joint analysis of tax and transfer programs. However, the treatment of transfer programs requires substantial improvement.
Benefit Underreporting: The paper doesn't address known underreporting of benefits in CPS:
- How does this affect the calibration targets?
- What is the impact on poverty measurement?
- Consider incorporating administrative benefit data
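To illustrate why underreporting matters for poverty measurement, the sketch below scales synthetic reported SNAP benefits toward a hypothetical administrative total and recomputes a weighted poverty rate. The 1.4 scaling factor, the poverty line, and all microdata are illustrative, not measured values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40_000
weights = rng.uniform(500, 3_000, n)
cash_income = rng.lognormal(10.2, 0.9, n)
# Hypothetical reported SNAP: ~12% participation, $1k-$6k annual benefits.
reported_snap = np.where(rng.random(n) < 0.12,
                         rng.uniform(1_000, 6_000, n), 0.0)
POVERTY_LINE = 15_000.0

def poverty_rate(resources):
    """Weighted share of units with resources below the (stylized) line."""
    return (weights * (resources < POVERTY_LINE)).sum() / weights.sum()

scale = 1.4  # illustrative ratio of administrative to survey-reported benefits
base = poverty_rate(cash_income + reported_snap)
adj = poverty_rate(cash_income + scale * reported_snap)
print(f"poverty rate: {base:.1%} with reported SNAP, {adj:.1%} after scaling")
```

Even a simple exercise like this would show readers how sensitive the paper's poverty statistics are to the underreporting correction.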
Program Interaction Modeling: The interaction between programs needs validation:
- Validate joint participation in multiple programs
- Check benefit cliff effects are preserved
- Compare effective marginal tax rates against other estimates
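Effective marginal tax rates can be checked by finite differences on the net-income function. A sketch with a fully stylized tax-and-transfer schedule (not the paper's model) showing how phase-outs stack on top of statutory rates:

```python
import numpy as np

def net_income(gross):
    """Stylized schedule: 25% tax above $10k, plus a $5k benefit phased out
    at 30 cents per dollar of gross income (all parameters hypothetical)."""
    tax = 0.25 * np.maximum(gross - 10_000, 0)
    benefit = np.maximum(5_000 - 0.30 * gross, 0)
    return gross - tax + benefit

def emtr(gross, delta=1_000.0):
    """Effective marginal tax rate by finite difference: the share of the
    next dollar lost to taxes and benefit phase-outs combined."""
    return 1 - (net_income(gross + delta) - net_income(gross)) / delta

for g in (5_000, 12_000, 40_000):
    print(f"gross ${g:,}: EMTR = {emtr(g):.0%}")
```

In this stylized example the phase-out alone yields a 30% EMTR, the overlap region 55%, and the tax-only region 25%; verifying that the enhanced dataset reproduces such cliffs and overlaps is exactly the validation being requested.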
Geographic Variation: State variation in transfer programs isn't addressed:
- Validate state-level SNAP and Medicaid totals
- Discuss limitations for state benefit policy analysis
- Include state-specific TANF in the methodology
- The WIC modeling seems oversimplified - expand or note limitations
- Healthcare subsidies (ACA) need more detailed validation
- Child care subsidies are missing from the benefit programs listed
I attempted to reproduce the results presented in this paper using the provided code repository. While the authors have made commendable efforts toward reproducibility, several critical issues prevent full replication of the results.
Environment Setup (PARTIAL SUCCESS):
- Python environment setup succeeded with requirements.txt
- However, undocumented system dependencies caused initial failures:
  - LaTeX installation required but not mentioned
  - Specific Python 3.9-3.11 requirement discovered through trial
  - Memory requirements (32GB+) not documented
Data Generation (FAILED):
- `make data` fails due to missing credentials: `Error: POLICYENGINE_GITHUB_MICRODATA_AUTH_TOKEN not found`
- No documentation on obtaining the necessary API tokens
- Raw data files not accessible without authentication
- No sample/synthetic data provided for testing
Validation Results (NOT REPRODUCIBLE):
- Table values in paper show "[TBC]" placeholders
- Scripts exist but cannot run without generated data
- No intermediate outputs provided for verification
- Validation dashboard link returns 404
Data Access Barriers:
- Raw PUF data requires IRS approval (not mentioned in paper)
- CPS data download requires specific year/version info
- No data preservation strategy (DOI, checksums)
- API tokens needed but process undocumented
Computational Requirements:
- Full pipeline takes 6+ hours (not mentioned)
- Memory requirements exceed typical systems
- No options for subset/testing runs
- No cloud compute instructions provided
Version Control Issues:
- Multiple dependency version conflicts
- PolicyEngine package versions not pinned
- No Docker/container option
- Build tested only on macOS (Linux builds fail)
Documentation Gaps:
- No step-by-step reproduction guide
- Missing troubleshooting section
- Incomplete API documentation
- No expected output examples
- Random seeds not set in all stochastic processes
- Hardcoded paths in some scripts
- No continuous integration for reproduction
- Missing checksums for output validation
Recommendations:
- Provide synthetic/sample data for testing
- Create Docker container with full environment
- Document all credentials/access requirements
- Add smoke tests with expected outputs
- Include computation time/resource estimates
- Create detailed reproduction guide
- Deposit snapshot at Zenodo with DOI
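The checksum and smoke-test recommendations need not be elaborate: a manifest of SHA-256 hashes over the pipeline's outputs would suffice. A self-contained sketch, using a stand-in output file rather than the paper's actual artifacts:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def checksum_manifest(paths):
    """SHA-256 checksum for each output file, keyed by filename."""
    return {Path(p).name: hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths}

# Sketch: write a (stand-in) pipeline output and record its checksum.
tmp = Path(tempfile.mkdtemp())
out = tmp / "enhanced_cps_sample.csv"
out.write_text("household_id,weight\n1,1520.5\n")

manifest_path = tmp / "manifest.json"
manifest_path.write_text(json.dumps(checksum_manifest([out]), indent=2))

# A smoke test re-hashes the outputs and compares against the manifest.
recorded = json.loads(manifest_path.read_text())
assert checksum_manifest([out]) == recorded
print("checksums verified")
```

Shipping such a manifest alongside the Zenodo deposit would let readers confirm byte-for-byte reproduction without rerunning the 6+ hour pipeline.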
The paper's contribution is valuable, but it cannot be considered reproducible in its current state. Full reproduction requires addressing data access, documentation, and computational environment issues.