Referee Reports for "Enhanced CPS: A Validated Dataset Combining Survey and Administrative Data for Policy Analysis"
This paper presents a methodology for creating an enhanced dataset that combines the Current Population Survey (CPS) with IRS administrative data. While the contribution is valuable for the microsimulation community, several aspects require clarification and improvement.
Validation Against Tax Policy Benchmarks: The paper claims the dataset is suitable for tax policy analysis but provides limited validation against known tax policy benchmarks. Please include:
- Comparison of effective tax rates by income decile against CBO or TPC estimates
- Validation of major tax expenditure totals (mortgage interest, charitable deductions)
- Analysis of how well the dataset captures tax filing behavior at different income levels
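To make the first comparison concrete, here is a minimal sketch of computing weighted effective tax rates by income decile, which could then be set against CBO or TPC figures. All inputs (incomes, the liability schedule, weights) are synthetic stand-ins, not the paper's microdata:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical microdata: gross income, a stylized liability schedule, weights.
income = rng.lognormal(10.5, 1.0, 10_000)
tax = np.clip(0.25 * income - 5_000, 0, None)
weight = rng.uniform(0.5, 2.0, 10_000)

def weighted_quantile(x, w, q):
    """Quantiles of x under sampling weights w."""
    order = np.argsort(x)
    cw = (np.cumsum(w[order]) - 0.5 * w[order]) / w.sum()
    return np.interp(q, cw, x[order])

# Weighted decile cut points, then the effective tax rate within each decile.
cuts = weighted_quantile(income, weight, np.linspace(0.1, 0.9, 9))
decile = np.digitize(income, cuts)
etr = np.array([
    (tax[decile == d] @ weight[decile == d])
    / (income[decile == d] @ weight[decile == d])
    for d in range(10)
])
print(np.round(etr, 3))  # these would be compared against the benchmark series
```

The same decile machinery would serve for the tax expenditure and filing-behavior checks.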
High-Income Representation: The treatment of high-income taxpayers needs more detail:
- How does the imputation handle the PUF's topcoding?
- What is the impact on revenue estimates for policies affecting high earners?
- Include sensitivity analysis for different imputation approaches at the top
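As one illustration of the requested sensitivity analysis, the sketch below imputes incomes above a hypothetical topcode with Pareto draws under several tail indexes and shows how the revenue from a stylized surtax moves with that choice. Every number here (topcode, tail indexes, surtax) is illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
TOPCODE = 500_000.0

# Hypothetical topcoded sample: the true tail is replaced by the topcode value.
incomes = rng.lognormal(12, 1.2, 50_000)
topcoded = np.minimum(incomes, TOPCODE)
n_top = int((topcoded == TOPCODE).sum())

def impute_pareto(alpha):
    """Replace topcoded values with Pareto draws above the topcode."""
    draws = TOPCODE * (1 + np.random.default_rng(2).pareto(alpha, n_top))
    out = topcoded.copy()
    out[out == TOPCODE] = draws
    return out

# Revenue from a hypothetical 5% surtax above $1m under different tail indexes.
revenues = {}
for alpha in (1.5, 2.0, 3.0):
    inc = impute_pareto(alpha)
    revenues[alpha] = 0.05 * np.clip(inc - 1_000_000, 0, None).sum()
    print(f"alpha={alpha}: surtax revenue = {revenues[alpha] / 1e6:.1f}m")
```

A heavier assumed tail (smaller alpha) mechanically raises revenue estimates for policies aimed at top incomes, which is why reporting this sensitivity matters.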
State Tax Modeling: The paper mentions state identifiers but doesn't validate state-level tax calculations:
- Provide validation against state revenue statistics
- Discuss limitations for state tax policy analysis
- Compare state income tax liabilities against administrative totals
- Table 1 should include more details on each model's treatment of capital gains
- The SALT calculation methodology needs more detail on the interaction with AMT
- Clarify how the dataset handles non-filers and their imputed tax liabilities
The authors present an ambitious data fusion approach combining survey and administrative sources. However, the statistical methodology raises several concerns that must be addressed before publication.
Common Support Assumption: The imputation relies on only seven variables available in both datasets. This is concerning:
- Provide diagnostics showing overlap in the covariate distributions
- Discuss potential bias from limited common support
- Include robustness checks using alternative predictor sets
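A minimal common-support diagnostic of the kind requested could report standardized mean differences and percentile-range coverage for each shared predictor. The sketch below uses synthetic stand-ins for the CPS and PUF wage distributions; the 0.25 flag threshold is a common rule of thumb, not a hard standard:

```python
import numpy as np

def standardized_difference(x_a, x_b):
    """Standardized mean difference of a covariate across two datasets;
    values above ~0.25 in absolute terms are often flagged."""
    pooled_sd = np.sqrt(0.5 * (x_a.var(ddof=1) + x_b.var(ddof=1)))
    return (x_a.mean() - x_b.mean()) / pooled_sd

rng = np.random.default_rng(0)
cps_wages = rng.lognormal(10.4, 1.0, 5_000)  # hypothetical CPS wages
puf_wages = rng.lognormal(10.6, 1.1, 5_000)  # hypothetical PUF wages

smd = standardized_difference(np.log(cps_wages + 1), np.log(puf_wages + 1))

# Share of the CPS sample falling inside the PUF's 1st-99th percentile range.
lo, hi = np.percentile(puf_wages, [1, 99])
coverage = np.mean((cps_wages >= lo) & (cps_wages <= hi))
print(f"log-wage SMD: {smd:.3f}, CPS coverage of PUF support: {coverage:.1%}")
```

A table of these two statistics for all seven common variables would go a long way toward addressing this concern.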
Quantile Regression Forest Validation: The QRF methodology needs more rigorous validation:
- Provide out-of-sample prediction accuracy metrics
- Compare QRF performance to simpler methods (hot-deck, regression)
- Show that the joint distribution is preserved, not just marginals
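Out-of-sample accuracy for quantile predictions is commonly summarized with pinball (quantile) loss and prediction-interval coverage. The sketch below computes both on synthetic held-out data, with the "predictions" standing in for whatever the QRF produces:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average quantile (pinball) loss for predictions of the tau-th quantile."""
    err = y - q_pred
    return np.mean(np.maximum(tau * err, (tau - 1) * err))

def interval_coverage(y, q_lo, q_hi):
    """Share of held-out observations falling inside the predicted interval."""
    return np.mean((y >= q_lo) & (y <= q_hi))

# Stand-in for QRF output on a held-out fold: constant 10th/90th percentile
# predictions taken from the sample itself (a real run would use model output).
rng = np.random.default_rng(0)
y = rng.normal(50_000, 20_000, 2_000)
q10 = np.full_like(y, np.quantile(y, 0.1))
q90 = np.full_like(y, np.quantile(y, 0.9))

cov = interval_coverage(y, q10, q90)  # near 0.8 for a well-calibrated 80% interval
loss10 = pinball_loss(y, q10, 0.1)
print(f"80% interval coverage: {cov:.3f}, pinball loss at tau=0.1: {loss10:,.0f}")
```

Reporting these metrics for QRF alongside hot-deck and regression baselines would directly address the comparison request; preservation of the joint distribution additionally needs multivariate checks (e.g., correlations among imputed variables).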
Weight Calibration Diagnostics: The reweighting methodology needs additional diagnostics:
- Distribution of weight adjustment factors
- Effective sample size after reweighting
- Stability of estimates across different starting weights
- Clarify why 5% dropout was chosen, and show sensitivity to this parameter
- Justify the convergence criterion (a 0.001% change), which appears arbitrary
- Discuss potential mode effects between CPS interviews and tax filing
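Two of the requested diagnostics, the adjustment-factor distribution and the effective sample size, are cheap to compute. A sketch using the Kish effective sample size on synthetic base weights and calibration factors:

```python
import numpy as np

def effective_sample_size(w):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return w.sum() ** 2 / (w ** 2).sum()

rng = np.random.default_rng(0)
base = rng.uniform(500, 3_000, 60_000)        # hypothetical CPS base weights
adjust = rng.lognormal(0.0, 0.6, 60_000)      # hypothetical calibration factors
calibrated = base * adjust

print(f"ESS before: {effective_sample_size(base):,.0f}")
print(f"ESS after:  {effective_sample_size(calibrated):,.0f}")
print("adjustment factor percentiles:",
      np.round(np.percentile(adjust, [1, 50, 99]), 2))
```

A large drop in effective sample size after calibration would signal that a few records carry much of the weight, which matters for the variance of downstream estimates.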
This paper makes an important contribution by enabling joint analysis of tax and transfer programs. However, the treatment of transfer programs requires substantial improvement.
Benefit Underreporting: The paper doesn't address known underreporting of benefits in CPS:
- How does this affect the calibration targets?
- What is the impact on poverty measurement?
- Consider incorporating administrative benefit data
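To illustrate why underreporting matters for poverty measurement, the sketch below scales synthetic reported SNAP benefits toward a hypothetical administrative total and recomputes a weighted poverty rate. The 1.4 scaling factor, the poverty line, and all microdata are illustrative, not measured values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40_000
weights = rng.uniform(500, 3_000, n)
cash_income = rng.lognormal(10.2, 0.9, n)
# Hypothetical reported SNAP: ~12% participation, $1k-$6k annual benefits.
reported_snap = np.where(rng.random(n) < 0.12,
                         rng.uniform(1_000, 6_000, n), 0.0)
POVERTY_LINE = 15_000.0

def poverty_rate(resources):
    """Weighted share of units with resources below the (stylized) line."""
    return (weights * (resources < POVERTY_LINE)).sum() / weights.sum()

scale = 1.4  # illustrative ratio of administrative to survey-reported benefits
base = poverty_rate(cash_income + reported_snap)
adj = poverty_rate(cash_income + scale * reported_snap)
print(f"poverty rate: {base:.1%} with reported SNAP, {adj:.1%} after scaling")
```

Even a simple exercise like this would show readers how sensitive the paper's poverty statistics are to the underreporting correction.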
Program Interaction Modeling: The interaction between programs needs validation:
- Validate joint participation in multiple programs
- Check benefit cliff effects are preserved
- Compare effective marginal tax rates against other estimates
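Effective marginal tax rates can be checked by finite differences on the net-income function. A sketch with a fully stylized tax-and-transfer schedule (not the paper's model) showing how phase-outs stack on top of statutory rates:

```python
import numpy as np

def net_income(gross):
    """Stylized schedule: 25% tax above $10k, plus a $5k benefit phased out
    at 30 cents per dollar of gross income (all parameters hypothetical)."""
    tax = 0.25 * np.maximum(gross - 10_000, 0)
    benefit = np.maximum(5_000 - 0.30 * gross, 0)
    return gross - tax + benefit

def emtr(gross, delta=1_000.0):
    """Effective marginal tax rate by finite difference: the share of the
    next dollar lost to taxes and benefit phase-outs combined."""
    return 1 - (net_income(gross + delta) - net_income(gross)) / delta

for g in (5_000, 12_000, 40_000):
    print(f"gross ${g:,}: EMTR = {emtr(g):.0%}")
```

In this stylized example the phase-out alone yields a 30% EMTR, the overlap region 55%, and the tax-only region 25%; verifying that the enhanced dataset reproduces such cliffs and overlaps is exactly the validation being requested.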
Geographic Variation: State variation in transfer programs isn't addressed:
- Validate state-level SNAP and Medicaid totals
- Discuss limitations for state benefit policy analysis
- Include state-specific TANF in the methodology
- The WIC modeling seems oversimplified - expand or note limitations
- Healthcare subsidies (ACA) need more detailed validation
- Child care subsidies are missing from the benefit programs listed
I attempted to reproduce the results presented in this paper using the provided code repository. While the authors have made commendable efforts toward reproducibility, several critical issues prevent full replication of the results.
Environment Setup (PARTIAL SUCCESS):
- Python environment setup succeeded with requirements.txt
- However, undocumented system dependencies caused initial failures:
  - LaTeX installation required but not mentioned
  - Specific Python 3.9-3.11 requirement discovered through trial
  - Memory requirements (32GB+) not documented
Data Generation (FAILED):
- `make data` fails due to missing credentials: `Error: POLICYENGINE_GITHUB_MICRODATA_AUTH_TOKEN not found`
- No documentation on obtaining the necessary API tokens
- Raw data files not accessible without authentication
- No sample/synthetic data provided for testing
Validation Results (NOT REPRODUCIBLE):
- Table values in paper show "[TBC]" placeholders
- Scripts exist but cannot run without generated data
- No intermediate outputs provided for verification
- Validation dashboard link returns 404
Data Access Barriers:
- Raw PUF data requires IRS approval (not mentioned in paper)
- CPS data download requires specific year/version info
- No data preservation strategy (DOI, checksums)
- API tokens needed but process undocumented
Computational Requirements:
- Full pipeline takes 6+ hours (not mentioned)
- Memory requirements exceed typical systems
- No options for subset/testing runs
- No cloud compute instructions provided
Version Control Issues:
- Multiple dependency version conflicts
- PolicyEngine package versions not pinned
- No Docker/container option
- Build tested only on macOS (Linux builds fail)
Documentation Gaps:
- No step-by-step reproduction guide
- Missing troubleshooting section
- Incomplete API documentation
- No expected output examples
- Random seeds not set in all stochastic processes
- Hardcoded paths in some scripts
- No continuous integration for reproduction
- Missing checksums for output validation
Recommendations:
- Provide synthetic/sample data for testing
- Create Docker container with full environment
- Document all credentials/access requirements
- Add smoke tests with expected outputs
- Include computation time/resource estimates
- Create detailed reproduction guide
- Deposit snapshot at Zenodo with DOI
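The checksum and smoke-test recommendations need not be elaborate: a manifest of SHA-256 hashes over the pipeline's outputs would suffice. A self-contained sketch, using a stand-in output file rather than the paper's actual artifacts:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def checksum_manifest(paths):
    """SHA-256 checksum for each output file, keyed by filename."""
    return {Path(p).name: hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths}

# Sketch: write a (stand-in) pipeline output and record its checksum.
tmp = Path(tempfile.mkdtemp())
out = tmp / "enhanced_cps_sample.csv"
out.write_text("household_id,weight\n1,1520.5\n")

manifest_path = tmp / "manifest.json"
manifest_path.write_text(json.dumps(checksum_manifest([out]), indent=2))

# A smoke test re-hashes the outputs and compares against the manifest.
recorded = json.loads(manifest_path.read_text())
assert checksum_manifest([out]) == recorded
print("checksums verified")
```

Shipping such a manifest alongside the Zenodo deposit would let readers confirm byte-for-byte reproduction without rerunning the 6+ hour pipeline.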
The paper's contribution is valuable, but it cannot be considered reproducible in its current state. Full reproduction requires addressing data access, documentation, and computational environment issues.