Referee Reports for "Enhanced CPS: A Validated Dataset Combining Survey and Administrative Data for Policy Analysis"

Referee 1: Tax Policy Expert

Summary

This paper presents a methodology for creating an enhanced dataset that combines the Current Population Survey (CPS) with IRS administrative data. While the contribution is valuable for the microsimulation community, several aspects require clarification and improvement.

Major Comments

  1. Validation Against Tax Policy Benchmarks: The paper claims the dataset is suitable for tax policy analysis but provides limited validation against known tax policy benchmarks. Please include:

    • Comparison of effective tax rates by income decile against CBO or TPC estimates (see the sketch after this list)
    • Validation of major tax expenditure totals (mortgage interest, charitable deductions)
    • Analysis of how well the dataset captures tax filing behavior at different income levels
  2. High-Income Representation: The treatment of high-income taxpayers needs more detail:

    • How does the imputation handle the PUF's topcoding?
    • What is the impact on revenue estimates for policies affecting high earners?
    • Include a sensitivity analysis for different imputation approaches at the top of the distribution
  3. State Tax Modeling: The paper mentions state identifiers but doesn't validate state-level tax calculations:

    • Provide validation against state revenue statistics
    • Discuss limitations for state tax policy analysis
    • Compare state income tax liabilities against administrative totals
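
A minimal sketch of the first requested check, assuming the enhanced dataset can be loaded as a pandas DataFrame; the column names and the benchmark rates below are placeholders, and the real CBO or TPC reference values would need to be transcribed from their published tables:

    import pandas as pd

    # Hypothetical column names; substitute whatever the enhanced CPS actually provides.
    df = pd.read_csv("enhanced_cps.csv")  # columns: weight, expanded_income, federal_tax

    # Weighted income deciles: rank by income, then cut the cumulative weight into tenths.
    df = df.sort_values("expanded_income")
    cum_share = df["weight"].cumsum() / df["weight"].sum()
    df["decile"] = (cum_share * 10).astype(int).clip(upper=9) + 1

    # Weighted effective tax rate by decile: total tax divided by total income.
    df["weighted_tax"] = df["federal_tax"] * df["weight"]
    df["weighted_income"] = df["expanded_income"] * df["weight"]
    by_decile = df.groupby("decile")[["weighted_tax", "weighted_income"]].sum()
    ecps_etr = by_decile["weighted_tax"] / by_decile["weighted_income"]

    # Illustrative benchmark values only, not actual CBO/TPC estimates.
    benchmark = pd.Series({1: 0.01, 5: 0.12, 10: 0.25}, name="benchmark_etr")
    comparison = pd.concat([ecps_etr.rename("ecps_etr"), benchmark], axis=1)
    print(comparison.assign(difference=comparison["ecps_etr"] - comparison["benchmark_etr"]))

The same pattern would extend to the tax expenditure check by summing weighted deduction amounts and comparing them against published SOI aggregates.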

Minor Comments

  • Table 1 should include more details on each model's treatment of capital gains
  • The SALT calculation methodology needs more detail on the interaction with AMT
  • Clarify how the dataset handles non-filers and their imputed tax liabilities

Referee 2: Survey Methodology Specialist

Summary

The authors present an ambitious data fusion approach combining survey and administrative sources. However, the statistical methodology raises several concerns that must be addressed before publication.

Major Comments

  1. Common Support Assumption: The imputation relies on only seven variables available in both datasets. This is concerning:

    • Provide diagnostics showing overlap in the covariate distributions
    • Discuss potential bias from limited common support
    • Include robustness checks using alternative predictor sets
  2. Quantile Regression Forest Validation: The QRF methodology needs more rigorous validation:

    • Provide out-of-sample prediction accuracy metrics (see the sketch after this list)
    • Compare QRF performance to simpler methods (hot-deck, regression)
    • Show that the joint distribution is preserved, not just marginals
  3. Weight Calibration Diagnostics: The reweighting methodology needs additional diagnostics:

    • Distribution of weight adjustment factors
    • Effective sample size after reweighting (also covered in the sketch after this list)
    • Stability of estimates across different starting weights
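
A minimal sketch combining the out-of-sample accuracy metric from comment 2 (quantile, or pinball, loss) with the weight diagnostics from comment 3; the arrays in the usage lines are hypothetical placeholders for held-out PUF values, QRF quantile predictions, and the original and calibrated CPS weights:

    import numpy as np

    def pinball_loss(y_true, y_pred_q, q):
        """Average quantile (pinball) loss for predictions of the q-th quantile."""
        err = y_true - y_pred_q
        return np.mean(np.maximum(q * err, (q - 1) * err))

    def weight_diagnostics(original_weights, calibrated_weights):
        """Distribution of weight adjustment factors and Kish effective sample size."""
        ratio = calibrated_weights / original_weights
        ess = calibrated_weights.sum() ** 2 / (calibrated_weights ** 2).sum()
        return {
            "adjustment_ratio_quantiles": np.quantile(ratio, [0.01, 0.25, 0.50, 0.75, 0.99]),
            "effective_sample_size": ess,
            "nominal_sample_size": calibrated_weights.size,
        }

    # Hypothetical usage:
    # for q, pred in [(0.1, pred_q10), (0.5, pred_q50), (0.9, pred_q90)]:
    #     print(q, pinball_loss(y_holdout, pred, q))
    # print(weight_diagnostics(cps_weights, calibrated_weights))

Reporting the same pinball losses for a hot-deck or linear quantile-regression baseline would make the comparison requested in comment 2 concrete.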

Minor Comments

  • Clarify why a 5% dropout rate was chosen and show sensitivity to this parameter
  • The convergence criterion (a 0.001% change) seems arbitrary; justify this choice
  • Discuss potential mode effects between CPS interviews and tax filing

Referee 3: Transfer Program Researcher

Summary

This paper makes an important contribution by enabling joint analysis of tax and transfer programs. However, the treatment of transfer programs requires substantial improvement.

Major Comments

  1. Benefit Underreporting: The paper doesn't address known underreporting of benefits in the CPS:

    • How does this affect the calibration targets?
    • What is the impact on poverty measurement?
    • Consider incorporating administrative benefit data
  2. Program Interaction Modeling: The interaction between programs needs validation:

    • Validate joint participation in multiple programs
    • Check that benefit cliff effects are preserved
    • Compare effective marginal tax rates against other published estimates (see the sketch after this list)
  3. Geographic Variation: State variation in transfer programs isn't addressed:

    • Validate state-level SNAP and Medicaid totals
    • Discuss limitations for state benefit policy analysis
    • Include state-specific TANF rules in the methodology
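
A minimal sketch of the marginal-rate comparison from comment 2, assuming the authors' simulation can return household net income at baseline earnings and again after a small earnings increment; the arrays and the $1,000 increment are hypothetical:

    import numpy as np

    DELTA = 1_000  # hypothetical earnings increment, in dollars

    def effective_marginal_tax_rate(net_income_base, net_income_plus, delta=DELTA):
        """EMTR = 1 - (change in net income) / (change in gross earnings)."""
        return 1 - (net_income_plus - net_income_base) / delta

    # net_base and net_plus are hypothetical arrays from the tax-benefit simulation,
    # evaluated at baseline earnings and at earnings + DELTA for the same households.
    # emtr = effective_marginal_tax_rate(net_base, net_plus)
    # print(np.quantile(emtr, [0.10, 0.50, 0.90]))  # compare against published EMTR distributions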

Minor Comments

  • The WIC modeling seems oversimplified; either expand it or note its limitations
  • Healthcare subsidies (ACA) need more detailed validation
  • Child care subsidies are missing from the benefit programs listed

Referee 4: Computational Reproducibility Expert

Summary

I attempted to reproduce the results presented in this paper using the provided code repository. While the authors have made commendable efforts toward reproducibility, several critical issues prevent full replication of the results.

Reproduction Attempt Results

  1. Environment Setup (PARTIAL SUCCESS):

    • Python environment setup succeeded with requirements.txt
    • However, undocumented system dependencies caused initial failures:
      • LaTeX installation required but not mentioned
      • A specific Python 3.9-3.11 requirement was discovered only through trial and error
      • Memory requirements (32GB+) not documented
  2. Data Generation (FAILED):

    • make data command fails due to missing credentials:
      Error: POLICYENGINE_GITHUB_MICRODATA_AUTH_TOKEN not found
      
    • No documentation on obtaining the necessary API tokens (see the sketch after this list)
    • Raw data files not accessible without authentication
    • No sample/synthetic data provided for testing
  3. Validation Results (NOT REPRODUCIBLE):

    • Table values in the paper show "[TBC]" placeholders
    • Scripts exist but cannot run without generated data
    • No intermediate outputs provided for verification
    • Validation dashboard link returns 404
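
A minimal sketch of the kind of guard that would make this failure self-explanatory, using the environment variable named in the error above; the documentation path and the synthetic-data fallback are hypothetical, not features of the current pipeline:

    import os
    import sys

    TOKEN_VAR = "POLICYENGINE_GITHUB_MICRODATA_AUTH_TOKEN"

    def require_token():
        """Fail early with instructions instead of deep inside the data build."""
        token = os.environ.get(TOKEN_VAR)
        if token:
            return token
        sys.exit(
            f"{TOKEN_VAR} is not set. See docs/data_access.md (hypothetical path) for how "
            "to request access, or pass --synthetic (hypothetical flag) to build from a "
            "bundled synthetic sample."
        )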

Major Reproducibility Issues

  1. Data Access Barriers:

    • Raw PUF data requires IRS approval (not mentioned in the paper)
    • CPS data download requires specific year/version info
    • No data preservation strategy (DOI, checksums)
    • API tokens needed but process undocumented
  2. Computational Requirements:

    • Full pipeline takes 6+ hours (not mentioned)
    • Memory requirements exceed typical systems
    • No options for subset/testing runs
    • No cloud compute instructions provided
  3. Version Control Issues:

    • Multiple dependency version conflicts
    • PolicyEngine package versions not pinned (see the sketch after this list)
    • No Docker/container option
    • Build tested only on macOS (fails on Linux)
  4. Documentation Gaps:

    • No step-by-step reproduction guide
    • Missing troubleshooting section
    • Incomplete API documentation
    • No expected output examples
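
A minimal sketch of a runtime guard against the version drift noted above, using the standard-library importlib.metadata; the pinned versions are placeholders and would come from the authors' own lockfile or requirements.txt:

    from importlib.metadata import PackageNotFoundError, version

    # Placeholder pins; the real values belong in the authors' lockfile.
    PINNED = {
        "policyengine-us": "1.0.0",
        "policyengine-core": "2.0.0",
    }

    def check_pins(pins=PINNED):
        """Return a list of packages whose installed version differs from the pin."""
        problems = []
        for package, expected in pins.items():
            try:
                installed = version(package)
            except PackageNotFoundError:
                problems.append(f"{package} is not installed (expected {expected})")
                continue
            if installed != expected:
                problems.append(f"{package}=={installed}, expected {expected}")
        return problems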

Minor Issues

  • Random seeds not set in all stochastic processes
  • Hardcoded paths in some scripts
  • No continuous integration for reproduction
  • Missing checksums for output validation (seeds and checksums are sketched after this list)
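
A minimal sketch covering the first and last items, assuming outputs are written to an outputs/ directory; the directory name and manifest format are placeholders:

    import hashlib
    import json
    import random
    from pathlib import Path

    import numpy as np

    def set_seeds(seed=0):
        """Seed every stochastic library the pipeline uses (stdlib and numpy here)."""
        random.seed(seed)
        np.random.seed(seed)

    def write_checksum_manifest(output_dir="outputs", manifest="checksums.json"):
        """Record a SHA-256 digest for each output file so independent runs can be compared."""
        digests = {
            path.name: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in sorted(Path(output_dir).glob("*"))
            if path.is_file()
        }
        Path(manifest).write_text(json.dumps(digests, indent=2))
        return digests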

Recommendations

  1. Provide synthetic/sample data for testing
  2. Create Docker container with full environment
  3. Document all credentials/access requirements
  4. Add smoke tests with expected outputs (see the sketch after this list)
  5. Include computation time/resource estimates
  6. Create detailed reproduction guide
  7. Deposit snapshot at Zenodo with DOI
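
A minimal sketch of recommendation 4 as a pytest-style check; the output file, column names, expected aggregate, and tolerance are all placeholders for values the authors would publish:

    import pandas as pd

    EXPECTED_TOTAL_AGI = 15.0e12  # placeholder aggregate, in dollars
    TOLERANCE = 0.01              # accept a 1% deviation

    def test_total_agi_matches_published_value():
        # Hypothetical output path and column names.
        df = pd.read_csv("outputs/enhanced_cps.csv")
        total = (df["adjusted_gross_income"] * df["weight"]).sum()
        assert abs(total / EXPECTED_TOTAL_AGI - 1) < TOLERANCE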

The paper's contribution is valuable, but it cannot be considered reproducible in its current state. Full reproduction requires addressing data access, documentation, and computational environment issues.
