This paper presents an innovative approach to enhancing the Current Population Survey (CPS) with tax information from the IRS Public Use File (PUF). The methodology is technically sophisticated and addresses a real need in the US tax policy microsimulation community. However, I have several concerns that should be addressed before publication.
This paper presents a methodologically rigorous approach to combining survey and administrative data for microsimulation purposes. The technical innovation is impressive, particularly the use of over 7,000 calibration targets. However, from a distributional analysis perspective, I have significant concerns about the poverty measurement results and the treatment of transfers and in-kind benefits.
This paper presents an impressive technical achievement in combining survey and administrative data through machine learning methods. The scale of the calibration exercise (over 7,000 targets) and the sophisticated use of quantile regression forests represent significant advances in microsimulation methodology. From a European perspective, where we have extensive experience with data fusion and reweighting in models like EUROMOD and MIDAS, I offer the following observations.
We thank the three referees for their thorough and constructive reviews. Their insights have significantly improved the paper. Below we address each major concern raised.
We acknowledge that the 9-year gap is substantial, particularly given the TCJA. We have added Section 3.2.2, "Addressing the Temporal Gap," which explains our mitigation strategies:
- Variable-specific uprating using SOI growth factors
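
A minimal sketch of what variable-specific uprating means in practice is given here. It assumes each dollar variable is scaled by its own growth factor; the factor values are hypothetical placeholders (not actual SOI figures), and `uprate` is an illustrative helper rather than a function from the codebase.

```python
# Hypothetical growth factors from the PUF year to the target year.
# These numbers are placeholders for illustration, not SOI values.
GROWTH_FACTORS = {
    "wages": 1.40,
    "capital_gains": 1.90,
    "dividends": 1.55,
}


def uprate(record, factors, default=1.0):
    """Scale each dollar variable by its own growth factor.

    Variables without a listed factor are left unchanged (default 1.0).
    """
    return {name: value * factors.get(name, default)
            for name, value in record.items()}


# One illustrative tax record in base-year dollars.
base_record = {"wages": 50_000.0, "capital_gains": 2_000.0, "dividends": 500.0}
uprated_record = uprate(base_record, GROWTH_FACTORS)
print(uprated_record)
```

Applying a separate factor per variable keeps fast-growing items (for example capital gains) from being understated by a single economy-wide index, though it does not by itself adjust for demographic change.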
We thank the reviewers for their careful reading and constructive feedback. We have substantially revised the manuscript to address their concerns.
Comment: "The use of 2015 PUF data to enhance 2024 CPS raises serious temporal consistency issues..."
Response: We acknowledge this limitation and have added discussion in Section 4.2 (Limitations). The 2015 PUF remains the most recent publicly available tax microdata. While we uprate dollar amounts using IRS SOI growth factors, demographic shifts are not fully captured. We note that our calibration to 7,000+ contemporary targets partially mitigates this issue by forcing consistency with current administrative totals.
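
To illustrate how calibration forces weighted totals to match contemporary targets, the sketch below uses raking (iterative proportional fitting). This is a stand-in technique chosen for brevity, not necessarily the optimizer used for the 7,000+ targets; the records, categories, and target totals are hypothetical.

```python
def rake(records, weights, targets, iterations=50):
    """Adjust weights so weighted category totals match the targets.

    records: list of dicts mapping dimension name -> category label
    weights: initial weight per record
    targets: dict mapping dimension -> {category: target weighted total}
    Assumes every category in `targets` has at least one record with
    positive weight and every record's category appears in `targets`.
    """
    w = list(weights)
    for _ in range(iterations):
        for dim, cat_targets in targets.items():
            # Current weighted total for each category of this dimension.
            totals = {cat: 0.0 for cat in cat_targets}
            for rec, wi in zip(records, w):
                totals[rec[dim]] += wi
            # Rescale each record by target / current for its category.
            w = [wi * cat_targets[rec[dim]] / totals[rec[dim]]
                 for rec, wi in zip(records, w)]
    return w


# Hypothetical four-record sample with two calibration dimensions.
records = [
    {"age": "under_65", "status": "single"},
    {"age": "under_65", "status": "joint"},
    {"age": "65_plus", "status": "single"},
    {"age": "65_plus", "status": "joint"},
]
targets = {
    "age": {"under_65": 70.0, "65_plus": 30.0},
    "status": {"single": 60.0, "joint": 40.0},
}
calibrated = rake(records, [1.0, 1.0, 1.0, 1.0], targets)
print(calibrated)
```

Raking only matches totals over categorical margins; calibrating to thousands of heterogeneous dollar and count targets generally calls for a more flexible loss-minimizing reweighting, which this sketch does not attempt.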
This analysis compares PolicyEngine's implementation of New Hampshire's Interest and Dividends tax with TAXSIM 35, including validation across multiple years.
- PolicyEngine correctly implements NH tax based on actual tax forms and statutes
- TAXSIM has a bug that overstates exemptions by 2.6-3.3x
- TAXSIM supports tax years only through 2023, so 2024 results cannot be validated against it
- The bug causes TAXSIM to undertax NH residents by hundreds of dollars
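
To show how an overstated exemption translates into lower computed tax, a minimal sketch follows. It assumes a flat-rate tax on income above an exemption; the rate, exemption, income, and overstatement factor are illustrative placeholders rather than figures from this analysis, and `flat_tax` is a hypothetical helper, not a PolicyEngine or TAXSIM function.

```python
def flat_tax(income, exemption, rate):
    """Flat-rate tax on the portion of income above the exemption."""
    return rate * max(0.0, income - exemption)


# Illustrative placeholders only; not taken from the analysis.
RATE = 0.05
EXEMPTION = 2_400.0
OVERSTATEMENT = 3.0   # a factor inside the reported 2.6-3.3x range
INCOME = 10_000.0     # taxable interest and dividends

correct = flat_tax(INCOME, EXEMPTION, RATE)
overstated = flat_tax(INCOME, EXEMPTION * OVERSTATEMENT, RATE)
shortfall = correct - overstated
print(f"correct={correct:.2f} overstated={overstated:.2f} shortfall={shortfall:.2f}")
```

With these placeholder inputs the shortfall is about 240, which is the order of magnitude the last bullet describes. Filers whose income is below the correct exemption are unaffected, since both calculations return zero.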
Referee Reports for "Enhanced CPS: A Validated Dataset Combining Survey and Administrative Data for Policy Analysis"
This paper presents a methodology for creating an enhanced dataset that combines the Current Population Survey (CPS) with IRS administrative data. While the contribution is valuable for the microsimulation community, several aspects require clarification and improvement.
- Validation Against Tax Policy Benchmarks: The paper claims the dataset is suitable for tax policy analysis but provides limited validation against known tax policy benchmarks. Please include:
We thank all four reviewers for their thoughtful and constructive feedback on our manuscript "Enhancing the Current Population Survey for Policy Analysis: A Methodological Approach". We have carefully addressed each concern raised and made substantial improvements to both the paper and codebase. Below we provide a detailed response to each reviewer's comments.
Reviewer: "The paper provides limited validation of tax-related variables beyond aggregate totals. For policy analysis, it's crucial to understand how well the enhanced dataset captures effective tax rates across the income distribution."
Response: We have added comprehensive tax validation analysis in validation/tax_policy_validation.py that: