The Accuracy of Tax Imputations: Estimating Tax Liabilities and Credits Using Linked Survey and Administrative Data

Working Paper: NBER ID: w28229

Authors: Bruce D. Meyer; Derek Wu; Grace Finley; Patrick Langetieg; Carla Medalia; Mark Payne; Alan Plumley

Abstract: This paper calculates accurate estimates of income and payroll taxes using a groundbreaking set of linked survey and administrative tax data that are part of the Comprehensive Income Dataset (CID). We compare our estimates to survey imputations produced by the Census Bureau and those generated using the TAXSIM calculator from the National Bureau of Economic Research. The administrative data include two sets of Internal Revenue Service (IRS) data: (1) a limited set of tax information for the population of individual income tax returns covering selected line items from Forms 1040, W-2, and 1099-R; and (2) an extensive set of population tax records processed by the IRS in 2011, covering nearly every line item on Form 1040 and most lines on a series of third-party information returns. We link these IRS records to the Current Population Survey Annual Social and Economic Supplement (CPS) for reference year 2010. We describe how we form tax units and estimate various types of tax liabilities and credits using these linked data, providing a roadmap for constructing accurate measures of taxes while preserving the survey family as the sharing unit for distributional analyses. We find that aggregate estimates of various tax components using the limited and extensive tax data estimates are close to each other and much closer to public IRS tabulations than either of the imputations using survey data alone. At the individual level, the absolute errors of survey-only imputations of federal income taxes and total taxes are on average 10% and 13%, respectively, of adjusted gross income. In contrast, the limited tax data imputations yield mean absolute errors for federal income taxes and total taxes that are about 2% and 3% of adjusted gross income, respectively. For the Earned Income Tax Credit, the limited tax data imputation is off by less than $20 on average for a typical family (compared to more than $500 using either of the survey-only imputations).

Keywords: tax liabilities; tax credits; linked data; survey data; administrative data

JEL Codes: C42; C81; H20; H24; I32


[Figure: Causal Claims Network Graph. Edges evidenced by causal inference methods are shown in orange; all other edges are shown in light blue.]


Causal Claims

Cause | Effect
aggregate estimates of tax components using both limited and extensive tax data (H29) | public IRS tabulations (H20)
survey-only imputations (C83) | mean absolute errors of 10% for federal income taxes and 13% for total taxes as a percentage of adjusted gross income (AGI) (H26)
limited tax data imputations (H29) | mean absolute errors of approximately 2% and 3% (C12)
limited tax data imputation for the earned income tax credit (EITC) (H26) | is off by less than $20 on average (J31)
survey-only imputations for the earned income tax credit (EITC) (C83) | are off by over $500 on average (J31)
linking IRS data to the Current Population Survey (CPS) (C81) | significant improvement in accuracy when administrative data is included (C80)
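
The accuracy figures above are expressed as mean absolute errors relative to adjusted gross income (AGI). As a rough illustration of that kind of metric, here is a minimal Python sketch. It assumes the metric is the mean absolute imputation error divided by mean AGI; the paper may define it differently (for example, as the mean of per-unit error-to-AGI ratios). The function name and all input values are hypothetical and are not drawn from the paper's data.

```python
from statistics import mean


def mean_abs_error_share_of_agi(imputed, actual, agi):
    """Mean absolute imputation error divided by mean AGI.

    Assumed definition for illustration only; not taken from the paper.
    """
    if not (len(imputed) == len(actual) == len(agi)) or not agi:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(i - a) for i, a in zip(imputed, actual)]
    return mean(abs_errors) / mean(agi)


# Hypothetical toy tax units (dollars); none of these values are real data.
agi = [20_000, 50_000, 80_000]
actual_tax = [500, 5_000, 12_000]        # stand-in for administrative records
survey_only = [2_000, 8_000, 6_000]      # stand-in for a survey-only imputation
limited_linked = [800, 4_200, 13_000]    # stand-in for a linked-data imputation

for label, imputed in [("survey-only", survey_only),
                       ("limited linked", limited_linked)]:
    share = mean_abs_error_share_of_agi(imputed, actual_tax, agi)
    print(f"{label}: {share:.1%} of AGI")
```

Dividing by mean AGI keeps the measure defined when some tax units report zero AGI, whereas averaging per-unit ratios would not; which convention applies here is an assumption.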
