Using Administrative Data to Impute Income Nonresponse in Household Surveys

Working Paper: NBER ID: w30420

Authors: V. Kerry Smith; Michael P. Welsh; Richard Carson; Stanley Presser

Abstract: Income is simultaneously one of the most important variables used by economists and the variable most likely to be missing due to item nonresponse. While observations that are missing income responses are often dropped from analyses, such treatment is usually inappropriate. More appropriate solutions rely on imputation based either on covariates (e.g., age and education) measured in the survey or on spatial estimates (most often for zip codes) from the American Community Survey. We describe a new spatially based alternative that uses publicly available Internal Revenue Service tax data to estimate each zip code’s income distribution.

Keywords: income imputation; nonresponse; household surveys; IRS data; spatial estimates

JEL Codes: C0; C8
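
The spatially based imputation described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a hypothetical lookup table (`ZIP_BRACKETS`) giving, for each zip code, the share of tax returns falling in each income bracket, and it fills a missing survey income by drawing a bracket in proportion to those shares and then a value uniformly within the bracket. The zip code, bracket bounds, shares, and function names are all invented for illustration.

```python
import random

# Hypothetical bracket shares by zip code (illustrative numbers, NOT real IRS data).
# Each entry is ((lower_bound, upper_bound), share_of_returns).
ZIP_BRACKETS = {
    "00001": [
        ((1, 25_000), 0.30),
        ((25_000, 50_000), 0.25),
        ((50_000, 75_000), 0.20),
        ((75_000, 100_000), 0.15),
        ((100_000, 200_000), 0.10),
    ],
}


def impute_income(zip_code, rng):
    """Draw one income value from the zip code's assumed bracket distribution."""
    brackets = ZIP_BRACKETS[zip_code]
    bounds = [bound for bound, _ in brackets]
    shares = [share for _, share in brackets]
    # Pick a bracket with probability proportional to its share of returns.
    low, high = rng.choices(bounds, weights=shares, k=1)[0]
    # Assumption: incomes are spread uniformly within a bracket.
    return rng.uniform(low, high)


def fill_missing(records, seed=0):
    """Return copies of the records with missing incomes imputed and flagged."""
    rng = random.Random(seed)
    filled = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not modified
        if record["income"] is None:
            record["income"] = impute_income(record["zip"], rng)
            record["imputed"] = True
        else:
            record["imputed"] = False
        filled.append(record)
    return filled


if __name__ == "__main__":
    survey = [
        {"zip": "00001", "income": None},
        {"zip": "00001", "income": 42_000.0},
    ]
    for row in fill_missing(survey, seed=1):
        print(row)
```

Flagging imputed values keeps them distinguishable from reported ones in later analysis. A single random draw per respondent is the simplest choice here; repeating the draw several times (multiple imputation) would be one way to reflect the added uncertainty.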


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause → Effect
Traditional methods of handling missing income data (C81) → inappropriate imputation (C36)
IRS tax data (H26) → more accurate imputation of missing income values (J17)
IRS data has less item missing data (H26) → higher quality source for imputation (C59)
IRS data (H26) → unbiased estimates of reported income (C51)
Using IRS data (H26) → address biases introduced by unit nonresponse (C83)
