Working Paper: NBER ID: w30420
Authors: V. Kerry Smith; Michael P. Welsh; Richard Carson; Stanley Presser
Abstract: Income is simultaneously one of the most important variables used by economists and the variable most likely to be missing due to item non-response. While observations that are missing income responses are often dropped from analyses, such treatment is usually inappropriate. More appropriate solutions rely on imputation based on either covariates (e.g., age and education) measured in the survey or on spatial estimates (most often for zip codes) from the American Community Survey. We describe a new spatially-based alternative using publicly available Internal Revenue Service tax data that allows estimation of each zip code’s income distribution.
Keywords: income imputation; nonresponse; household surveys; IRS data; spatial estimates
JEL Codes: C0; C8
Cause | Effect |
---|---|
Traditional methods of handling missing income data (C81) | inappropriate imputation (C36) |
IRS tax data (H26) | more accurate imputation of missing income values (J17) |
IRS data has less item-missing data (H26) | higher-quality source for imputation (C59) |
IRS data (H26) | unbiased estimates of reported income (C51) |
Using IRS data (H26) | address biases introduced by unit nonresponse (C83) |
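
The spatially-based imputation described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which this record does not spell out. It assumes that each zip code comes with counts of tax returns by income bracket, draws a bracket in proportion to those counts, and then draws an income uniformly within the bracket. The bracket bounds, the cap on the open-ended top bracket, the zip codes, the counts, and all function names are hypothetical.

```python
import random

# Hypothetical income brackets (lower, upper) in dollars.
BRACKETS = [
    (0, 25_000),
    (25_000, 50_000),
    (50_000, 75_000),
    (75_000, 100_000),
    (100_000, 200_000),
    (200_000, 500_000),  # assumed cap for an open-ended top bracket
]

# Made-up counts of tax returns per bracket for two illustrative zip codes.
RETURNS_BY_ZIP = {
    "00001": [400, 300, 150, 80, 60, 10],
    "00002": [50, 100, 200, 250, 300, 100],
}


def bracket_shares(counts):
    """Convert bracket counts into an estimated income distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("zip code has no returns")
    return [c / total for c in counts]


def draw_income(zip_code, rng):
    """Draw one income from the zip code's estimated bracket distribution."""
    shares = bracket_shares(RETURNS_BY_ZIP[zip_code])
    lower, upper = rng.choices(BRACKETS, weights=shares, k=1)[0]
    # Uniform within the bracket is a simplifying assumption.
    return rng.uniform(lower, upper)


def impute_missing(respondents, seed=0):
    """Fill in missing incomes; reported incomes are left unchanged."""
    rng = random.Random(seed)
    completed = []
    for person in respondents:
        missing = person["income"] is None
        income = draw_income(person["zip"], rng) if missing else person["income"]
        completed.append({**person, "income": income, "imputed": missing})
    return completed


survey = [
    {"zip": "00001", "income": None},    # item non-response on income
    {"zip": "00002", "income": 62_000},  # reported income, kept as-is
]
completed = impute_missing(survey)
```

Drawing from the zip-level distribution, rather than assigning a single zip-level mean, keeps some of the within-area income variation in the completed data; repeating the draw with different seeds would give multiple imputations.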