Working Paper: NBER ID: w30861
Authors: Jonathan Proctor; Tamma Carleton; Sandy Sum
Abstract: Remotely sensed measurements and other machine learning predictions are increasingly used in place of direct observations in empirical analyses. Errors in such measures may bias parameter estimation, but it remains unclear how large such biases are or how to correct for them. We leverage a new benchmark dataset providing co-located ground truth observations and remotely sensed measurements for multiple variables across the contiguous U.S. to show that the common practice of using remotely sensed measurements without correction leads to biased parameter point estimates and standard errors across a diversity of empirical settings. More than three-quarters of the 95% confidence intervals we estimate using remotely sensed measurements do not contain the true coefficient of interest. These biases result from both classical measurement error and more structured measurement error, which we find is common in machine learning based remotely sensed measurements. We show that multiple imputation, a standard statistical imputation technique so far untested in this setting, effectively reduces bias and improves statistical coverage with only minor reductions in power in both simple linear regression and panel fixed effects frameworks. Our results demonstrate that multiple imputation is a generalizable and easily implementable method for correcting parameter estimates relying on remotely sensed variables.
Keywords: No keywords provided
JEL Codes: C18; C45; C80; Q0
Cause | Effect |
---|---|
remotely sensed measurements without correction (C20) | biased parameter point estimates (C51) |
classical measurement error (C20) | biased parameter point estimates (C51) |
structured measurement error (C20) | biased parameter point estimates (C51) |
measurement error (C20) | statistical uncertainty in regression analyses (C29) |
multiple imputation (C30) | reduced bias (C46) |
multiple imputation (C30) | improved statistical coverage (C80) |
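
The first and last rows of the table (uncorrected measurements bias point estimates; multiple imputation reduces that bias) can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation or dataset. The sample sizes, the true slope of 2.0, the purely classical error structure, the normal linear imputation model, and all variable names (`x`, `w`, `y`, `cal`, `M`) are assumptions chosen for illustration. The sketch assumes ground truth `x` is available only for a random calibration subsample, imputes it elsewhere from the noisy measurement `w` and the outcome `y`, and pools the estimates with Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(0)


def ols(X, y):
    """Return OLS coefficients, their covariance matrix, and the residual variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * XtX_inv, sigma2


# Synthetic data (all values are illustrative assumptions).
n, n_cal, true_slope = 5000, 1000, 2.0
x = rng.normal(size=n)                    # ground truth regressor
w = x + rng.normal(size=n)                # noisy "remotely sensed" proxy (classical error)
y = true_slope * x + rng.normal(size=n)   # outcome
cal = np.arange(n) < n_cal                # calibration rows where x is treated as observed
ones = np.ones(n)

# Naive approach: regress y directly on the noisy proxy w.
naive_beta, _, _ = ols(np.column_stack([ones, w]), y)
naive_slope = naive_beta[1]

# Imputation model for x given w and y, fitted on the calibration rows only.
# Including the outcome y in the imputation model is needed for valid pooling.
Z = np.column_stack([ones, w, y])
gamma_hat, _, s2_hat = ols(Z[cal], x[cal])
ZtZ_inv = np.linalg.inv(Z[cal].T @ Z[cal])
dof = n_cal - Z.shape[1]

M = 20  # number of imputed datasets
slopes, variances = [], []
for _ in range(M):
    # Draw imputation-model parameters to propagate their uncertainty.
    s2 = s2_hat * dof / rng.chisquare(dof)
    gamma = rng.multivariate_normal(gamma_hat, s2 * ZtZ_inv)
    # Impute x wherever it is treated as unobserved, adding residual noise.
    x_imp = x.copy()
    n_mis = int((~cal).sum())
    x_imp[~cal] = Z[~cal] @ gamma + rng.normal(scale=np.sqrt(s2), size=n_mis)
    # Analysis model on the completed dataset.
    beta, cov, _ = ols(np.column_stack([ones, x_imp]), y)
    slopes.append(beta[1])
    variances.append(cov[1, 1])

# Rubin's rules: pool the point estimates and combine the two variance sources.
mi_slope = np.mean(slopes)
within = np.mean(variances)
between = np.var(slopes, ddof=1)
mi_se = np.sqrt(within + (1 + 1 / M) * between)

print(f"true slope: {true_slope}")
print(f"naive slope (y on w): {naive_slope:.3f}")
print(f"multiple imputation slope: {mi_slope:.3f} (SE {mi_se:.3f})")
```

In this setup the proxy has a reliability ratio of about 0.5, so the naive slope is attenuated toward roughly half the true value, while the pooled estimate should land much closer to it. The sketch covers only classical error. The structured error the abstract describes would call for a richer imputation model than the linear one assumed here.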