Missing Data in Asset Pricing Panels

Working Paper: NBER ID: w30761

Authors: Joachim Freyberger; Björn Höppner; Andreas Neuhierl; Michael Weber

Abstract: Missing data for return predictors is a common problem in cross-sectional asset pricing. Most papers do not explicitly discuss how they deal with missing data, but conventional treatments either focus on the subset of firms with no missing data for any predictor or impute the unconditional mean. Both methods have undesirable properties: they are either inefficient or lead to biased estimators and incorrect inference. We propose a simple and computationally attractive alternative using conditional mean imputations and weighted least squares, cast in a generalized method of moments (GMM) framework. This method allows us to use all observations with observed returns, it results in valid inference, and it can be applied in non-linear and high-dimensional settings. In Monte Carlo simulations, we find that it performs almost as well as the efficient but computationally costly GMM estimator in many cases. We apply our procedure to a large panel of return predictors and find that it leads to improved out-of-sample predictability.

Keywords: missing data; asset pricing; imputation; GMM; predictability

JEL Codes: C14; C58; G12


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause | Effect
Proposed method of imputing missing covariate observations using conditional mean imputation and weighted least squares (C20) | Improved out-of-sample predictability (C53)
Conventional methods (complete case analysis and unconditional mean imputation) (C29) | Biased estimators and incorrect inferences (C51)
Complete case analysis (C29) | Discards a significant amount of data (C55)
Proposed method (C59) | Allows for inclusion of all firms with valid return observations (L20)
Proposed method (C59) | Valid inference and better statistical properties (C52)
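
The contrast between the conventional treatments and conditional mean imputation can be illustrated with a small simulation. The sketch below is not the authors' estimator: it uses plain OLS rather than weighted least squares or GMM, and the data-generating process (two correlated predictors, the second missing completely at random, true slopes of 1.0) and all function names are assumptions made for illustration only.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients, with an intercept prepended to X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def simulate(n=20000, rho=0.6, miss_rate=0.4, seed=0):
    """Hypothetical panel: y depends on x1 and x2; x2 is missing at random."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y = 1.0 * x1 + 1.0 * x2 + 0.5 * rng.standard_normal(n)
    missing = rng.random(n) < miss_rate
    x2_obs = np.where(missing, np.nan, x2)  # NaN marks a missing predictor
    return x1, x2_obs, y


def compare_treatments(x1, x2_obs, y):
    """Return [intercept, slope on x1, slope on x2] under three treatments."""
    seen = ~np.isnan(x2_obs)

    # 1. Complete case: drop every observation with a missing predictor.
    complete = ols(np.column_stack([x1[seen], x2_obs[seen]]), y[seen])

    # 2. Unconditional mean: fill gaps with the mean of the observed x2.
    x2_uncond = np.where(seen, x2_obs, x2_obs[seen].mean())
    uncond = ols(np.column_stack([x1, x2_uncond]), y)

    # 3. Conditional mean: fill gaps with the fitted value of x2 given x1,
    #    estimated on the observations where x2 is observed.
    a, g = ols(x1[seen].reshape(-1, 1), x2_obs[seen])
    x2_cond = np.where(seen, x2_obs, a + g * x1)
    cond = ols(np.column_stack([x1, x2_cond]), y)

    return {
        "complete_case": complete,
        "unconditional_mean": uncond,
        "conditional_mean": cond,
    }


if __name__ == "__main__":
    x1, x2_obs, y = simulate()
    for name, beta in compare_treatments(x1, x2_obs, y).items():
        print(f"{name:>20}: {np.round(beta, 3)}")
```

In this simulated setting the complete-case fit uses only the rows where the second predictor is observed, while unconditional mean imputation pushes part of the effect of the missing predictor onto the correlated, observed one. Conditional mean imputation keeps every row and avoids that distortion here, but valid standard errors require the additional machinery described in the abstract.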
