To Hold Out or Not to Hold Out

Working Paper: NBER ID: w19565

Authors: Frank Schorfheide; Kenneth I. Wolpin

Abstract: A recent literature has developed that combines two prominent empirical approaches to ex ante policy evaluation: randomized controlled trials (RCT) and structural estimation. The RCT provides a "gold-standard" estimate of a particular treatment, but only of that treatment. Structural estimation provides the capability to extrapolate beyond the experimental treatment, but is based on untestable assumptions and is subject to structural data mining. Combining the approaches by holding out from the structural estimation exercise either the treatment or control sample allows for external validation of the underlying behavioral model. Although intuitively appealing, this holdout methodology is not well grounded. For instance, it is easy to show that it is suboptimal from a Bayesian perspective. Using a stylized representation of a randomized controlled trial, we provide a formal rationale for the use of a holdout sample in an environment in which data mining poses an impediment to the implementation of the ideal Bayesian analysis and a numerical illustration of the potential benefits of holdout samples.

Keywords: Holdout Samples; Data Mining; Bayesian Analysis; Policy Evaluation; Structural Estimation

JEL Codes: C11; C31; C52

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
Holdout samples (C24)	External validation of behavioral models (C52)
Holdout samples (C24)	Reduced risk of overfitting and optimistic assessments of model fit (C52)
Holdout mechanism (50% sample, split treatment and control) (C90)	More accurate predictions of treatment effects (C22)
Holdout mechanism (D86)	Discouragement of modelers from inflating predictive fit (C52)
Holdout mechanism (D86)	Lower integrated risk differential when predicting treatment effects (C22)
Holdout mechanism dominates no-holdout mechanism (D86)	More credible policy evaluation (D78)
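
The 50% holdout split across treatment and control listed above can be illustrated with a minimal sketch. This is not the paper's model or its numerical illustration: the data are simulated, a simple difference in means stands in for a structural estimator, and all names (`simulate_rct`, `split_holdout`, `effect_estimate`) and parameter values are hypothetical choices made for this example only.

```python
import random
from statistics import mean


def simulate_rct(n_per_arm, true_effect, rng):
    """Simulate outcomes for a control and a treatment arm (hypothetical data)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [true_effect + rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    return control, treated


def split_holdout(sample, holdout_share, rng):
    """Randomly split one arm into (estimation, holdout) parts."""
    shuffled = sample[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_share)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def effect_estimate(control, treated):
    """Difference in means; a stand-in for a structural model's prediction."""
    return mean(treated) - mean(control)


rng = random.Random(0)            # fixed seed so the run is reproducible
control, treated = simulate_rct(n_per_arm=2000, true_effect=2.0, rng=rng)

# Hold out 50% of each arm; the modeler only sees the estimation parts.
c_est, c_hold = split_holdout(control, 0.5, rng)
t_est, t_hold = split_holdout(treated, 0.5, rng)

predicted = effect_estimate(c_est, t_est)    # fitted on the estimation sample
holdout = effect_estimate(c_hold, t_hold)    # computed on data never used in fitting
gap = abs(predicted - holdout)               # simple external-validation check

print(f"predicted effect: {predicted:.3f}")
print(f"holdout effect:   {holdout:.3f}")
print(f"absolute gap:     {gap:.3f}")
```

In this sketch the gap between the two estimates plays the role of an external validation check: a model tuned too closely to the estimation sample would tend to show a larger gap on the held-out half. The sketch does not address the trade-off the abstract raises, namely that withholding data is suboptimal from a Bayesian perspective when data mining is not a concern.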
