Using Split Samples to Improve Inference about Causal Effects

Working Paper: NBER ID: w21842

Authors: Marcel Fafchamps; Julien Labonne

Abstract: We discuss a method aimed at reducing the risk that spurious results are published. Researchers send their datasets to an independent third party, who randomly generates training and testing samples. Researchers perform their analysis on the training sample; once the paper is accepted for publication, the analysis is applied to the testing sample, and it is those results that are published. Simulations indicate that, under empirically relevant settings, the proposed method significantly reduces Type I error and delivers adequate power. The method, which can be combined with pre-analysis plans, reduces the risk that relevant hypotheses are left untested.
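The split-sample protocol in the abstract (a third party randomly partitions the data; exploratory analysis uses only the training half, and the published results come from the one-time analysis of the held-out testing half) can be sketched as below. This is a minimal illustration, not the authors' implementation; the function name `split_sample` and its parameters are assumptions.

```python
import numpy as np

def split_sample(n_obs, train_frac=0.5, seed=0):
    """Randomly partition observation indices into a training set and a
    held-out testing set, as the independent third party would.
    (Illustrative sketch; names and defaults are not from the paper.)"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)          # random order over all observations
    cut = int(n_obs * train_frac)         # size of the training sample
    return np.sort(idx[:cut]), np.sort(idx[cut:])

# Researchers refine hypotheses on the training half only; the testing
# half is touched once, after acceptance, to produce the published results.
train_idx, test_idx = split_sample(3000)
print(len(train_idx), len(test_idx))  # 1500 1500
```

The seed would be held by the third party, so researchers cannot regenerate the split and peek at the testing sample.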

Keywords: No keywords provided

JEL Codes: C12; C18


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause → Effect

split sample method (C83) → reduction of Type I errors (C52)
split sample method (C83) → reduction of publication bias (C46)
randomly splitting dataset (C55) → improvement of reliability of findings (C90)
training sample (C83) → hypothesis refinement (C90)
hypothesis refinement (C90) → more accurate testing of hypotheses in testing sample (C12)
split sample method (C83) → detection of effect sizes greater than 0.2 standard deviations (C90)
sample sizes exceeding 3,000 (C55) → sufficient power (above 80%) (L94)
sample sizes of 10,000 or more (C55) → detection of smaller effect sizes (0.1 standard deviations) (C90)
split sample approach (C90) → identification of null hypotheses that should be rejected (C12)
