Factorial Designs, Model Selection, and Incorrect Inference in Randomized Experiments

Working Paper: NBER ID: w26562

Authors: Karthik Muralidharan; Mauricio Romero; Kaspar Wuthrich

Abstract: Factorial designs are widely used for studying multiple treatments in one experiment. While t-tests based on the “long” model (including main and interaction effects) provide valid inferences against “business-as-usual” counterfactuals, “short” model t-tests (that ignore interactions) yield higher power if the interactions are zero, but incorrect inferences otherwise. Out of 27 factorial experiments published in top-5 journals in 2007–2017, 19 use the short model. We reanalyze these experiments, and show that over half of their published results lose significance when interactions are included. We show that testing the interactions using the long model and presenting the short model if the interactions are not significantly different from zero leads to incorrect inference due to the implied data-dependent model selection. Based on recent econometric advances, we show that local power improvements over the long model are possible. However, if the main effects are of primary interest, leaving the interaction cells empty yields valid inferences and global power improvements. In addition, the sample size needed to detect interactions is substantially larger than that required to detect main effects, resulting in most experiments being under-powered to detect interactions. Thus, using factorial designs to explore whether interactions are meaningful can be problematic because interaction estimates are likely to considerably overestimate the magnitude of the true effect conditional on being significant.
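
To make the short-versus-long distinction concrete, here is a minimal sketch (not taken from the paper) of a noise-free 2x2 factorial with equal cell sizes. The function name `long_and_short_estimates`, the cell layout, and the numbers are illustrative assumptions. With equal cell sizes the two treatment indicators are uncorrelated, so the short-model OLS coefficient on T1 equals the difference in marginal means; the sketch uses that shortcut rather than running a regression.

```python
from statistics import mean


def long_and_short_estimates(cells):
    """Return (long_t1, short_t1, interaction) for a balanced 2x2 design.

    `cells` maps (t1, t2) -> list of outcomes. Equal cell sizes are assumed;
    the short-model shortcut below is only valid in that case.
    """
    m = {key: mean(values) for key, values in cells.items()}

    # Long (saturated) model: the T1 coefficient compares the T1-only cell
    # with the pure control cell, i.e. the business-as-usual counterfactual.
    long_t1 = m[(1, 0)] - m[(0, 0)]

    # Long model: the interaction is the difference-in-differences of cell means.
    interaction = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]

    # Short model (no interaction term): in a balanced design the OLS
    # coefficient on T1 is the difference in marginal means over T2.
    short_t1 = (m[(1, 0)] + m[(1, 1)]) / 2 - (m[(0, 0)] + m[(0, 1)]) / 2

    return long_t1, short_t1, interaction


# Hypothetical noise-free data: T1 alone does nothing, T2 alone adds 0.5,
# and the two together add a further 0.4 (the interaction).
cells = {
    (0, 0): [1.0, 1.0],
    (1, 0): [1.0, 1.0],
    (0, 1): [1.5, 1.5],
    (1, 1): [1.9, 1.9],
}

long_t1, short_t1, interaction = long_and_short_estimates(cells)
print(round(long_t1, 6), round(short_t1, 6), round(interaction, 6))
```

In this illustration T1 has no effect against the pure control (the long-model estimate is 0), yet the short model reports roughly 0.2, half of the interaction, because it averages the effect of T1 over both T2 arms. That composite is what the abstract means by incorrect inference against a business-as-usual counterfactual when interactions are nonzero.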

Keywords: Randomized Controlled Trials; Crosscut Designs; Power in Field Experiments; Data Dependent Model Selection; Interaction Effects

JEL Codes: C12; C18; C21; C90; C93


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause | Effect
short model (Y60) | incorrect inferences (D83)
interactions not zero (C69) | incorrect inferences (D83)
long model (E17) | valid inferences (C20)
sample size required for interactions (C90) | larger than main effects (C92)
significant interactions (C31) | overestimate magnitude of true effects (C51)
long model (E17) | avoid pitfalls associated with short model (C52)
neglecting interactions (C20) | underpowered studies (C90)
leaving interaction cells empty (Y90) | valid inferences (C20)
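
The sample-size claim listed above can be illustrated with a back-of-the-envelope sketch. It rests on assumptions that are not from the source: a balanced 2x2 design with N units in total, homoskedastic outcomes with standard deviation sigma, and a normal approximation to the t-test. Under those assumptions the variance of each estimate is k * sigma^2 / N, with k = 4 for the short-model main effect (two halves of N/2), k = 8 for the long-model main effect (two cells of N/4), and k = 16 for the interaction (four cells of N/4). The names `VAR_FACTOR` and `required_total_n` are hypothetical.

```python
from statistics import NormalDist

# Variance multipliers k in Var(estimate) = k * sigma^2 / N for a balanced
# 2x2 design (an assumption of this sketch, not a result quoted from the paper).
VAR_FACTOR = {
    "short_main": 4,    # difference of two marginal means, N/2 units each
    "long_main": 8,     # difference of two cell means, N/4 units each
    "interaction": 16,  # difference-in-differences of four cell means
}


def required_total_n(effect, sigma, var_factor, alpha=0.05, power=0.8):
    """Approximate total N for a two-sided test to detect `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Solve effect / sqrt(k * sigma^2 / N) = z_alpha + z_power for N.
    return var_factor * ((z_alpha + z_power) * sigma / effect) ** 2


# Hypothetical effect size of 0.2 standard deviations for every estimand.
n_short = required_total_n(0.2, 1.0, VAR_FACTOR["short_main"])
n_long = required_total_n(0.2, 1.0, VAR_FACTOR["long_main"])
n_inter = required_total_n(0.2, 1.0, VAR_FACTOR["interaction"])
print(round(n_short), round(n_long), round(n_inter))
```

For an interaction of the same magnitude as the main effect, this approximation requires twice the sample of the long-model main effect and four times that of the short-model main effect. If the interaction is smaller than the main effect, as is often plausible, the gap widens with the square of the ratio of the two effect sizes.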
