Multiple Hypothesis Testing in Experimental Economics

Working Paper: NBER ID: w21875

Authors: John A. List; Azeem M. Shaikh; Yang Xu

Abstract: Empiricism in the sciences allows us to test theories, formulate optimal policies, and learn how the world works. It is therefore critical that our empirical work provides accurate conclusions about underlying data patterns. False positives represent an especially important problem, as vast public and private resources can be misdirected if we base decisions on false discoveries. This study explores one especially pernicious influence on false positives: multiple hypothesis testing (MHT). While MHT potentially affects all types of empirical work, we consider three common scenarios where MHT influences inference within experimental economics: jointly identifying treatment effects for a set of outcomes, estimating heterogeneous treatment effects through subgroup analysis, and conducting hypothesis testing for multiple treatment conditions. Building upon the work of Romano and Wolf (2010), we present a correction procedure that incorporates the three scenarios, and illustrate the improvement in power by comparing our results with those obtained by the classic procedures due to Bonferroni (1935) and Holm (1979). Importantly, under weak assumptions, our testing procedure asymptotically controls the familywise error rate (the probability of at least one false rejection) and is asymptotically balanced. We showcase our approach by revisiting the data reported in Karlan and List (2007), to deepen our understanding of why people give to charitable causes.
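
To make the two classic baselines named in the abstract concrete, the following is a minimal sketch of the Bonferroni and Holm corrections in plain Python. It is not the paper's procedure, which builds on Romano and Wolf (2010); it only illustrates the benchmarks against which the abstract reports a power improvement. The function names and the p-values are hypothetical and chosen purely for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m, where m is the number of hypotheses."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every remaining (larger) p-value is also not rejected
    return reject


# Hypothetical p-values for four hypotheses (illustration only, not from the paper).
pvals = [0.001, 0.013, 0.020, 0.040]
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
```

With these illustrative inputs, Bonferroni rejects only the first hypothesis, while Holm rejects all four. Both control the familywise error rate at level alpha, but Holm's step-down thresholds are never stricter than Bonferroni's, so it rejects at least as many hypotheses.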

Keywords: multiple hypothesis testing; experimental economics; treatment effects; subgroup analysis; familywise error rate

JEL Codes: C1; C9; C91; C92; C93

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
Multiple Hypothesis Testing (MHT) (C12)	validity of empirical findings (C90)
Jointly Identifying Treatment Effects (C32)	risk of false positives (C52)
Heterogeneous Treatment Effects (C21)	likelihood of Type I errors (C12)
Multiple Treatment Conditions (C32)	inflated Type I error rate (E31)
Correction procedure (C20)	reliability of treatment effect estimates (C90)
Correction procedure (C20)	smaller p-values (C29)
