Incentive-Compatible Critical Values

Working Paper: CEPR ID: DP16942

Authors: Pascal Michaillat; Adam McCloskey

Abstract: Statistically significant results are more rewarded than insignificant ones, so researchers have the incentive to pursue statistical significance. Such p-hacking reduces the informativeness of hypothesis tests by making significant results much more common than they are supposed to be in the absence of true significance. To address this problem, we construct critical values of test statistics such that, if these values are used to determine significance, and if researchers optimally respond to these new significance standards, then significant results occur with the desired frequency. Such incentive-compatible critical values allow for p-hacking so they are larger than classical critical values. Using evidence from the social and medical sciences, we find that the incentive-compatible critical value for any test and any significance level is the classical critical value for the same test with approximately one fifth of the significance level—a form of Bonferroni correction. For instance, for a z-test with a significance level of 5%, the incentive-compatible critical value is 2.31 instead of 1.65 if the test is one-sided and 2.57 instead of 1.96 if the test is two-sided.
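The rule of thumb in the abstract can be checked numerically for a z-test. The following is a minimal sketch, not the authors' procedure: it assumes the ratio is exactly one fifth (the abstract says only "approximately one fifth"), and the function names and the `ratio` parameter are hypothetical, introduced here for illustration.

```python
from statistics import NormalDist


def classical_critical_value(alpha: float, two_sided: bool = False) -> float:
    """Classical z-test critical value at significance level alpha."""
    # A two-sided test splits alpha equally between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def incentive_compatible_critical_value(
    alpha: float, two_sided: bool = False, ratio: float = 5.0
) -> float:
    """Approximate incentive-compatible critical value.

    Assumption: the classical critical value is evaluated at alpha / ratio,
    with ratio = 5 standing in for the abstract's "approximately one fifth".
    """
    return classical_critical_value(alpha / ratio, two_sided)


if __name__ == "__main__":
    alpha = 0.05
    for two_sided in (False, True):
        label = "two-sided" if two_sided else "one-sided"
        classical = classical_critical_value(alpha, two_sided)
        adjusted = incentive_compatible_critical_value(alpha, two_sided)
        print(f"{label}: classical {classical:.2f}, adjusted {adjusted:.2f}")
```

With a ratio of exactly 5, the sketch gives roughly 2.33 (one-sided) and 2.58 (two-sided) at the 5% level, close to the 2.31 and 2.57 reported in the abstract. The small gap is consistent with the ratio being only approximately one fifth.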

Keywords: hypothesis testing; academic incentives; p-hacking; statistical significance; optimal stopping

JEL Codes: C12; C18

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
Introduction of incentive-compatible critical values (ICCVs) (F53)	Reduction of type I errors (C52)
P-hacking (K24)	Increased frequency of statistically significant results (C46)
Statistically significant results are disproportionately rewarded (C46)	P-hacking (K24)
ICCVs (F53)	Achieve significant results at desired frequency (C22)
ICCVs derived from Bonferroni correction (C46)	Adjustment of classical critical values (C46)
