Should We Trust Clustered Standard Errors? A Comparison with Randomization-Based Methods

Working Paper: NBER ID: w25926

Authors: Lourenço S. Paz; James E. West

Abstract: We compare the precision of critical values obtained under conventional sampling-based methods with those obtained using sample order statistics computed through draws from a randomized counterfactual based on the null hypothesis. When based on a small number of draws (200), critical values in the extreme left and right tails (0.005 and 0.995) contain a small bias toward failing to reject the null hypothesis, which quickly dissipates with additional draws. The precision of randomization-based critical values compares favorably with conventional sampling-based critical values when the number of draws is approximately 7 times the sample size for a basic OLS model using homoskedastic data, but considerably fewer draws are needed in models based on clustered standard errors or the classic Differences-in-Differences. Randomization-based methods dramatically outperform conventional methods for treatment effects in Differences-in-Differences specifications with unbalanced panels and a small number of treated groups.
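
The idea of reading critical values off the order statistics of a randomized counterfactual can be illustrated with a minimal sketch. This is not the authors' code: the function name `randomization_critical_values`, the difference-in-means statistic, and the simulated data are all assumptions made for illustration, and the paper's own specifications (OLS, clustered errors, Differences-in-Differences) are richer. Treatment labels are reshuffled under the null of no effect, the statistic is recomputed on each draw, and the 0.005 and 0.995 sample quantiles of those draws serve as the critical values.

```python
import numpy as np


def randomization_critical_values(y, treated, n_draws=1000, alpha=0.01, seed=0):
    """Return (observed statistic, lower critical value, upper critical value).

    Hypothetical helper: the statistic is a simple difference in means, and
    the critical values are sample quantiles of its randomization distribution
    under the null hypothesis of no treatment effect.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def diff_in_means(assign):
        # Mean outcome of the (re)assigned treated units minus the rest.
        return y[assign].mean() - y[~assign].mean()

    observed = diff_in_means(treated)

    # Each draw reshuffles the treatment labels, which is a valid
    # counterfactual assignment if the null of no effect holds.
    draws = np.empty(n_draws)
    for b in range(n_draws):
        draws[b] = diff_in_means(rng.permutation(treated))

    # Tail quantiles of the draws (0.005 and 0.995 when alpha = 0.01).
    # np.quantile interpolates between adjacent order statistics.
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return observed, lower, upper


# Usage on simulated data with no true treatment effect (an assumption).
rng = np.random.default_rng(42)
y = rng.normal(size=100)
treated = np.arange(100) < 50
obs, lo, hi = randomization_critical_values(y, treated, n_draws=1000)
reject = bool(obs < lo or obs > hi)
```

Raising `n_draws` tightens the estimate of the tail quantiles, which is the trade-off the abstract describes when it compares 200 draws with larger numbers of draws.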

Keywords: clustered standard errors; randomization-based methods; statistical inference

JEL Codes: C18; C33

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
randomization-based critical values exhibit less bias (C46)	greater precision (C13)
number of draws is approximately seven times the sample size (C83)	randomization-based critical values exhibit less bias (C46)
small sample sizes or few treated groups (C90)	conventional methods tend to overreject the null hypothesis (C12)
small number of draws (200) (C46)	critical values exhibit a slight underrejection of the null hypothesis (C46)
increase in number of draws (H27)	critical values correct underrejection of the null hypothesis (C52)
randomization-based methods outperform conventional methods (C90)	treatment effects in difference-in-differences specifications (C22)
low number of clusters (C38)	randomization-based methods outperform conventional methods (C90)
randomization-based methods provide more accurate type I error rates (C90)	scenarios with unbalanced panels (C23)
randomization-based methods yield more reliable inferences (C90)	various statistical frameworks (C11)
randomization-based methods may have lower statistical power (C90)	conventional methods (C90)
randomization-based methods do not exhibit the same degree of overrejection of the null hypothesis (C90)	more robust under specific conditions (C59)
