Comparing Predictive Accuracy Twenty Years Later: A Personal Perspective on the Use and Abuse of Diebold-Mariano Tests

Working Paper: NBER ID: w18391

Abstract: The Diebold-Mariano (DM) test was intended for comparing forecasts; it has been, and remains, useful in that regard. The DM test was not intended for comparing models. Unfortunately, however, much of the large subsequent literature uses DM-type tests for comparing models, in (pseudo-) out-of-sample environments. In that case, much simpler yet more compelling full-sample model comparison procedures exist; they have been, and should continue to be, widely used. The hunch that (pseudo-) out-of-sample analysis is somehow the "only," or "best," or even a "good" way to provide insurance against in-sample over-fitting in model comparisons proves largely false. On the other hand, (pseudo-) out-of-sample analysis may be useful for learning about comparative historical predictive performance.
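The forecast comparison described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's own code: the function name `dm_statistic`, the default squared-error loss, and the rectangular-kernel long-run variance with truncation lag h-1 are assumptions of this sketch. The inputs are forecast errors, not models, which is the distinction the abstract draws.

```python
import math


def dm_statistic(errors_a, errors_b, h=1, loss=lambda e: e * e):
    """Diebold-Mariano-type statistic for equal expected loss of two forecasts.

    errors_a, errors_b: forecast errors from two competing forecasts.
    h: forecast horizon; autocovariances up to lag h-1 enter the variance
       (an assumption of this sketch).
    loss: loss function applied to each error (squared error by default).
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error series must have equal length")

    # Loss differential d_t = L(e_a,t) - L(e_b,t)
    d = [loss(a) - loss(b) for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of the loss differential at lag k
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance estimate: rectangular kernel, truncation lag h-1
    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")

    stat = d_bar / math.sqrt(lrv / n)
    # Two-sided p-value under the asymptotic standard normal distribution
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Illustrative, made-up forecast errors (not data from the paper)
stat, p_value = dm_statistic([1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 1.0, -1.0])
print(stat, p_value)
```

A positive statistic indicates that the first forecast has the higher average loss in the sample. The normal approximation is asymptotic, so a four-observation example like this one only shows the mechanics.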

Keywords: No keywords provided

JEL Codes: C01; C52; C53

Causal Claims Network Graph

[Figure not shown. Edges evidenced by causal inference methods are drawn in orange; all other edges are in light blue.]

Causal Claims

Cause	Effect
lower expected loss (G33)	better forecasting performance (C53)
pseudo out-of-sample analyses (C52)	misleading researchers regarding model effectiveness (C52)
full-sample model comparison procedures (C52)	more appropriate (Y20)
pseudo out-of-sample methods (C51)	lack robustness compared to full-sample analyses (C20)
