Market Efficiency in the Age of Big Data

Working Paper: CEPR ID: DP14235

Abstract: Modern investors face a high-dimensional prediction problem: thousands of observable variables are potentially relevant for forecasting. We reassess the conventional wisdom on market efficiency in light of this fact. In our model economy, which resembles a typical machine learning setting, N assets have cash flows that are a linear function of J firm characteristics, but with uncertain coefficients. Risk-neutral Bayesian investors impose shrinkage (ridge regression) or sparsity (Lasso) when they estimate the J coefficients of the model and use them to price assets. When J is comparable in size to N, returns appear cross-sectionally predictable using firm characteristics to an econometrician who analyzes data from the economy ex post. A factor zoo emerges even without p-hacking and data-mining. Standard in-sample tests of market efficiency reject the no-predictability null with high probability, despite the fact that investors optimally use the information available to them in real time. In contrast, out-of-sample tests retain their economic meaning.
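
The mechanism described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under simplifying assumptions that are not taken from the paper: returns are modeled as one-period cash-flow forecast errors, characteristics are i.i.d. standard normal, the noise variance is known to the econometrician, and investors' ridge penalty is the one implied by a correctly specified Gaussian prior. The function name `simulate` and all parameter names and values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_assets=200, n_chars=100, t_learn=1, theta=1.0, sigma2=1.0,
             n_sims=500, seed=0):
    """Toy economy: Bayesian (ridge) investors vs. an ex-post econometrician.

    Assumptions (illustrative, not from the paper): cash-flow growth is
    dy_t = X g + e_t with g ~ N(0, theta/J * I) and e_t ~ N(0, sigma2 * I);
    the "return" in a period is the investors' cash-flow forecast error.
    """
    rng = np.random.default_rng(seed)
    # 5% critical value of chi-square(J), via Monte Carlo to stay NumPy-only.
    crit = np.quantile(rng.chisquare(n_chars, size=200_000), 0.95)

    rejections = 0
    is_payoffs = np.empty(n_sims)
    oos_payoffs = np.empty(n_sims)

    for s in range(n_sims):
        X = rng.standard_normal((n_assets, n_chars))            # firm characteristics
        g = rng.normal(0.0, np.sqrt(theta / n_chars), n_chars)  # true coefficients
        noise = rng.normal(0.0, np.sqrt(sigma2), (t_learn + 2, n_assets))
        dy = X @ g + noise                                      # one row per period
        XtX = X.T @ X

        def posterior_mean(t):
            # Posterior mean of g after t periods; equals ridge regression
            # with penalty sigma2 * J / (theta * t).
            lam = sigma2 * n_chars / (theta * t)
            rhs = X.T @ dy[:t].mean(axis=0)
            return np.linalg.solve(XtX + lam * np.eye(n_chars), rhs)

        # Returns = realized cash-flow growth minus investors' real-time forecast.
        r1 = dy[t_learn] - X @ posterior_mean(t_learn)
        r2 = dy[t_learn + 1] - X @ posterior_mean(t_learn + 1)

        # Econometrician: in-sample OLS of r1 on X and a Wald test of h = 0.
        h = np.linalg.solve(XtX, X.T @ r1)
        wald = h @ XtX @ h / sigma2
        rejections += int(wald > crit)

        # Payoff of a characteristics-based portfolio with weights X h:
        # evaluated on the estimation period (in-sample) and the next period (OOS).
        is_payoffs[s] = (X @ h) @ r1 / n_assets
        oos_payoffs[s] = (X @ h) @ r2 / n_assets

    return {
        "in_sample_rejection_rate": rejections / n_sims,
        "in_sample_payoff_mean": is_payoffs.mean(),
        "oos_payoff_mean": oos_payoffs.mean(),
        "oos_payoff_se": oos_payoffs.std(ddof=1) / np.sqrt(n_sims),
    }


result = simulate()
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

Under these assumptions, the in-sample Wald test should reject the no-predictability null far more often than its nominal 5% size, because the econometrician's regression picks up the investors' residual estimation error in the coefficients. The out-of-sample payoff should be statistically indistinguishable from zero, since that error is unpredictable from data available at the time the portfolio is formed. Raising `t_learn` or lowering `n_chars` relative to `n_assets` should weaken the in-sample effect.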

Keywords: Market Efficiency; Big Data; Machine Learning

JEL Codes: G10; G12; G14; C11; C12; C58

Causal Claims Network Graph (figure not reproduced): edges evidenced by causal inference methods are shown in orange; all other edges are shown in light blue.

Causal Claims

Cause	Effect
high-dimensional setting where the number of predictors (J) is comparable to the number of assets (N) (C58)	asset returns appear cross-sectionally predictable based on firm characteristics (G12)
investors' forecasts (G17)	predictable returns (G17)
predictability observed in standard in-sample tests (C52)	estimation error (C51)
investors' optimal use of information (G11)	rejection of the no-predictability null hypothesis (C52)
high-dimensional asymptotics (C55)	overwhelming rejection of the no-predictability null hypothesis (C52)
learning problem faced by investors (G11)	in-sample predictability (C53)
in-sample predictability (C53)	complicating interpretations of standard market efficiency tests (G14)
