Working Paper: NBER w26586
Authors: Ian Martin; Stefan Nagel
Abstract: Modern investors face a high-dimensional prediction problem: thousands of observable variables are potentially relevant for forecasting. We reassess the conventional wisdom on market efficiency in light of this fact. In our model economy, which resembles a typical machine learning setting, N assets have cash flows that are a linear function of J firm characteristics, but with uncertain coefficients. Risk-neutral Bayesian investors impose shrinkage (ridge regression) or sparsity (Lasso) when they estimate the J coefficients of the model and use them to price assets. When J is comparable in size to N, returns appear cross-sectionally predictable using firm characteristics to an econometrician who analyzes data from the economy ex post. A factor zoo emerges even without p-hacking and data-mining. Standard in-sample tests of market efficiency reject the no-predictability null with high probability, despite the fact that investors optimally use the information available to them in real time. In contrast, out-of-sample tests retain their economic meaning.
Keywords: Market Efficiency; Big Data; Machine Learning; Asset Pricing
JEL Codes: C11; G12; G14
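
The pricing model described in the abstract lends itself to a small simulation. The sketch below is a minimal illustration, not the authors' code: it assumes Gaussian characteristics and coefficients, a single past cross-section of cash flows for investors to learn from, a zero discount rate, and hypothetical parameter values (N, J, tau2, sig2). Under a Gaussian prior, the investors' posterior mean coincides with a ridge regression with penalty sig2/tau2, matching the paper's shrinkage case. Even though investors use their information optimally, the econometrician's in-sample F-test rejects the no-predictability null far more often than its nominal 5% size.

```python
# Illustrative sketch (not the authors' code): Bayesian investors price N
# assets whose cash flows are linear in J characteristics; an econometrician
# then tests returns for in-sample predictability. All parameter values
# below are hypothetical choices, not calibrated to the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, J = 200, 150            # assets, characteristics (J comparable to N)
tau2, sig2 = 0.1, 1.0      # prior variance per coefficient, cash-flow noise
lam = sig2 / tau2          # ridge penalty implied by the Gaussian prior
reject, reps = 0, 200

for _ in range(reps):
    X = rng.standard_normal((N, J))        # firm characteristics
    g = rng.normal(0.0, np.sqrt(tau2), J)  # true cash-flow coefficients

    # Investors observe one past cross-section of cash flows and form the
    # posterior mean of g, which equals a ridge regression estimate.
    c_past = X @ g + rng.normal(0.0, np.sqrt(sig2), N)
    g_hat = np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ c_past)

    # Risk-neutral pricing at a zero discount rate: price equals expected
    # cash flow, so the realized return is pricing error plus news,
    # R = X (g - g_hat) + e.
    R = X @ g + rng.normal(0.0, np.sqrt(sig2), N) - X @ g_hat

    # Ex-post econometrician: OLS of returns on all J characteristics and
    # an F-test of the no-predictability null (all slopes zero).
    b = np.linalg.lstsq(X, R, rcond=None)[0]
    ss_res = np.sum((R - X @ b) ** 2)
    F = ((np.sum(R**2) - ss_res) / J) / (ss_res / (N - J))
    reject += F > stats.f.ppf(0.95, J, N - J)

print(f"in-sample rejection rate: {reject / reps:.2f}")  # well above 0.05
```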
Causal edges extracted from the paper:
Cause | Effect
---|---
high-dimensional setting (C55) | returns appear cross-sectionally predictable (C29) |
dimensionality of predictors (J) relative to the number of assets (N) (C20) | significant in-sample predictability (C53)
standard in-sample tests of market efficiency (G14) | rejection of the no-predictability null (C52)
investors do not have precise knowledge of the cash flow prediction model parameters (G17) | empirical findings of predictability (G41)
investors' forecasts are based on a Bayesian learning process (G17) | out-of-sample predictability does not exist (C53)
increased dimensionality (C39) | perceived predictability under in-sample tests (C52) |
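
The last two rows of the table can be illustrated by extending the sketch above to two periods, again under hypothetical assumptions: investors update their Gaussian posterior after each cash-flow realization, and an econometrician fits a return predictor on period-1 data and tests it on period-2 returns. Because period-2 prices already impound everything investors learned in period 1, the fitted predictor has no out-of-sample power, consistent with the claim that out-of-sample tests retain their economic meaning.

```python
# Illustrative two-period extension (not the authors' code) contrasting the
# in-sample result with an out-of-sample test. Parameter values are again
# hypothetical. Patterns an econometrician fits on period-1 returns are
# already impounded in period-2 prices via the investors' posterior update.
import numpy as np

rng = np.random.default_rng(1)
N, J = 200, 150
tau2, sig2 = 0.1, 1.0
lam = sig2 / tau2
slopes, oos_r2 = [], []

for _ in range(500):
    X = rng.standard_normal((N, J))
    g = rng.normal(0.0, np.sqrt(tau2), J)
    c0, c1, c2 = (X @ g + rng.normal(0.0, np.sqrt(sig2), N) for _ in range(3))

    # Period-1 pricing uses the posterior mean given c0 alone; period-2
    # pricing uses the updated posterior mean given both c0 and c1.
    g0 = np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ c0)
    g1 = np.linalg.solve(2 * X.T @ X + lam * np.eye(J), X.T @ (c0 + c1))
    R1, R2 = c1 - X @ g0, c2 - X @ g1

    # Econometrician fits a return predictor on period-1 data, then tests
    # it out of sample on period-2 returns.
    b = np.linalg.lstsq(X, R1, rcond=None)[0]
    pred = X @ b
    slopes.append(pred @ R2 / (pred @ pred))
    oos_r2.append(1 - np.sum((R2 - pred) ** 2) / np.sum(R2 ** 2))

print(f"mean OOS slope: {np.mean(slopes):+.3f}")  # ~ 0: no OOS power
print(f"mean OOS R^2:  {np.mean(oos_r2):+.3f}")   # <= 0
```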