Working Paper: NBER ID: w24334
Authors: Patrick Bajari; Victor Chernozhukov; Ali Hortaçsu; Junichi Suzuki
Abstract: In academic and policy circles, there has been considerable interest in the impact of “big data” on firm performance. We examine the question of how the amount of data impacts the accuracy of Machine Learned models of weekly retail product forecasts using a proprietary data set obtained from Amazon. We examine the accuracy of forecasts in two relevant dimensions: the number of products (N), and the number of time periods for which a product is available for sale (T). Theory suggests diminishing returns to larger N and T, with relative forecast errors diminishing at rate 1/√N+1/√T. Empirical results indicate gains in forecast improvement in the T dimension; as more and more data is available for a particular product, demand forecasts for that product improve over time, though with diminishing returns to scale. In contrast, we find an essentially flat N effect across the various lines of merchandise: with a few exceptions, expansion in the number of retail products within a category does not appear associated with increases in forecast performance. We do find that the firm’s overall forecast performance, controlling for N and T effects across product lines, has improved over time, suggesting gradual improvements in forecasting from the introduction of new models and improved technology.
Keywords: big data; firm performance; forecast accuracy; machine learning
JEL Codes: C53; L81
Cause | Effect |
---|---|
increased data availability (T) (Y10) | improved forecasting accuracy (C53) |
technological improvements (O33) | improved forecasting accuracy (C53) |
increased number of retail products (N) (L81) | forecast performance (G17) |
data accumulation (C80) | engineering challenges (O36) |
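
The theoretical rate quoted in the abstract, 1/√N + 1/√T, can be illustrated with a minimal sketch. The function name `error_rate` and the sample values of N and T are hypothetical and chosen only for illustration; they are not taken from the paper, and any constant factor multiplying the rate is omitted.

```python
import math


def error_rate(n_products: int, n_periods: int) -> float:
    """Theoretical rate 1/sqrt(N) + 1/sqrt(T), ignoring constant factors."""
    return 1.0 / math.sqrt(n_products) + 1.0 / math.sqrt(n_periods)


# Hypothetical baseline: many products (N), a short sales history (T).
base = error_rate(10_000, 25)

# Quadrupling T halves the 1/sqrt(T) term.
more_t = error_rate(10_000, 100)

# Quadrupling N halves the 1/sqrt(N) term, which is already small here.
more_n = error_rate(40_000, 25)

print(f"baseline      : {base:.3f}")
print(f"4x time (T)   : {more_t:.3f}")
print(f"4x products(N): {more_n:.3f}")
```

Under these assumed values, the term with the smaller sample dimension dominates the rate, so adding data along the already-large dimension changes the bound very little. This illustrates only the stated theoretical rate, not the paper's empirical estimates.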