Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data

NBER Working Paper: w23673

Abstract: This paper seeks to better understand what makes big data analysis different, what we can and cannot do with existing econometric tools, and what issues need to be dealt with to work with the data efficiently. As a case study, I set out to extract any business cycle information that might exist in four terabytes of weekly scanner data. The main challenge is to handle the volume, variety, and characteristics of the data within the constraints of our computing environment. Scalable and efficient algorithms are available to ease the computational burden, but they often have unknown statistical properties and are not designed for efficient estimation or optimal inference. In addition, economic data have unique characteristics that generic algorithms may not accommodate. There is a need for computationally efficient econometric methods, as big data is likely here to stay.

Keywords: big data; econometrics; data analysis; business cycles

JEL Codes: C55; C81
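
The abstract notes that scalable algorithms can ease the computational burden. As one illustration, the minimal Python sketch below uses reservoir sampling ("Algorithm R"), a swapped-in textbook technique rather than the paper's stated method, to keep a uniform random subsample of `k` records in a single pass with memory proportional to `k`, so that costlier downstream estimation can run on `k` records instead of `N`. The records are synthetic: `synthetic_sales`, the gamma-distributed sales figures, and the sizes `N` and `k` are hypothetical stand-ins for scanner records, not features of the paper's data.

```python
import random
import statistics

def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of k items from an iterable of
    unknown length, in one pass and O(k) memory."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a uniformly chosen slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

def synthetic_sales(n_records, seed):
    """Hypothetical stand-in for scanner records: weekly unit sales per
    store-item, drawn from a gamma distribution with mean 100."""
    gen = random.Random(seed)
    for _ in range(n_records):
        yield gen.gammavariate(2.0, 50.0)

N, k = 200_000, 5_000

# Full-data statistic, computed here only for comparison.
full_mean = statistics.fmean(synthetic_sales(N, seed=7))

# One-pass subsample of the same (regenerated) stream.
sample = reservoir_sample(synthetic_sales(N, seed=7), k, random.Random(42))
sub_mean = statistics.fmean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5  # ignores finite-population correction

print(f"full mean {full_mean:.2f}; subsample mean {sub_mean:.2f} (SE {se:.2f})")
```

For a simple mean, the statistical properties of a uniform subsample are well understood (the standard error scales as one over the square root of the subsample size); the abstract's concern is that many scalable algorithms offer no comparable guarantees for the estimators economists actually need.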

Causal Claims

Cause (JEL code)	Effect (JEL code)
random subsampling algorithms (C34)	efficiency of data processing (C89)
seasonal adjustments at the individual level (J22)	removal of seasonal variations at the aggregate level (C43)
cyclical components extracted from the data (E39)	influenced by how seasonal effects are modeled (C22)
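
The second and third claims in the table can be illustrated with a small simulation. This minimal Python sketch assumes a balanced panel of hypothetical item-level weekly series, each with an additive, item-specific week-of-year pattern, a common cycle, and noise; all parameters, and the helpers `common_cycle`, `make_item`, and `demean_by_season`, are illustrative assumptions rather than the paper's specification. Subtracting each item's week-of-year means gives the residual from a regression on a full set of week dummies, and because that adjustment is linear, the sum of the adjusted series also has zero week-of-year means, so seasonality is removed at the aggregate level. The sketch then shows one way the extracted cycle depends on the seasonal model: with only four years of data, the dummy-based means absorb part of the slow common cycle, while year-over-year differencing yields a different cyclical signal.

```python
import math
import random
import statistics

# Hypothetical setup: balanced panel, fixed 52-week year (53-week years ignored).
S = 52                  # weeks per year
YEARS = 4
T = S * YEARS           # number of weeks
N_ITEMS = 30
CYCLE_PERIOD = T / 1.5  # common cycle completes 1.5 cycles in the sample
rng = random.Random(0)

def common_cycle(t):
    """Shared cyclical component added to every item."""
    return 10 * math.sin(2 * math.pi * t / CYCLE_PERIOD)

def make_item(phase, scale):
    """Hypothetical item-level weekly sales: level + item-specific
    additive seasonality + common cycle + noise."""
    return [
        100 * scale
        + 20 * scale * math.sin(2 * math.pi * (t % S) / S + phase)
        + common_cycle(t)
        + rng.gauss(0, 2)
        for t in range(T)
    ]

def demean_by_season(y):
    """Additive seasonal adjustment: subtract each week-of-year mean.
    Equals the residual from OLS on a full set of week dummies."""
    means = [statistics.fmean(y[s::S]) for s in range(S)]
    return [v - means[t % S] for t, v in enumerate(y)]

items = [make_item(rng.uniform(0, 2 * math.pi), rng.uniform(0.5, 2.0))
         for _ in range(N_ITEMS)]

# Claim 2: adjust each item, then aggregate across items week by week.
adjusted = [demean_by_season(y) for y in items]
aggregate_adj = [sum(week) for week in zip(*adjusted)]
agg_season_means = [statistics.fmean(aggregate_adj[s::S]) for s in range(S)]
print("largest |week-of-year mean| after aggregation:",
      max(abs(m) for m in agg_season_means))  # zero up to rounding error

# Claim 3: over a short sample the week-of-year means also absorb part of the
# common cycle, so the extracted aggregate cycle departs from the true one.
true_cycle = [N_ITEMS * common_cycle(t) for t in range(T)]
deviation = max(abs(a - c) for a, c in zip(aggregate_adj, true_cycle))
print("max gap between extracted and true aggregate cycle:", round(deviation, 1))

# An alternative seasonal treatment, year-over-year differencing of the raw
# aggregate, yields a different cyclical signal (52-week changes, not levels).
aggregate_raw = [sum(week) for week in zip(*items)]
yoy = [aggregate_raw[t] - aggregate_raw[t - S] for t in range(S, T)]
print("week of peak, demeaned level:", max(range(S, T), key=lambda t: aggregate_adj[t]))
print("week of peak, year-over-year:", S + max(range(len(yoy)), key=lambda i: yoy[i]))
```

The fixed 52-week calendar is a simplification; real weekly data periodically contain 53-week years, which a dummy-based adjustment has to handle explicitly.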