Box Office Buzz: Does Social Media Data Steal the Show from Model Uncertainty When Forecasting for Hollywood?

Working Paper: NBER ID: w22959

Authors: Steven Lehrer; Tian Xie

Abstract: Substantial excitement currently exists in industry regarding the potential of using analytic tools to measure sentiment in social media messages to help predict individual reactions to a new product, including movies. However, the majority of models subsequently used for forecasting exercises do not allow for model uncertainty. Using data on the universe of Twitter messages, we use an algorithm that calculates the sentiment regarding each film prior to, and after its release date via emotional valence to understand whether these opinions affect box office opening and retail movie unit (DVD and Blu-Ray) sales. Our results contrasting eleven different empirical strategies from econometrics and penalization methods indicate that accounting for model uncertainty can lead to large gains in forecast accuracy. While penalization methods do not outperform model averaging on forecast accuracy, evidence indicates they perform just as well at the variable selection stage. Last, incorporating social media data is shown to greatly improve forecast accuracy for box-office opening and retail movie unit sales.

Keywords: social media; forecasting; model uncertainty; box office; movie sales

JEL Codes: C52; C53; M21
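
The abstract contrasts forecasts that commit to a single model with forecasts that account for model uncertainty by averaging over many candidate models. The following is a minimal sketch of that general idea, not the authors' procedure: it uses smoothed-AIC weights, a simple weighting scheme chosen here for brevity and not confirmed to be one of the paper's eleven strategies. The data are synthetic, and every name in the block (`fit_ols`, `aic`, `model_average_forecast`, and the four stand-in predictors) is an assumption made for illustration.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients for a design matrix X that already has an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def aic(X, y, beta):
    """Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k."""
    n, k = X.shape
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def model_average_forecast(X_train, y_train, X_test):
    """Average OLS forecasts over every non-empty predictor subset.

    Weights are smoothed-AIC weights: exp(-AIC_m / 2), normalised to sum to one.
    """
    n, p = X_train.shape
    forecasts, scores = [], []
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            idx = list(cols)
            A = np.column_stack([np.ones(n), X_train[:, idx]])
            B = np.column_stack([np.ones(len(X_test)), X_test[:, idx]])
            beta = fit_ols(A, y_train)
            forecasts.append(B @ beta)
            scores.append(aic(A, y_train, beta))
    scores = np.array(scores)
    # Subtract the minimum before exponentiating for numerical stability.
    weights = np.exp(-0.5 * (scores - scores.min()))
    weights /= weights.sum()
    return weights @ np.array(forecasts), weights


# Synthetic stand-in data (NOT the paper's data): four generic predictors,
# of which only the first two actually drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=120)

# Fit on the first 100 observations, forecast the remaining 20.
avg_forecast, weights = model_average_forecast(X[:100], y[:100], X[100:])
```

With four predictors the sketch averages over 15 candidate models. Enumerating every subset becomes infeasible as the number of predictors grows, which is one reason a variable selection step (for example, a penalization method such as the lasso) is sometimes used to narrow the candidate set before averaging.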


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause | Effect
positive sentiment on social media (Z13) | increased box office revenues (H27)
positive sentiment on social media (Z13) | increased retail unit sales (L81)
model averaging (C52) | enhanced forecast accuracy (C53)
model averaging after lasso method (C52) | better predictions than OLS alone (C51)
model uncertainty (D80) | improved forecast accuracy (C53)
confounding variables (film characteristics, release timing) (C32) | forecast accuracy (C53)
