Predicting Returns with Text Data

Working Paper: NBER ID: w26186

Authors: Zheng Tracy Ke; Bryan T. Kelly; Dacheng Xiu

Abstract: We introduce a new text-mining methodology that extracts sentiment information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a sentiment score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of sentiment terms via predictive screening, 2) assigning sentiment weights to these words via topic modeling, and 3) aggregating terms into an article-level sentiment score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we text-mine one of the most actively monitored streams of news articles in the financial system (the Dow Jones Newswires) and show that our supervised sentiment model excels at extracting return-predictive signals in this context.
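
The three steps named in the abstract can be illustrated with a minimal Python sketch on toy data. Everything in it is an assumption made for illustration and is not taken from the abstract: the function names (screen_terms, fit_term_weights, score_article), the thresholds alpha and kappa, the penalty weight lam, the rank-based sentiment proxy, the least-squares fit of a two-topic model, and the grid search over the score. It shows one plausible shape for such a pipeline, not the authors' implementation.

```python
import math


def screen_terms(docs, returns, alpha=0.1, kappa=2):
    """Step 1 (screening): keep words that co-occur mostly with one return sign."""
    count, pos_count = {}, {}
    for words, r in zip(docs, returns):
        for w in set(words):
            count[w] = count.get(w, 0) + 1
            if r > 0:
                pos_count[w] = pos_count.get(w, 0) + 1
    # Hypothetical thresholds: at least kappa articles, and a share of
    # positive-return articles at least alpha away from one half.
    return sorted(
        w for w, k in count.items()
        if k >= kappa and abs(pos_count.get(w, 0) / k - 0.5) >= alpha
    )


def fit_term_weights(docs, returns, vocab):
    """Step 2 (two-topic model): positive and negative word distributions."""
    n = len(docs)  # needs n >= 2 so the 2x2 system below is invertible
    # Assumed sentiment proxy: return rank scaled to (0, 1].
    p = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda i: returns[i]), start=1):
        p[i] = rank / n
    # Entries of the 2x2 Gram matrix for the regressors (p, 1 - p).
    a = sum(x * x for x in p)
    b = sum(x * (1 - x) for x in p)
    c = sum((1 - x) ** 2 for x in p)
    det = a * c - b * b
    keep = set(vocab)
    pos, neg = {}, {}
    for w in vocab:
        s1 = s2 = 0.0
        for words, x in zip(docs, p):
            kept = [t for t in words if t in keep]
            if kept:
                freq = kept.count(w) / len(kept)
                s1 += freq * x
                s2 += freq * (1 - x)
        # Least-squares coefficients of word frequency on (p, 1 - p), clipped at 0.
        pos[w] = max(0.0, (c * s1 - b * s2) / det)
        neg[w] = max(0.0, (a * s2 - b * s1) / det)
    for topic in (pos, neg):
        total = sum(topic.values()) or 1.0
        for w in topic:
            topic[w] /= total  # renormalize each topic to sum to one
    return pos, neg


def score_article(words, pos, neg, lam=1.0, grid=99):
    """Step 3 (penalized likelihood): grid-search the article score p in (0, 1)."""
    kept = [w for w in words if w in pos]
    if not kept:
        return 0.5  # no sentiment terms: return the neutral score
    best_p, best_ll = 0.5, -math.inf
    for k in range(1, grid + 1):
        p = k / (grid + 1)
        ll = lam * math.log(p * (1 - p))  # penalty pulling p toward 1/2
        for w in kept:
            mix = p * pos[w] + (1 - p) * neg[w]
            ll += math.log(max(mix, 1e-12)) / len(kept)
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p


# Toy corpus (fabricated for illustration only): tokenized articles and returns.
docs = [
    ["shares", "gain", "the"], ["profit", "gain", "the"], ["gain", "the", "outlook"],
    ["loss", "the", "shares"], ["loss", "the", "warning"], ["loss", "the", "profit"],
]
returns = [0.03, 0.02, 0.01, -0.01, -0.02, -0.03]

vocab = screen_terms(docs, returns)
pos, neg = fit_term_weights(docs, returns, vocab)
print(vocab)
print(score_article(["gain", "today"], pos, neg), score_article(["loss"], pos, neg))
```

In this toy setup, a word such as "the" appears equally often alongside positive and negative returns, so screening drops it. The penalty term in the last step shrinks scores toward the neutral value of 0.5, which matters most for articles containing few sentiment terms.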

Keywords: text mining; sentiment analysis; asset returns; financial markets

JEL Codes: C53; C55; C58; G10; G11; G12; G14; G17; G4


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause -> Effect
Sentiment scores derived from news content (G12) -> Asset price movements (G19)
Positive sentiment (E32) -> Increased asset returns (G19)
Negative sentiment (G41) -> Decreased asset returns (G19)
Fresh news (Y60) -> Larger impact on asset prices (G19)
Stale news (Y60) -> Fully reflected in prices within two days (G14)
Fresh news (Y60) -> Takes four days for complete assimilation into prices (D41)
Stock attributes (size and volatility) (C46) -> Causal relationship between news sentiment and price adjustments (G14)
