Text Algorithms in Economics

Working Paper: CEPR ID: DP18125

Authors: Elliott Ash; Stephen Hansen

Abstract: This paper provides an overview of the methods used for algorithmic text analysis in economics, with a focus on three key contributions. First, the paper introduces methods for representing documents as high-dimensional count vectors over vocabulary terms, for representing words as vectors, and for representing word sequences as embedding vectors. Second, the paper defines four core empirical tasks that encompass most text-as-data research in economics, and enumerates the various approaches that have been taken so far for these tasks. Finally, the paper flags limitations in the current literature, with a focus on the challenge of validating algorithmic output.
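As a rough illustration of the first representation named in the abstract, the sketch below builds count vectors over vocabulary terms and compares documents by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy sentences, whitespace tokenization, and the helper names count_vector and cosine_similarity are all hypothetical choices made for this example.

```python
from collections import Counter
import math


def count_vector(text, vocabulary):
    """Return the term counts of `text` over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus invented for illustration only.
documents = [
    "inflation rose as interest rates fell",
    "interest rates rose as inflation fell",
    "the firm hired new workers",
]

# The vocabulary is the sorted set of all tokens in the corpus.
vocabulary = sorted({tok for doc in documents for tok in doc.lower().split()})
vectors = [count_vector(doc, vocabulary) for doc in documents]

sim_reordered = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

The first two toy sentences contain the same words in a different order, so their count vectors are identical even though the sentences say opposite things. This is one reason a count-based representation may be complemented by representations of word sequences.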

Keywords: text as data; topic models; word embeddings; transformer models

JEL Codes: C18; C45; C55


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause -> Effect
algorithmic text analysis (C63) -> understanding of economic concepts (A13)
algorithmic text analysis (C63) -> quantitative measures derived from text data (C89)
different algorithms (C45) -> document similarity (C59)
document similarity (C59) -> downstream econometric analysis (C51)
choice of algorithm (C52) -> estimation of relationships between textual similarity and firm covariates (C51)