Man versus Machine: Self-Reports versus Algorithmic Measurement of Publications

Working Paper: NBER ID: w28431

Authors: Xuan Jiang; Wanying Chang; Bruce A. Weinberg

Abstract: This paper uses newly available data from Web of Science on publications matched to researchers in the Survey of Doctorate Recipients to compare scientific publications collected by surveys with those collected by algorithmic approaches. We aim to illustrate the different types of measurement error in self-reported and machine-generated data by estimating how publication measures from the two approaches relate to career outcomes (e.g., salaries, placements, and faculty rankings). We find that the potential biases in the self-reports are smaller than those in the algorithmic data. Moreover, the errors in the two approaches are quite intuitive: measurement error in the algorithmic data stems mainly from the accuracy of matching, which depends primarily on the frequency of names and the data available for making matches, while the noise in self-reports is expected to increase over the career as researchers’ publication records become more complex, harder to recall, and less immediately relevant for career progress. This paper provides methodological suggestions for evaluating the quality and advantages of the two approaches to data construction. It also provides guidance on how to use the new linked data.
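
The comparison the abstract describes can be illustrated with a minimal sketch (not the authors' code): regress a career outcome on each publication measure and compare the estimated gradients, since the noisier measure should show a gradient attenuated toward zero under classical measurement error. The file name and column names (salary, pubs_self_report, pubs_wos, years_since_phd) are hypothetical placeholders.

import pandas as pd
import statsmodels.api as sm

# Hypothetical linked SDR-WoS extract; one row per researcher.
df = pd.read_csv("sdr_wos_linked.csv")

def pub_gradient(data, pub_col, outcome="salary"):
    """Estimate the association between one publication measure and an outcome."""
    X = sm.add_constant(data[[pub_col, "years_since_phd"]])
    return sm.OLS(data[outcome], X, missing="drop").fit()

# Compare the salary gradient implied by self-reported vs. algorithmic counts.
fit_self = pub_gradient(df, "pubs_self_report")
fit_wos = pub_gradient(df, "pubs_wos")
print("self-report gradient:", fit_self.params["pubs_self_report"])
print("WoS gradient:       ", fit_wos.params["pubs_wos"])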

Keywords: publications; self-reports; algorithmic measurement; career outcomes

JEL Codes: C26; J24; J3; O31


Causal Claims Network Graph

Edges supported by causal inference methods are shown in orange; the rest are in light blue.


Causal Claims

Cause → Effect
self-reported publication counts (SDR) (C46) → accuracy of publication records (C80)
algorithmic data (WOS) (C69) → accuracy of publication records (C80)
career progression (J62) → self-reported accuracy (C52)
academic age (I23) → self-reported accuracy (C52)
uncommon names (Y30) → accuracy of algorithmic measures (C52)
