Human Biographical Record (HBR)

Working Paper: CEPR ID: DP15825

Authors: Arash Nekoei; Fabian Sinn

Abstract: We construct a new dataset of more than seven million notable individuals across recorded human history, the Human Biographical Record (HBR). With Wikidata as the backbone, HBR adds further information from various digital sources, including Wikipedia in all 292 languages. Machine learning and text analysis combine the sources and extract information on date and place of birth and death, gender, occupation, education, and family background. This paper discusses HBR's construction, completeness, coverage, and accuracy, as well as its strengths and weaknesses relative to prior datasets. HBR is the first part of a larger project, the Human Record Project, which we briefly introduce.

Keywords: big data; machine learning; economic history

JEL Codes: No JEL codes provided


Causal Claims Network Graph

[Figure: network graph of the causal claims. Edges evidenced by causal inference methods are shown in orange; all other edges are in light blue.]


Causal Claims

Cause | Effect
crowdsourcing and machine learning (O36) | dataset's robustness (C55)
integration of multiple data sources (Y10) | extraction of accurate biographical information (Y60)
extraction of accurate biographical information (Y60) | dataset's reliability (Y10)
modern methods of data collection (C80) | improved data quality (L15)
methodology employed in constructing HBR (C90) | reliable outcomes (C90)
