Working Paper: NBER w19433
Authors: Ori Heffetz; Katrina Ligett
Abstract: What can we, as users of microdata, formally guarantee to the individuals (or firms) in our dataset, regarding their privacy? We retell a few stories, well-known in data-privacy circles, of failed anonymization attempts in publicly released datasets. We then provide a mostly informal introduction to several ideas from the literature on differential privacy, an active literature in computer science that studies formal approaches to preserving the privacy of individuals in statistical databases. We apply some of its insights to situations routinely faced by applied economists, emphasizing big-data contexts.
Keywords: differential privacy; data privacy; big data; anonymization
JEL Codes: C49; C89; D89; Z00
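Differential privacy, the paper's central framework, bounds how much any single individual's record can change a released statistic. As a minimal illustration (not taken from the paper; the function name, clipping bounds, and epsilon value are assumptions made for the example), here is a Python sketch of the standard Laplace mechanism for releasing a mean:

```python
import numpy as np

def laplace_mean(values, lo, hi, epsilon, rng=None):
    """Release the mean of `values` with epsilon-differential privacy.

    Clipping each value to [lo, hi] caps the query's sensitivity:
    replacing one record changes the mean by at most (hi - lo) / n.
    Adding Laplace noise with scale sensitivity / epsilon gives the
    standard Laplace mechanism.
    """
    rng = rng if rng is not None else np.random.default_rng()
    values = np.clip(np.asarray(values, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(values)
    return values.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a private estimate of mean income in a small sample.
incomes = [42_000, 55_000, 61_000, 38_000, 120_000]
print(laplace_mean(incomes, lo=0, hi=200_000, epsilon=0.5))
```

Smaller values of epsilon add more noise and give a stronger guarantee; the clipping bounds [lo, hi] must be fixed without consulting the data, since data-dependent bounds would themselves leak information.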
The record also includes a listing of causal edges. In the original graph figure, edges evidenced by causal inference methods were drawn in orange and the rest in light blue; the edges are listed below as a table.
| Cause | Effect |
|---|---|
| release of anonymized data (C81) | reidentification of individuals (R20) |
| failure to adequately anonymize data (C81) | severe privacy violations (K24) |
| assumption of anonymity based on deidentification practices (Z13) | linkage attacks when auxiliary information is available (Y50); see the sketch below |
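The last row describes a linkage attack: joining a "deidentified" release with publicly available auxiliary data on shared quasi-identifiers. A toy pandas sketch (all records fabricated; column names and values are illustrative) shows how such a join reattaches names to sensitive attributes:

```python
import pandas as pd

# "Anonymized" release: direct identifiers removed, quasi-identifiers kept.
deidentified = pd.DataFrame({
    "zip": ["02138", "02139", "02140"],
    "birth_date": ["1945-07-31", "1960-01-15", "1952-03-02"],
    "sex": ["F", "M", "M"],
    "diagnosis": ["hypertension", "diabetes", "asthma"],
})

# Public auxiliary data with names attached (e.g., a voter roll).
voter_roll = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones"],
    "zip": ["02138", "02139"],
    "birth_date": ["1945-07-31", "1960-01-15"],
    "sex": ["F", "M"],
})

# Joining on the shared quasi-identifiers reattaches names to diagnoses.
reidentified = deidentified.merge(voter_roll, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

This join is the mechanism behind the well-known reidentification stories the abstract alludes to: removing direct identifiers left quasi-identifiers that uniquely matched public records.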