Emptying the Tank: Getting the Most Out of Limited Data

Working Paper: NBER ID: w24855

Authors: M. Scott Taylor

Abstract: All empirical researchers know that having more sources of variation in a dataset is valuable. What is not known is how valuable, and if the marginal value of adding another source of variation diminishes or increases. This note provides explicit answers to these questions. It defines "valuable" as the number of independent questions the data can potentially answer, and provides a surprisingly simple and useful rule that tells the researcher not only when they have "emptied the tank" of their data's valuable implications, but also the marginal value of further data collection. An illustration using home heating costs is provided.

Keywords: data variation; empirical research; marginal value; independent questions

JEL Codes: A20; Q40; Q41


[Figure: Causal Claims Network Graph. Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.]


Causal Claims

Cause | Effect
sources of variation (C90) | independent questions (C12)
additional sources of variation (C39) | independent questions (C12)
m = 2^n - 1 (C30) | independent questions (C12)
marginal value of adding another source of variation (C59) | independent questions (C12)
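
The rule m = 2^n - 1 listed above can be made concrete with a short sketch. It assumes, as the table suggests but does not state outright, that n is the number of sources of variation and m the number of independent questions. The function names and the three example source labels are hypothetical and chosen only for illustration. Counting the non-empty subsets of the sources is one way to see where 2^n - 1 comes from; it is not necessarily the paper's own derivation.

```python
from itertools import combinations


def independent_questions(n: int) -> int:
    """Apply the rule m = 2^n - 1 (n assumed to be the number of sources of variation)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 2**n - 1


def marginal_value(n: int) -> int:
    """Extra questions gained by moving from n to n + 1 sources under the same rule."""
    return independent_questions(n + 1) - independent_questions(n)


def nonempty_subsets(sources):
    """List every non-empty combination of the given sources (illustrative reading only)."""
    return [
        combo
        for size in range(1, len(sources) + 1)
        for combo in combinations(sources, size)
    ]


if __name__ == "__main__":
    # Hypothetical labels, loosely echoing the home heating illustration.
    sources = ["temperature", "house_size", "fuel_price"]
    print(len(nonempty_subsets(sources)), independent_questions(len(sources)))
    for n in range(1, 6):
        print(n, independent_questions(n), marginal_value(n))
```

Under this rule the gain from the (n+1)-th source is (2^(n+1) - 1) - (2^n - 1) = 2^n, so the marginal value of an additional source of variation increases with n rather than diminishing.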
