Working Paper: NBER ID: w25270
Authors: Chihsheng Hsieh; Stanley I. M. Ko; Jaromír Kovářík; Trevon Logan
Abstract: This paper analyzes statistical issues arising from networks based on non-representative samples of the population. We first characterize the biases in both network statistics and estimates of network effects under non-random sampling analytically and numerically. Sampled network data systematically bias the properties of population networks and suffer from non-classical measurement-error problems when applied as regressors. Apart from the sampling rate and the elicitation procedure, these biases depend in a nontrivial way on which subpopulations are missing with higher probability. We propose a methodology, adapting post-stratification weighting approaches to networked contexts, which enables researchers to recover several network-level statistics and reduce the biases in the estimated network effects. The advantages of the proposed methodology are that it can be applied to network data collected via both designed and non-designed sampling procedures, does not require one to assume any network formation model, and is straightforward to implement. We apply our approach to two widely used network data sets and show that accounting for the non-representativeness of the sample dramatically changes the results of regression analysis.
Keywords: Networks; Nonrandom Sampling; Biases; Poststratification; Econometrics
JEL Codes: C4; D85; L14; Z13
Cause | Effect |
---|---|
Nonrandom sampling (C83) | biased estimates of network effects (D85) |
poststratification weighting approach (C83) | recover true population network features (D85) |
not accounting for nonrandomness (C83) | affects estimated network effects (D85) |
biases (D91) | false positives in regression analyses (C20) |
existing approaches based on missing-at-random assumption (C34) | fail to eliminate biases (D91) |
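
The abstract describes adapting post-stratification weighting to network data but does not spell out the procedure. As a rough illustration of the underlying idea only, the sketch below applies textbook post-stratification weights (population share of a stratum divided by its sample share) to one node-level statistic, the mean degree. The function names, the toy strata, and the degree values are all hypothetical, and this is not the authors' estimator: it assumes each sampled node's degree is observed without error, which the paper's own discussion of sampled networks suggests will often fail.

```python
from collections import Counter


def poststrat_weights(sample_groups, population_shares):
    """Weight each sampled node by (population share) / (sample share) of its stratum.

    sample_groups: dict mapping node id -> stratum label (hypothetical input format).
    population_shares: dict mapping stratum label -> known share in the population.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups.values())
    return {
        node: population_shares[group] / (counts[group] / n)
        for node, group in sample_groups.items()
    }


def weighted_mean_degree(degrees, weights):
    """Weighted average of node degrees; assumes degrees are correctly observed."""
    total_weight = sum(weights[node] for node in degrees)
    return sum(weights[node] * degrees[node] for node in degrees) / total_weight


# Toy sample: stratum "B" is half of the population but only a quarter of the sample.
sample_groups = {"a": "A", "b": "A", "c": "A", "d": "B"}
population_shares = {"A": 0.5, "B": 0.5}
degrees = {"a": 1, "b": 1, "c": 1, "d": 5}

weights = poststrat_weights(sample_groups, population_shares)
naive = sum(degrees.values()) / len(degrees)
adjusted = weighted_mean_degree(degrees, weights)
print(naive, adjusted)
```

In this toy case the under-sampled stratum has the higher degree, so reweighting raises the estimated mean degree relative to the unweighted sample average. Which direction the correction moves in real data depends on which subpopulations are missing with higher probability, as the abstract notes.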