The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter?

Working Paper: CEPR ID: DP15840

Authors: Anthony Strittmatter; Conny Wunsch

Abstract: The vast majority of existing studies that estimate the average unexplained gender pay gap use unnecessarily restrictive linear versions of the Blinder-Oaxaca decomposition. Using a notably rich and large data set of 1.7 million employees in Switzerland, we investigate how the methodological improvements made possible by such big data affect estimates of the unexplained gender pay gap. We study the sensitivity of the estimates with regard to i) the availability of observationally comparable men and women, ii) model flexibility when controlling for wage determinants, and iii) the choice of different parametric and semi-parametric estimators, including variants that make use of machine learning methods. We find that these three factors matter greatly. Blinder-Oaxaca estimates of the unexplained gender pay gap decline by up to 39% when we enforce comparability between men and women and use a more flexible specification of the wage equation. Semi-parametric matching yields estimates that, when compared with the Blinder-Oaxaca estimates, are up to 50% smaller and also less sensitive to the way wage determinants are included.
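To make the contrast in the abstract concrete, here is a minimal Python sketch on hypothetical toy data with a single discrete wage determinant x. It is not the authors' implementation. It assumes the male wage equation as the reference, in which case the linear Blinder-Oaxaca unexplained gap is (ȳ_m − ȳ_f) − β̂_m(x̄_m − x̄_f), where β̂_m is the OLS slope estimated on men only. It then compares this with a simple exact-matching estimator that drops women whose x value never occurs among men (a crude common-support restriction). All function names and data values are illustrative assumptions.

```python
"""Toy comparison of a linear Blinder-Oaxaca estimate and exact matching
on one discrete wage determinant. All data and names are hypothetical."""
from statistics import mean


def ols_intercept_slope(xs, ys):
    """Simple OLS of ys on xs with an intercept; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def blinder_oaxaca_unexplained(men, women):
    """Unexplained gap with the male wage equation as reference (assumption):
    (ybar_m - ybar_f) - b_m * (xbar_m - xbar_f)."""
    xm, ym = zip(*men)
    xf, yf = zip(*women)
    _, b_m = ols_intercept_slope(xm, ym)
    return (mean(ym) - mean(yf)) - b_m * (mean(xm) - mean(xf))


def on_common_support(men, women):
    """Keep only women whose x value also occurs among men."""
    male_cells = {x for x, _ in men}
    return [(x, y) for x, y in women if x in male_cells]


def exact_matching_unexplained(men, women):
    """For each woman on common support, compare her wage with the mean male
    wage in the same x cell, then average over those women."""
    cells = {}
    for x, y in men:
        cells.setdefault(x, []).append(y)
    supported = on_common_support(men, women)
    return mean(mean(cells[x]) - y for x, y in supported)


# Hypothetical (x, log wage) pairs; the male wage profile is nonlinear in x,
# and one woman (x = 3) has no comparable man.
men = [(0, 1.0), (1, 2.0), (2, 2.2)]
women = [(0, 0.9), (1, 1.9), (1, 1.9), (3, 2.5)]

print(round(blinder_oaxaca_unexplained(men, women), 3))                            # 0.083
print(round(blinder_oaxaca_unexplained(men, on_common_support(men, women)), 3))    # -0.033
print(round(exact_matching_unexplained(men, women), 3))                            # 0.1
```

In this toy setup every comparable woman earns exactly 0.1 less than men with the same x, which is what exact matching recovers. The linear decomposition moves with the sample restriction and with the misspecified straight-line wage equation. These numbers only illustrate the mechanism and say nothing about the magnitudes reported in the paper.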

Keywords: gender inequality; gender pay gap; common support; model specification; matching estimator; machine learning

JEL Codes: J31; C21


Causal Claims

Cause	Effect
Methodological choices (C90)	Estimated unexplained gender pay gap (J79)
Enforcing comparability between men and women (J78)	Estimated unexplained gender pay gap (J79)
Using more flexible specifications of the wage equation (J39)	Estimated unexplained gender pay gap (J79)
Semi-parametric matching (C14)	Estimated unexplained gender pay gap (J79)
Lack of comparable men for women (J79)	Raw gender pay gap (J79)
