Sparse Modeling Under Grouped Heterogeneity with an Application to Asset Pricing

Working Paper: NBER ID: w31424

Authors: Lin William Cong; Guanhao Feng; Jingyu He; Junye Li

Abstract: Sparse models, though long preferred and pursued by social scientists, can be ineffective or unstable relative to large models, for example, in economic predictions (Giannone et al., 2021). To achieve sparsity for economic interpretation while exploiting big data for superior empirical performance, we introduce a general framework that jointly clusters observations (via new decision trees) and locally selects variables (with Bayesian priors) for modeling panel data with potential grouped heterogeneity. We derive analytical marginal likelihoods as global split criteria in our Bayesian Clustering Model (BCM), to incorporate economic guidance, address parameter and model uncertainties, and prevent overfitting. We apply BCM to asset pricing and estimate uncommon-factor models for data-driven asset clusters and macroeconomic regimes. We find (i) cross-sectional heterogeneity linked to (non-linear interactions of) return volatility, size, and value, (ii) structural changes in factor relevance predicted by market volatility and valuation, and (iii) MKTRF and SMB as common factors and multiple uncommon factors across characteristics-managed-market-timed clusters. BCM helps explain volatility- or size-related anomalies, exploit within-group tests, and mitigate the “factor zoo” problem. Overall, BCM outperforms benchmark common-factor models in pricing and investments in U.S. equities, e.g., attaining out-of-sample cross-sectional R²s exceeding 25% for multiple clusters and Sharpe ratios of tangency portfolios tripling those built from ME-B/M 5 × 5 portfolios.
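To make the idea of a marginal likelihood serving as a split criterion concrete, the following is a minimal sketch and not the BCM specification. It assumes a Gaussian linear model with known noise variance `sigma2` and a zero-mean Gaussian prior with variance `tau2` on the coefficients, so that the marginal likelihood of each candidate cluster has a closed form. The function names, the prior, the threshold grid, and the simulated data are all illustrative assumptions; the variable-selection priors mentioned in the abstract are omitted.

```python
import numpy as np


def log_marginal_likelihood(X, y, sigma2=1.0, tau2=1.0):
    """Log p(y | X) for y = X b + e with b ~ N(0, tau2 I) and e ~ N(0, sigma2 I).

    Integrating out b gives y ~ N(0, sigma2 I + tau2 X X'), which is evaluated directly.
    """
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * (X @ X.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def best_split(X, y, z, thresholds, min_leaf=10, sigma2=1.0, tau2=1.0):
    """Pick the threshold on z that maximizes the summed log marginal likelihood
    of the two child clusters, reported as a gain over the unsplit model."""
    base = log_marginal_likelihood(X, y, sigma2, tau2)
    best_c, best_gain = None, -np.inf
    for c in thresholds:
        left = z < c
        # Skip splits that would leave too few observations in a child.
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue
        score = (log_marginal_likelihood(X[left], y[left], sigma2, tau2)
                 + log_marginal_likelihood(X[~left], y[~left], sigma2, tau2))
        if score - base > best_gain:
            best_c, best_gain = c, score - base
    return best_c, best_gain


# Simulated example (hypothetical data): the relevant factor changes at z = 0.5.
rng = np.random.default_rng(0)
n = 200
z = rng.uniform(0.0, 1.0, n)                      # splitting characteristic
X = rng.normal(size=(n, 2))                       # two candidate factors
beta = np.where(z[:, None] < 0.5, [1.0, 0.0], [0.0, 1.0])
y = (X * beta).sum(axis=1) + 0.1 * rng.normal(size=n)

grid = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
c, gain = best_split(X, y, z, grid, sigma2=0.01, tau2=1.0)
print("chosen threshold:", c, "log marginal likelihood gain:", gain)
```

Because the marginal likelihood integrates over the coefficients instead of maximizing over them, a split is rewarded only when the improvement in fit outweighs the cost of the extra parameters, which is one way a criterion of this kind can guard against overfitting.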

Keywords: sparse models; asset pricing; Bayesian methods; grouped heterogeneity; factor models

JEL Codes: C11; C38; G11; G12


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause → Effect
Bayesian Clustering Model (BCM) (C11) → cross-sectional heterogeneity (C21)
market volatility (G17) → relevance of different factors (C52)
valuation (D46) → relevance of different factors (C52)
Bayesian Clustering Model (BCM) (C11) → understanding and predictions in asset pricing (G19)
Bayesian Clustering Model (BCM) (C11) → out-of-sample cross-sectional R² (C52)
Bayesian Clustering Model (BCM) (C11) → Sharpe ratio of tangency portfolios (G19)
