Are More Data Always Better for Factor Analysis?

Working Paper: NBER ID: w9829

Authors: Jean Boivin; Serena Ng

Abstract: Factors estimated from large macroeconomic panels are being used in an increasing number of applications. However, little is known about how the size and the composition of the data affect the factor estimates. In this paper, we ask whether it is possible to use more series to extract the factors and yet obtain factors that are less useful for forecasting; the answer is yes. Such a problem tends to arise when the idiosyncratic errors are cross-correlated. It can also arise if forecasting power is provided by a factor that is dominant in a small dataset but is a dominated factor in a larger dataset. In a real-time forecasting exercise, we find that factors extracted from as few as 40 pre-screened series often yield satisfactory or even better results than using all 147 series. Weighting the data by their properties when constructing the factors also leads to improved forecasts. Our simulation analysis is unique in that special attention is paid to cross-correlated idiosyncratic errors, and we also allow the factors to have stronger loadings on some groups of series than others. It thus allows us to better understand the properties of the principal components estimator in empirical applications.
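
The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design: the data-generating process, the function names (`first_pc`, `r2`, `simulate`), and all parameter values are assumptions chosen for illustration. Only the panel sizes (40 and 147) echo the abstract. It extracts the first principal component from a small panel of series that load strongly on one factor, then from a larger panel that adds weakly loading series whose idiosyncratic errors share a common shock, and compares how well each estimate tracks the true factor.

```python
import numpy as np


def first_pc(x):
    """First principal-component factor estimate (length T) of a T x N panel."""
    # Standardize each series, as is common before extracting factors.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0] * s[0]


def r2(f_hat, f):
    """Squared correlation between an estimated and a true factor."""
    return np.corrcoef(f_hat, f)[0, 1] ** 2


def simulate(seed=0, T=200, n_clean=40, n_noisy=107):
    """Hypothetical design: returns (R2 from clean panel, R2 from full panel)."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(T)  # the true common factor

    # Clean series: unit loading on f, independent idiosyncratic errors.
    clean = f[:, None] + rng.standard_normal((T, n_clean))

    # Noisy series: weak loading on f (assumed 0.1), and errors that are
    # cross-correlated because they share the shock g (assumed weight 2.0).
    g = rng.standard_normal(T)
    errors = 2.0 * g[:, None] + rng.standard_normal((T, n_noisy))
    noisy = 0.1 * f[:, None] + errors

    full = np.hstack([clean, noisy])
    return r2(first_pc(clean), f), r2(first_pc(full), f)


if __name__ == "__main__":
    r2_small, r2_full = simulate()
    print(f"R2 with true factor, 40 clean series: {r2_small:.3f}")
    print(f"R2 with true factor, all 147 series:  {r2_full:.3f}")
```

Under these assumed parameters the first principal component of the full panel is dominated by the shock shared among the added errors, so it tracks the true factor less closely than the estimate from the 40 clean series. The size of the gap depends entirely on the assumed loadings and error correlations.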

Keywords: No keywords provided

JEL Codes: E37; E47; C3; C53


Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.


Causal Claims

Cause | Effect
size of dataset (C55) | forecasting power (C53)
cross-correlated idiosyncratic errors (C21) | forecasting power (C53)
adding more data (Y10) | less useful factors for forecasting (C53)
lower-ranked or noisy series (Y30) | average size of common component (L63)
average size of common component (L63) | forecasting power (C53)
