ST LG ML · Oct 27, 2020

On Model Identification and Out-of-Sample Prediction of Principal Component Regression: Applications to Synthetic Controls

Anish Agarwal, Devavrat Shah, Dennis Shen

arXiv:2010.14449v58.08 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses a gap in high-dimensional statistics and in the synthetic controls literature on policy evaluation, offering incremental theoretical advances together with a practical hypothesis test.

The paper studies model identification and out-of-sample prediction for principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. It shows that PCR consistently identifies the unique model with minimum $\ell_2$-norm and establishes non-asymptotic prediction guarantees that improve upon the best known rates.

We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In the course of our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions for out-of-sample predictions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. Accordingly, we construct a hypothesis test to check when this condition holds in practice. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach for policy evaluation. To the best of our knowledge, our prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.
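To make the abstract's ingredients concrete, here is a minimal NumPy sketch of rank-k PCR, not the authors' code. It assumes the standard construction: keep the top k singular components of the noisy in-sample covariates, then take the minimum $\ell_2$-norm least-squares solution on that subspace. The function names `pcr_fit` and `subspace_residual`, the rank `k`, and the simulated data are all hypothetical choices for illustration. The residual check reflects one reading of the "linear algebraic condition" (out-of-sample covariates lying in the span of the retained in-sample directions); that reading is an assumption, and the check is a simple diagnostic, not the paper's hypothesis test.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Rank-k PCR (illustrative): regress y on the top-k singular subspace of X.

    Returns the minimum l2-norm coefficient vector supported on the span of the
    top-k right singular vectors, together with those vectors (as rows).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    beta = Vt_k.T @ ((U_k.T @ y) / s_k)
    return beta, Vt_k

def subspace_residual(X_new, Vt_k):
    """Relative Frobenius norm of the part of X_new outside span(rows of Vt_k).

    Values near 0 mean X_new is (almost) contained in the retained subspace.
    """
    projected = X_new @ Vt_k.T @ Vt_k
    return np.linalg.norm(X_new - projected) / np.linalg.norm(X_new)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, r = 200, 50, 3                      # hypothetical sizes and latent rank
    basis = rng.normal(size=(r, p))           # shared low-dimensional row space
    A = rng.normal(size=(n, r)) @ basis       # latent in-sample covariates
    beta_true = basis.T @ rng.normal(size=r)  # model inside the row space
    X = A + 0.1 * rng.normal(size=(n, p))     # error-in-variables: noisy covariates
    y = A @ beta_true + 0.1 * rng.normal(size=n)

    beta_hat, Vt_k = pcr_fit(X, y, k=r)

    # Out-of-sample covariates: one batch shares the row space, one does not.
    A_shared = rng.normal(size=(20, r)) @ basis
    A_shifted = rng.normal(size=(20, p))
    print("residual, shared subspace :", subspace_residual(A_shared, Vt_k))
    print("residual, shifted subspace:", subspace_residual(A_shifted, Vt_k))
    print("prediction RMSE, shared   :",
          np.sqrt(np.mean((A_shared @ beta_hat - A_shared @ beta_true) ** 2)))
```

In this sketch the shared-subspace batch can follow a different distribution from the training covariates and still be predicted well, because only the span matters; the shifted batch illustrates the kind of violation a test of the condition would aim to flag.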
