Phase Transitions in Sparse PCA
This work addresses fundamental computational limits in high-dimensional statistics, with implications for machine learning and data analysis; its contribution is incremental, building on prior theoretical results.
The paper investigates optimal estimation in sparse principal component analysis (PCA) when the number of non-zero elements scales linearly with the data dimension. It reveals phase transitions suggesting a region of parameters where estimation is information-theoretically possible but approximate message passing (AMP), and presumably every other polynomial-time algorithm, fails.
We study optimal estimation for sparse principal component analysis when the number of non-zero elements is small but of the same order as the dimension of the data. We employ the approximate message passing (AMP) algorithm and its state evolution to analyze both the information-theoretically minimal mean-squared error and the error achieved by AMP in the limit of large sizes. For the special case of rank one and a large enough density of non-zeros, Deshpande and Montanari [1] proved that AMP is asymptotically optimal. We show that, both for low density and for large rank, the problem undergoes a series of phase transitions, suggesting the existence of a region of parameters where estimation is information-theoretically possible but AMP (and presumably every other polynomial-time algorithm) fails. The analysis of the large-rank limit is particularly instructive.
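To make the state-evolution analysis more concrete, here is a minimal sketch in Python (NumPy) for a setting assumed purely for illustration. The model is a rank-one spiked Wigner matrix Y = sqrt(λ/n) x xᵀ + W, where W is symmetric with standard Gaussian entries. The entries of x are drawn i.i.d. from a Gauss-Bernoulli prior with density ρ, normalized so that E[x²] = 1. The noise convention, the prior, and the helper names `scalar_overlap` and `state_evolution` are our own assumptions, not taken from the paper. The sketch iterates the Bayes-optimal recursion m_{t+1} = 1 − mmse(λ m_t) from two starting points: an uninformative overlap, which mimics AMP started from scratch, and an informed one. If the two fixed points differ, AMP from an uninformative start gets stuck at the lower one. Deciding which fixed point is information-theoretically optimal requires a free-energy comparison, which this sketch omits.

```python
import numpy as np

# ASSUMED setting (illustrative, not the paper's exact conventions):
#   Y = sqrt(lam / n) * x x^T + W,  W symmetric with N(0, 1) entries,
#   x_i ~ (1 - rho) * delta_0 + rho * N(0, 1 / rho),  so E[x_i^2] = 1.

# Gauss-Hermite (probabilists') nodes/weights for expectations over Z ~ N(0, 1).
_NODES, _WEIGHTS = np.polynomial.hermite_e.hermegauss(101)
_WEIGHTS = _WEIGHTS / np.sqrt(2.0 * np.pi)  # normalise so the weights sum to 1


def scalar_overlap(gamma: float, rho: float) -> float:
    """Return E[(E[x | y])^2] for the scalar channel y = sqrt(gamma) * x + z.

    With the Bayes-optimal denoiser this equals E[x^2] - mmse(gamma) = 1 - mmse(gamma).
    Requires 0 < rho < 1.
    """
    var_on = 1.0 / rho              # variance of a non-zero entry
    s2_on = 1.0 + gamma * var_on    # variance of y given x != 0

    def posterior_mean(y):
        # Log-odds that x != 0 given y (ratio of the two Gaussian likelihoods).
        log_odds = (np.log(rho / (1.0 - rho)) - 0.5 * np.log(s2_on)
                    + 0.5 * y**2 * gamma * var_on / s2_on)
        p_on = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # stable logistic function
        # Gaussian conjugacy: E[x | y, x != 0] = sqrt(gamma) * var_on * y / s2_on.
        return p_on * np.sqrt(gamma) * var_on * y / s2_on

    # y is a two-component mixture: N(0, s2_on) w.p. rho, N(0, 1) w.p. 1 - rho.
    e_on = np.sum(_WEIGHTS * posterior_mean(np.sqrt(s2_on) * _NODES) ** 2)
    e_off = np.sum(_WEIGHTS * posterior_mean(_NODES) ** 2)
    return float(rho * e_on + (1.0 - rho) * e_off)


def state_evolution(lam: float, rho: float, m0: float,
                    iters: int = 2000, tol: float = 1e-12) -> float:
    """Iterate m_{t+1} = 1 - mmse(lam * m_t) from m0; return the fixed point reached."""
    m = m0
    for _ in range(iters):
        m_next = scalar_overlap(lam * m, rho)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


if __name__ == "__main__":
    rho = 0.05  # hypothetical low density
    for lam in (0.5, 0.9, 1.5, 3.0):
        m_amp = state_evolution(lam, rho, m0=1e-4)      # uninformative start (AMP-like)
        m_informed = state_evolution(lam, rho, m0=1.0)  # informed start
        print(f"lam={lam:.2f}  m_amp={m_amp:.4f}  m_informed={m_informed:.4f}")
```

Because 1 − mmse(γ) is increasing in γ, the recursion is monotone. Consequently, the informed start always ends at a fixed point at least as large as the uninformative one, and a gap between them flags the metastable region the abstract refers to.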