Free Energy Wells and Overlap Gap Property in Sparse PCA
This work addresses the computational hardness of sparse PCA for researchers in statistics and machine learning, providing theoretical lower bounds that corroborate prior conjectures.
The authors tackle the sparse PCA problem in the hard regime by analyzing free energy wells and the Overlap Gap Property, showing that natural MCMC methods with worst-case initialization cannot solve it faster than the conjectured sub-exponential runtime; the results hold across a wide range of temperature and sparsity misparametrization values.
We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical-physics-inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated with the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA in less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.
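To make the kind of MCMC method discussed above concrete, here is a minimal Python sketch of a Metropolis chain targeting a Gibbs measure over sparse supports. Everything specific in it is an assumption for illustration, not the paper's exact setup: the spiked Wigner observation model, the 0/1 spike, the Hamiltonian `v^T Y v`, the swap proposal, and the names `spiked_wigner`, `metropolis_swap_chain`, `lam`, `beta`, and `k_chain`. Here `beta` plays the role of inverse temperature, and choosing `k_chain` different from the true sparsity `k` is one way to read "sparsity misparametrization". The sketch starts from a random state, whereas the lower bounds in the abstract concern worst-case initialization.

```python
import numpy as np

def spiked_wigner(n, k, lam, rng):
    """Sample Y = (lam / k) * x x^T + W with a k-sparse 0/1 spike x (assumed model)."""
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = 1.0
    g = rng.normal(size=(n, n))
    w = (g + g.T) / np.sqrt(2.0)  # symmetric Gaussian noise matrix
    return (lam / k) * np.outer(x, x) + w, x

def metropolis_swap_chain(y, k_chain, beta, steps, rng):
    """Metropolis chain on supports of size k_chain, targeting exp(beta * v^T Y v)."""
    n = y.shape[0]
    v = np.zeros(n)
    v[rng.choice(n, size=k_chain, replace=False)] = 1.0  # random initial support
    energy = v @ y @ v
    for _ in range(steps):
        i = rng.choice(np.flatnonzero(v == 1.0))  # coordinate leaving the support
        j = rng.choice(np.flatnonzero(v == 0.0))  # coordinate entering the support
        proposal = v.copy()
        proposal[i], proposal[j] = 0.0, 1.0
        new_energy = proposal @ y @ proposal
        # The swap proposal is symmetric, so the plain Metropolis rule applies.
        if np.log(rng.random()) < beta * (new_energy - energy):
            v, energy = proposal, new_energy
    return v

rng = np.random.default_rng(0)
y, x = spiked_wigner(n=30, k=5, lam=3.0, rng=rng)
v = metropolis_swap_chain(y, k_chain=6, beta=0.5, steps=200, rng=rng)
overlap = float(v @ x)  # number of true support coordinates recovered by the chain
```

The overlap `v @ x` is the natural quantity to track in such a sketch: a free energy well corresponds, informally, to the chain staying at low overlap for a long time before reaching states correlated with the planted spike.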