Local Linear Convergence of Gradient Methods for Subspace Optimization via Strict Complementarity
This addresses the challenge of fast convergence to global minimizers in subspace optimization for tasks like PCA variants, offering a method that is both efficient and theoretically sound, though incremental in combining existing approaches.
The paper tackles the problem of efficiently finding optimal subspaces for convex smooth losses, bridging nonconvex gradient methods and convex relaxations under a strict complementarity assumption. It proves that a natural nonconvex gradient method converges locally with a linear rate, requiring only a single QR-factorization per iteration.
We consider optimization problems in which the goal is find a $k$-dimensional subspace of $\mathbb{R}^n$, $k<<n$, which minimizes a convex and smooth loss. Such problems generalize the fundamental task of principal component analysis (PCA) to include robust and sparse counterparts, and logistic PCA for binary data, among others. This problem could be approached either via nonconvex gradient methods with highly-efficient iterations, but for which arguing about fast convergence to a global minimizer is difficult or, via a convex relaxation for which arguing about convergence to a global minimizer is straightforward, but the corresponding methods are often inefficient in high dimensions. In this work we bridge these two approaches under a strict complementarity assumption, which in particular implies that the optimal solution to the convex relaxation is unique and is also the optimal solution to the original nonconvex problem. Our main result is a proof that a natural nonconvex gradient method which is \textit{SVD-free} and requires only a single QR-factorization of an $n\times k$ matrix per iteration, converges locally with a linear rate. We also establish linear convergence results for the nonconvex projected gradient method, and the Frank-Wolfe method when applied to the convex relaxation.