Fast Global Convergence for Low-rank Matrix Recovery via Riemannian Gradient Descent with Random Initialization
This work provides a theoretical understanding of the global convergence properties of Riemannian gradient descent for low-rank matrix recovery, which is important for researchers and practitioners using these optimization methods.
This paper proposes a new global analysis framework for low-rank matrix recovery problems using Riemannian gradient descent on a least squares loss function. The authors show that, under some assumptions and with random initialization, the algorithm avoids spurious critical points with high probability and converges to the ground truth at a nearly linear rate, reaching an ε-accurate solution in O(log(1/ε) + log(n)) iterations.
In this paper, we propose a new global analysis framework for a class of low-rank matrix recovery problems on the Riemannian manifold. We analyze the global behavior of Riemannian optimization with random initialization. We use the Riemannian gradient descent algorithm to minimize a least squares loss function, and study the asymptotic behavior as well as the exact convergence rate. We reveal a previously unknown geometric property of the low-rank matrix manifold: the existence of spurious critical points for the simple least squares function on the manifold. We show that, under some assumptions, Riemannian gradient descent starting from a random initialization avoids these spurious critical points with high probability and converges only to the ground truth at a nearly linear convergence rate, i.e., it takes $\mathcal{O}(\log(\frac{1}{\epsilon}) + \log(n))$ iterations to reach an $\epsilon$-accurate solution. We use two applications as examples for our global analysis. The first is a rank-1 matrix recovery problem. The second is a generalization of the Gaussian phase retrieval problem; it satisfies only the weak isometry property, but behaves similarly to the first except for an extra saddle set. Our convergence guarantee is nearly optimal and almost dimension-free, which fully explains the numerical observations. The global analysis can potentially be extended to other data problems with random measurement structures and empirical least squares loss functions.
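To make the setting concrete, the following is a minimal illustrative sketch of Riemannian gradient descent for rank-1 matrix recovery from a random initialization. It is not the authors' implementation. The Gaussian measurement model, the truncated-SVD retraction, the step size, the small initialization scale, the problem sizes, and all function names (`make_rank1_problem`, `project_tangent`, `retract`, `rgd_rank1`) are assumptions chosen for illustration. Each iteration projects the Euclidean gradient of the least squares loss onto the tangent space of the rank-1 manifold at the current iterate, takes a step, and retracts back onto the manifold.

```python
import numpy as np


def make_rank1_problem(n, m, seed=0):
    """Hypothetical test problem: Gaussian measurements y_i = <A_i, X> of a rank-1 X."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y_vec = rng.standard_normal(n)
    X = np.outer(x / np.linalg.norm(x), y_vec / np.linalg.norm(y_vec))
    A = rng.standard_normal((m, n, n))      # assumed measurement ensemble
    y = A.reshape(m, -1) @ X.ravel()        # noiseless measurements
    return A, y, X


def project_tangent(U, Vt, G):
    """Project G onto the tangent space of the fixed-rank manifold at Z = U S Vt."""
    UtG = U.T @ G
    GV = G @ Vt.T
    return U @ UtG + GV @ Vt - U @ (UtG @ Vt.T) @ Vt


def retract(M, r=1):
    """Retract onto the rank-r manifold via truncated SVD (one common choice)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]


def rgd_rank1(A, y, step=0.5, iters=300, seed=0):
    """Riemannian gradient descent on f(Z) = (1/2m) * sum_i (<A_i, Z> - y_i)^2."""
    m, n, _ = A.shape
    A_flat = A.reshape(m, -1)
    rng = np.random.default_rng(seed)
    # Random rank-1 initialization; the small scale 1e-3 is an assumption.
    u0 = rng.standard_normal((n, 1))
    v0 = rng.standard_normal((n, 1))
    U, s, Vt = retract(1e-3 * (u0 @ v0.T) / n)
    for _ in range(iters):
        Z = (U * s) @ Vt
        residual = A_flat @ Z.ravel() - y
        G = (A_flat.T @ residual).reshape(n, n) / m   # Euclidean gradient
        U, s, Vt = retract(Z - step * project_tangent(U, Vt, G))
    return (U * s) @ Vt


if __name__ == "__main__":
    A, y, X = make_rank1_problem(n=10, m=5000, seed=1)
    Z = rgd_rank1(A, y, seed=2)
    print("relative error:", np.linalg.norm(Z - X) / np.linalg.norm(X))
```

The sketch deliberately uses many more measurements than unknowns so that the sampled loss stays close to its population version; the regime analyzed in the paper and its assumptions may differ.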