Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems
This provides a theoretical guarantee for SGD in non-convex optimization, which is incremental but important for applications like matrix completion and subspace tracking.
The paper tackles the problem of global convergence for stochastic gradient descent (SGD) on non-convex matrix problems by proposing a step size scheme, proving that it converges globally from a random starting point within O(ε⁻¹ n log n) steps with constant probability for constant-rank problems.
Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation. In this paper, we exhibit a step size scheme for SGD on a low-rank least-squares problem, and we prove that, under broad sampling conditions, our method converges globally from a random starting point within $O(ε^{-1} n \log n)$ steps with constant probability for constant-rank problems. Our modification of SGD relates it to stochastic power iteration. We also show experiments to illustrate the runtime and convergence of the algorithm.