OCLGMLJun 27, 2021

Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

arXiv:2106.14289v158 citations
Originality Highly original
AI Analysis

This provides a theoretical foundation for gradient descent in a canonical optimization problem, impacting areas like matrix sensing and completion, though it is incremental in advancing existing empirical observations.

The paper tackles the non-convex and non-smooth asymmetric low-rank matrix factorization problem, proving for the first time that randomly initialized gradient descent converges to a global minimum with a polynomial rate, without requiring artificial modifications like noise or regularizers.

We study the asymmetric low-rank factorization problem: \[\min_{\mathbf{U} \in \mathbb{R}^{m \times d}, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2}\|\mathbf{U}\mathbf{V}^\top -\mathbfΣ\|_F^2\] where $\mathbfΣ$ is a given matrix of size $m \times n$ and rank $d$. This is a canonical problem that admits two difficulties in optimization: 1) non-convexity and 2) non-smoothness (due to unbalancedness of $\mathbf{U}$ and $\mathbf{V}$). This is also a prototype for more complex problems such as asymmetric matrix sensing and matrix completion. Despite being non-convex and non-smooth, it has been observed empirically that the randomly initialized gradient descent algorithm can solve this problem in polynomial time. Existing theories to explain this phenomenon all require artificial modifications of the algorithm, such as adding noise in each iteration and adding a balancing regularizer to balance the $\mathbf{U}$ and $\mathbf{V}$. This paper presents the first proof that shows randomly initialized gradient descent converges to a global minimum of the asymmetric low-rank factorization problem with a polynomial rate. For the proof, we develop 1) a new symmetrization technique to capture the magnitudes of the symmetry and asymmetry, and 2) a quantitative perturbation analysis to approximate matrix derivatives. We believe both are useful for other related non-convex problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes