LGOCMLFeb 9, 2024

Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion

arXiv:2402.06756v19 citationsh-index: 15COLT
Originality Highly original
AI Analysis

This addresses matrix completion for applications like recommendation systems by providing a simpler, regularization-free method, though it is incremental as it builds on existing gradient descent analyses.

The paper tackles the problem of symmetric matrix completion using vanilla gradient descent with small initialization, proving it converges to the ground truth without explicit regularization, even when over-parameterizing the rank, with results showing near-linear convergence to an ε-neighborhood given sufficient observations.

We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\rm{X}^\star \in \mathbb{R}^{d\times d}$ of rank-$r$, parameterized by $\rm{U}\rm{U}^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\rm{X}^\star$ without requiring any explicit regularization. This convergence result holds true even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. The existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetildeΩ(dr^9)$ observations, GD with an initial point $\|\rm{U}_0\| \leq ε$ converges near-linearly to an $ε$-neighborhood of $\rm{X}^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $ε>0$, provided the initial point satisfies $\|\rm{U}_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes