LGMLFeb 24, 2021

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

arXiv:2102.12430v119 citations
Originality Highly original
AI Analysis

This provides theoretical insight into noise-induced implicit bias in optimization, which is incremental but clarifies a fundamental issue in machine learning.

The paper tackles the problem of understanding why noise helps in nonconvex optimization by studying matrix factorization, showing that noisy gradient descent converges to a specific global optimum determined by the noise, unlike standard gradient descent which can converge to any optimum.

Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes