LGDSOCMLMar 24, 2021

Why Do Local Methods Solve Nonconvex Problems?

arXiv:2103.13462v114 citations
Originality Incremental advance
AI Analysis

This addresses a foundational issue for machine learning practitioners who rely on non-convex optimization, though it appears incremental as it formalizes an existing hypothesis.

The paper tackles the problem of why local optimization methods like stochastic gradient descent often find approximate global minima in non-convex machine learning problems, which are NP-hard in theory, by hypothesizing and formalizing that most local minima in these objectives are approximately global.

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes