LGMLJan 2, 2019

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

arXiv:1901.00451v187 citations
Originality Incremental advance
AI Analysis

This provides theoretical insight into a fundamental problem in deep learning for researchers and practitioners, though it is incremental as it builds on existing observations.

The paper tackles the problem of understanding why stochastic gradient descent (SGD) can train deep neural networks to a global minimum by establishing convergence for nonconvex optimization problems, showing that SGD follows a star-convex path and converges deterministically.

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding on how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although has long been considered as a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes