LG MLJan 2, 2019

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

arXiv:1901.00451v120.787 citationsh-index: 63

Originality Incremental advance

AI Analysis

This provides theoretical insight into a fundamental problem in deep learning for researchers and practitioners, though it is incremental as it builds on existing observations.

The paper tackles the problem of understanding why stochastic gradient descent (SGD) can train deep neural networks to a global minimum by establishing convergence for nonconvex optimization problems, showing that SGD follows a star-convex path and converges deterministically.

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding on how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although has long been considered as a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.

View on arXiv PDF

Similar