LG ML Sep 24, 2020

How Many Factors Influence Minima in SGD?

arXiv:2009.11858v1
3 citations
Originality: Synthesis-oriented
AI Analysis

The paper clarifies the theory for researchers studying SGD dynamics in deep learning, but its contribution is incremental because it mainly synthesizes existing literature.

This paper reviews factors influencing SGD minima, confirming the four-factor relationship from Wang (2019) and noting limitations of the three-factor model from Jastrzębski et al. (2018).

Stochastic gradient descent (SGD) is often applied to train deep neural networks (DNNs), and research efforts have been devoted to investigating the convergence dynamics of SGD and the minima it finds. The influencing factors identified in the literature include learning rate, batch size, Hessian, and gradient covariance; stochastic differential equations are used to model SGD and to establish the relationships among these factors that characterize the minima found by SGD. The ratio of batch size to learning rate has been found to be a main factor in the underlying SGD dynamics; however, the influence of other important factors, such as the Hessian and gradient covariance, is not entirely agreed upon. This paper describes the factors and relationships in the recent literature and presents numerical findings on the relationships. In particular, it confirms the four-factor and general relationship results obtained in Wang (2019), while the three-factor and associated relationship results found in Jastrzębski et al. (2018) may not hold beyond the special case considered.
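
To make the four factors concrete, here is a minimal sketch on a one-dimensional quadratic loss. It is a toy illustration under stated assumptions, not the paper's experiments or either cited model. The scalar `hessian` stands in for the Hessian, `noise_var` for the per-sample gradient covariance, and the gradient noise is assumed Gaussian with variance `noise_var / batch_size`. All function and parameter names are hypothetical.

```python
import numpy as np


def stationary_variance(lr, batch_size, hessian, noise_var):
    """Exact stationary variance of the toy recursion
    x <- x - lr * (hessian * x + eps), eps ~ N(0, noise_var / batch_size).
    For small lr * hessian this is approximately
    lr * noise_var / (2 * batch_size * hessian)."""
    a = 1.0 - lr * hessian  # per-step contraction factor
    return lr**2 * (noise_var / batch_size) / (1.0 - a**2)


def simulate_sgd(lr, batch_size, hessian, noise_var,
                 steps=200_000, burn_in=10_000, seed=0):
    """Run the toy SGD recursion and return the empirical variance of x
    after discarding the first burn_in iterates."""
    rng = np.random.default_rng(seed)
    # Assumption: mini-batch gradient noise is Gaussian, variance noise_var / batch_size.
    eps = rng.normal(0.0, np.sqrt(noise_var / batch_size), size=steps)
    x = 0.0
    samples = np.empty(steps - burn_in)
    for t in range(steps):
        x -= lr * (hessian * x + eps[t])
        if t >= burn_in:
            samples[t - burn_in] = x
    return samples.var()


if __name__ == "__main__":
    # Doubling both learning rate and batch size keeps their ratio fixed.
    for lr, batch_size in [(0.01, 8), (0.02, 16)]:
        exact = stationary_variance(lr, batch_size, hessian=2.0, noise_var=1.0)
        empirical = simulate_sgd(lr, batch_size, hessian=2.0, noise_var=1.0)
        print(f"lr={lr}, B={batch_size}: exact={exact:.3e}, simulated={empirical:.3e}")
```

In this toy setting the two configurations share the same ratio of batch size to learning rate, and their stationary variances are nearly equal (about 1% apart in closed form for these values). Changing `hessian` or `noise_var` shifts the variance as well, which is why those two quantities appear alongside learning rate and batch size in the relationships discussed above.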
