A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima
This work addresses a gap between theory and practice for machine learning problems involving loss functions with non-isolated minima, providing theoretical insights for practitioners.
The paper tackles the problem of analyzing stochastic gradient descent (SGD) for non-convex optimization with non-isolated local minima, by formulating a new local convexity condition and showing it generalizes existing conditions, and proves local convergence under this condition using stochastic stability, with concentration inequalities that help interpret empirical training results.
Loss functions with non-isolated minima have emerged in several machine learning problems, creating a gap between theory and practice. In this paper, we formulate a new type of local convexity condition that is suitable to describe the behavior of loss functions near non-isolated minima. We show that such condition is general enough to encompass many existing conditions. In addition we study the local convergence of the SGD under this mild condition by adopting the notion of stochastic stability. The corresponding concentration inequalities from the convergence analysis help to interpret the empirical observation from some practical training results.