MLITLGOCOct 23, 2017

Stability and Generalization of Learning Algorithms that Converge to Global Optima

arXiv:1710.08402v1191 citations
Originality Incremental advance
AI Analysis

This work addresses the stability and generalization of learning algorithms for researchers in machine learning theory, providing incremental improvements to existing bounds.

The paper tackles the problem of establishing generalization bounds for learning algorithms that converge to global minima by deriving black-box stability results based on convergence and loss function geometry, showing these results for nonconvex loss functions like Polyak-Łojasiewicz and quadratic growth conditions, and applying them to algorithms such as SGD, GD, RCD, and SVRG, with results matching or improving state-of-the-art bounds and revealing cases where SGD is stable but GD is not.

We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions. We further show that these conditions arise for some neural networks with linear activations. We use our black-box results to establish the stability of optimization algorithms such as stochastic gradient descent (SGD), gradient descent (GD), randomized coordinate descent (RCD), and the stochastic variance reduced gradient method (SVRG), in both the PL and the strongly convex setting. Our results match or improve state-of-the-art generalization bounds and can easily be extended to similar optimization algorithms. Finally, we show that although our results imply comparable stability for SGD and GD in the PL setting, there exist simple neural networks with multiple local minima where SGD is stable but GD is not.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes