MLLGOCFeb 19, 2018

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

arXiv:1802.06903v327 citations
Originality Incremental advance
AI Analysis

This work addresses the generalization properties of SGD for researchers in machine learning optimization, providing theoretical guarantees that are incremental improvements over existing stability-based analyses.

The paper tackles the problem of establishing generalization error bounds with probabilistic guarantees for stochastic gradient descent (SGD) in nonconvex optimization, resulting in improved bounds for both general nonconvex and gradient dominant loss functions, and high-probability guarantees for proximal SGD with strongly convex regularizers.

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of the SGD in expectation. In this paper, we establish various generalization error bounds with probabilistic guarantee for the SGD. Specifically, for both general nonconvex loss functions and gradient dominant loss functions, we characterize the on-average stability of the iterates generated by SGD in terms of the on-average variance of the stochastic gradients. Such characterization leads to improved bounds for the generalization error for SGD. We then study the regularized risk minimization problem with strongly convex regularizers, and obtain improved generalization error bounds for proximal SGD. With strongly convex regularizers, we further establish the generalization error bounds for nonconvex loss functions under proximal SGD with high-probability guarantee, i.e., exponential concentration in probability.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes