ML · IT · LG · Nov 6, 2019

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

arXiv:1911.02151v3 · 171 citations
Originality: Incremental advance
AI Analysis

This work provides incremental improvements in theoretical machine learning for researchers studying generalization in noisy iterative algorithms.

The paper tackles the problem of deriving tighter generalization bounds for Stochastic Gradient Langevin Dynamics (SGLD) by improving mutual information bounds using data-dependent estimates, resulting in terms that are orders of magnitude smaller than previous bounds based on gradient norms.

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller.
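The contrast drawn above, between squared gradient norms and the error of a gradient forecast built from a subset of the training samples, can be illustrated with a toy sketch. This is not the paper's bound or its experimental setup: the least-squares loss, the fixed half/half data split, and the names `mean_grad` and `sgld_compare` are assumptions made purely for illustration. The sketch runs SGLD and accumulates two quantities per step: the squared norm of the mini-batch gradient, and the squared distance between that gradient and a forecast computed on the held-apart subset.

```python
import math
import random

def mean_grad(w, data):
    """Mean gradient of 0.5 * (w.x - y)**2 over the (x, y) pairs in data."""
    g = [0.0] * len(w)
    for x, y in data:
        r = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += r * xj / len(data)
    return g

def sgld_compare(data, steps=200, batch_size=8, lr=0.01, temp=1e-4, seed=0):
    """Run SGLD on a toy least-squares problem (illustrative assumption).

    Returns the final weights, the summed squared mini-batch gradient norms,
    and the summed squared differences between each mini-batch gradient and
    a forecast computed from a separate subset of the data.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Hypothetical split: one half forecasts gradients, the other feeds SGLD.
    forecast_set, rest = shuffled[:half], shuffled[half:]
    w = [0.0] * len(data[0][0])
    sq_norm_sum = 0.0
    sq_diff_sum = 0.0
    # Standard SGLD noise scale sqrt(2 * lr / beta), with temp = 1 / beta.
    noise_std = math.sqrt(2.0 * lr * temp)
    for _ in range(steps):
        batch = rng.sample(rest, batch_size)
        g_batch = mean_grad(w, batch)
        g_forecast = mean_grad(w, forecast_set)
        sq_norm_sum += sum(g * g for g in g_batch)
        sq_diff_sum += sum((g - f) ** 2 for g, f in zip(g_batch, g_forecast))
        # SGLD update: gradient step plus isotropic Gaussian noise.
        w = [wi - lr * g + noise_std * rng.gauss(0.0, 1.0)
             for wi, g in zip(w, g_batch)]
    return w, sq_norm_sum, sq_diff_sum
```

Calling `sgld_compare` on a list of `(x, y)` pairs, where `x` is a list of floats, returns both running sums so they can be compared directly. How far apart they are depends on the data, the batch size, and the step size, so the toy numbers should not be read as reproducing the paper's empirical findings.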

Code Implementations: 1 repo
