LG · ML · Jun 14, 2023

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

arXiv:2306.08590v2 · 8 citations · h-index: 96
Originality: Incremental advance
AI Analysis

This work challenges the prevailing understanding of SGD's role in online learning and offers insights for researchers in machine learning optimization.

The paper investigates whether SGD noise provides implicit bias advantages in online learning, finding that, unlike in offline learning, small batch sizes confer no such advantage and offer only computational benefits.

The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single-epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating more cost-effective gradient steps. This suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient descent algorithm. We study this hypothesis and provide supporting evidence in loss and function space. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.
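The "golden path" reading can be illustrated with a toy sketch. This is not the paper's experimental setup (which uses image and language data); it assumes a hypothetical 2-D linear regression with Gaussian inputs, and the names `W_STAR`, `NOISE_STD`, `gd_path`, `online_sgd_path` and the learning rate are all illustrative choices. In the online regime every minibatch is fresh, so the minibatch gradient is an unbiased estimate of the population gradient: each SGD step is the noiseless GD step plus zero-mean noise whose variance shrinks roughly like 1/B. The sketch runs both for the same number of steps and measures how far the SGD trajectory strays from the GD one.

```python
import random

# Hypothetical toy problem: y = W_STAR . x + Gaussian label noise, x ~ N(0, I).
W_STAR = (2.0, -1.0)
NOISE_STD = 0.5


def sample(rng):
    """Draw one fresh example (x, y) from the assumed data model."""
    x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    y = W_STAR[0] * x[0] + W_STAR[1] * x[1] + rng.gauss(0.0, NOISE_STD)
    return x, y


def population_loss(w):
    """Exact 0.5 * E[(w.x - y)^2] = 0.5 * ||w - w*||^2 + 0.5 * sigma^2 for this model."""
    return 0.5 * ((w[0] - W_STAR[0]) ** 2 + (w[1] - W_STAR[1]) ** 2) + 0.5 * NOISE_STD ** 2


def gd_path(steps, lr, w0=(0.0, 0.0)):
    """Noiseless GD on the population loss, whose gradient is w - w*."""
    w = list(w0)
    path = [tuple(w)]
    for _step in range(steps):
        w = [w[i] - lr * (w[i] - W_STAR[i]) for i in range(2)]
        path.append(tuple(w))
    return path


def online_sgd_path(steps, lr, batch_size, seed=0, w0=(0.0, 0.0)):
    """Single-epoch SGD: each step averages gradients over batch_size fresh samples."""
    rng = random.Random(seed)
    w = list(w0)
    path = [tuple(w)]
    for _step in range(steps):
        g = [0.0, 0.0]
        for _ in range(batch_size):
            x, y = sample(rng)
            r = w[0] * x[0] + w[1] * x[1] - y  # residual of the squared loss
            g[0] += r * x[0] / batch_size
            g[1] += r * x[1] / batch_size
        w = [w[i] - lr * g[i] for i in range(2)]
        path.append(tuple(w))
    return path


def max_deviation(path_a, path_b):
    """Largest Euclidean distance between the two trajectories at matching steps."""
    return max(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
               for a, b in zip(path_a, path_b))


if __name__ == "__main__":
    steps, lr = 200, 0.05
    golden = gd_path(steps, lr)
    for b in (1, 8, 64):
        sgd = online_sgd_path(steps, lr, b)
        print(b, round(max_deviation(sgd, golden), 3), round(population_loss(sgd[-1]), 3))
```

In this toy setting, larger batches track the GD trajectory more closely at a matched step count but consume B times as many samples per step; under a fixed sample budget, smaller batches buy more (noisier) steps, which is the computational trade-off the abstract describes.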

