LG · ML · Jun 15, 2020

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

arXiv:2006.08680v2 · 119 citations
AI Analysis

This work addresses a theoretical gap in optimization and machine learning by characterizing the implicit bias of SGD noise, though it is incremental in that it builds on previously introduced models.

The paper asks why parameter-dependent noise in SGD gives better implicit regularization than spherical Gaussian noise in overparameterized models. It shows that SGD with label noise recovers the sparse ground-truth solution, whereas SGD with Gaussian noise overfits to dense solutions.

The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training overparameterized models. Prior theoretical work largely focuses on spherical Gaussian noise, whereas empirical studies demonstrate the phenomenon that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise. This paper theoretically characterizes this phenomenon on a quadratically-parameterized model introduced by Vaskevicius et al. and Woodworth et al. We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms. Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not. Code for our project is publicly available.
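
To make the two noise types in the abstract concrete, here is a minimal sketch, not the authors' released code. It assumes the quadratically-parameterized model takes the form f_v(x) = <v ⊙ v, x> with squared loss scaled by 1/4, and that label noise is a random ±delta perturbation of the label at each step. The function names, the data generator, and all hyperparameters (`lr`, `delta`, `sigma`, `steps`) are hypothetical choices for illustration.

```python
import numpy as np

def make_data(n, d, k, rng):
    """Assumed setup: k-sparse ground truth, Gaussian inputs, noiseless labels."""
    v_star = np.zeros(d)
    v_star[:k] = 1.0
    X = rng.standard_normal((n, d))
    y = X @ (v_star ** 2)
    return X, y, v_star

def grad(v, x, y):
    """Gradient of 0.25 * (<v*v, x> - y)^2 with respect to v."""
    residual = np.dot(v * v, x) - y
    return residual * v * x

def sgd_label_noise(X, y, v0, lr, delta, steps, rng):
    """SGD where each sampled label is perturbed by +delta or -delta."""
    v = v0.copy()
    n = X.shape[0]
    for _ in range(steps):
        i = rng.integers(n)
        noisy_y = y[i] + delta * rng.choice([-1.0, 1.0])
        # The induced gradient noise is -(+/-delta) * v * x: it scales with v.
        v -= lr * grad(v, X[i], noisy_y)
    return v

def sgd_gaussian_noise(X, y, v0, lr, sigma, steps, rng):
    """SGD on clean labels plus spherical Gaussian noise added to the gradient."""
    v = v0.copy()
    n = X.shape[0]
    for _ in range(steps):
        i = rng.integers(n)
        # The added noise has the same scale in every coordinate, regardless of v.
        v -= lr * (grad(v, X[i], y[i]) + sigma * rng.standard_normal(v.shape))
    return v
```

In this sketch, perturbing the label by delta shifts the gradient by exactly `-delta * v * x`, so the label-noise term shrinks in coordinates where `v` is near zero, while the Gaussian term does not depend on `v` at all. That is the sense in which label noise is "parameter-dependent". Whether a given run shows the sparse-recovery behavior described in the abstract depends on the step size, noise level, and number of iterations, none of which are specified here.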

Code Implementations: 1 repo
