ML | LG | Aug 21, 2021

Shift-Curvature, SGD, and Generalization

arXiv:2108.09507v3 | 5 citations
Originality: Incremental advance
AI Analysis

This paper provides a nuanced theoretical understanding of generalization in deep learning. It addresses a longstanding debate, but the advance is incremental: it refines existing hypotheses.

The paper examines how curvature in the loss landscape affects generalization and what role SGD plays. It shows that curvature harms test performance through new mechanisms such as shift-curvature, and it derives an SGD steady-state distribution that reveals a trade-off between deep and low-curvature regions. Experiments confirm the impact of shift-curvature on test loss and explore the relationship between SGD noise and curvature.
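
The trade-off between deep and low-curvature regions can be illustrated with a toy calculation. The sketch below is not the paper's steady-state distribution; it swaps in the textbook Gibbs density p(x) proportional to exp(-U(x)/T) and a Laplace approximation of the probability mass of each basin, with the temperature T standing in for SGD noise. The two basins, their depths and curvatures, and the function names are all hypothetical.

```python
import math

def basin_mass(depth, curvature, temperature):
    """Laplace estimate of the unnormalized Gibbs mass of a 1-D quadratic basin.

    Assumes U(x) ~ depth + 0.5 * curvature * (x - x0)**2 near the minimum,
    so the integral of exp(-U / T) is exp(-depth / T) * sqrt(2 * pi * T / curvature).
    """
    return math.exp(-depth / temperature) * math.sqrt(
        2.0 * math.pi * temperature / curvature
    )

def flat_basin_probability(temperature):
    """Probability of sitting in the shallow, low-curvature basin (hypothetical values)."""
    sharp = basin_mass(depth=0.0, curvature=100.0, temperature=temperature)  # deep, sharp
    flat = basin_mass(depth=0.1, curvature=1.0, temperature=temperature)     # shallow, flat
    return flat / (sharp + flat)

for t in (0.01, 0.1, 1.0):
    print(f"T={t}: P(flat basin) = {flat_basin_probability(t):.4f}")
```

In this toy setting, small noise concentrates mass in the deeper basin, while larger noise shifts mass toward the lower-curvature one. Per the abstract, the paper's effective potential is related to but different from the train loss, so this only conveys the qualitative picture.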

A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that SGD discourages curvature. We offer a more complete and nuanced view in support of both. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The three curvature-mediated contributions to test performance are reparametrization-invariant although curvature is not. The shift in the shift-curvature is the line connecting train and test local minima, which differ due to dataset sampling or distribution shift. Although the shift is unknown at training time, the shift-curvature can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from train loss, and that SGD noise mediates a trade-off between deep versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that for small SGD noise, the shift-curvature may be the most significant of the three mechanisms. Our experiments confirm the impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.

