LG · OC · Mar 31, 2021

Empirically explaining SGD from a line search perspective

arXiv:2103.17132v3 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This provides incremental insights into optimization behavior for deep learning researchers, but does not offer broad practical improvements.

The paper tackles the limited understanding of SGD in deep learning by empirically analyzing its trajectory from a line search perspective, finding that the full-batch loss is highly parabolic along update directions and that a specific learning rate enables SGD to perform almost exact line searches.

Optimization in Deep Learning is mainly guided by vague intuitions and strong assumptions, with a limited understanding of how and why these work in practice. To shed more light on this, our work provides a deeper understanding of how SGD behaves by empirically analyzing the trajectory taken by SGD from a line search perspective. Specifically, a costly quantitative analysis of the full-batch loss along SGD trajectories from commonly used models trained on a subset of CIFAR-10 is performed. Our core results include that the full-batch loss along lines in update step direction is highly parabolic. Furthermore, we show that there exists a learning rate with which SGD always performs almost exact line searches on the full-batch loss. Finally, we provide a different perspective on why increasing the batch size has almost the same effect as decreasing the learning rate by the same factor.
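The line search view in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the full-batch loss is replaced by a toy convex quadratic, and the names `fit_parabola_along_line`, `toy_loss`, `A`, and `b_vec` are hypothetical. The sketch samples the loss along a line in the update direction, fits a parabola, and compares the fitted minimizer with the exact line search step, which has a closed form for a quadratic. On a real network the loss is only approximately parabolic, so the fit residual would indicate how well the parabolic assumption holds.

```python
import numpy as np

def fit_parabola_along_line(loss_fn, theta, direction, step_sizes):
    """Sample loss_fn along theta + s * direction and fit a*s^2 + b*s + c."""
    losses = np.array([loss_fn(theta + s * direction) for s in step_sizes])
    a, b, c = np.polyfit(step_sizes, losses, deg=2)
    fitted = a * step_sizes**2 + b * step_sizes + c
    max_abs_residual = float(np.max(np.abs(losses - fitted)))
    # The parabola has a minimum only if it opens upward (a > 0).
    s_min = -b / (2.0 * a) if a > 0 else None
    return a, b, c, s_min, max_abs_residual

# Assumption: a toy symmetric positive definite quadratic stands in for
# the full-batch loss of a neural network.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + np.eye(5)
b_vec = rng.normal(size=5)

def toy_loss(theta):
    return 0.5 * theta @ A @ theta - b_vec @ theta

theta0 = rng.normal(size=5)
grad = A @ theta0 - b_vec
direction = -grad  # unnormalized negative gradient as the update direction

step_sizes = np.linspace(-0.5, 0.5, 21)
a, b, c, s_min, residual = fit_parabola_along_line(
    toy_loss, theta0, direction, step_sizes
)

# Exact line search step along -grad for a quadratic: (g.g) / (g.A.g).
s_exact = (grad @ grad) / (grad @ A @ grad)

print("fitted minimizer step:", s_min)
print("exact line search step:", s_exact)
print("max fit residual:", residual)
```

In this toy setting the step `s_min` plays the role of the learning rate that makes a gradient step an exact line search along the chosen direction.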

Code Implementations: 1 repo
