LG · DIS-NN · Feb 23, 2023

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width

arXiv:2302.12250v2 · 19 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work provides insights into early training dynamics for researchers in deep learning optimization, though it is incremental as it builds on existing studies of sharpness and stability.

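The authors systematically analyzed optimization dynamics in deep neural networks trained with SGD by tracking the sharpness of the loss, measured as the top eigenvalue of the Hessian. They identified four distinct training regimes and showed that the early regimes have a phase diagram governed by the learning rate, depth, and width. They also discovered a 'sharpness reduction' phase, in which sharpness decreases early in training; this phase opens up as depth increases and width decreases.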

We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate $η$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $λ^H_t$ of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $η \equiv c / λ_0^H$, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a "sharpness reduction" phase, where sharpness decreases at early times, as $d$ and $1/w$ are increased.
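
To make the parametrization $η \equiv c / λ_0^H$ concrete, here is a minimal sketch of how sharpness could be estimated and used to set the learning rate. It is not the authors' code. It assumes a hypothetical two-parameter toy loss $L(u, v) = \frac{1}{2}(uv - y)^2$ in place of a real network, full-batch gradient descent in place of SGD, and power iteration on finite-difference Hessian-vector products as the eigenvalue estimator. The names `grad`, `hvp`, `sharpness`, and `track_sharpness` are illustrative only, and the value `c = 0.5` is arbitrary; it is not one of the paper's critical values.

```python
import math

def grad(theta, y=1.0):
    """Gradient of the toy loss L(u, v) = 0.5 * (u*v - y)**2 (an assumed stand-in model)."""
    u, v = theta
    residual = u * v - y
    return [residual * v, residual * u]

def hvp(grad_fn, theta, vec, eps=1e-4):
    """Hessian-vector product H @ vec via a central finite difference of the gradient."""
    plus = grad_fn([t + eps * x for t, x in zip(theta, vec)])
    minus = grad_fn([t - eps * x for t, x in zip(theta, vec)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

def sharpness(grad_fn, theta, iters=100):
    """Estimate the top Hessian eigenvalue by power iteration.

    Power iteration finds the eigenvalue of largest magnitude, so this
    assumes that eigenvalue is also the largest positive one.
    """
    n = len(theta)
    vec = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, theta, vec)
        lam = sum(a * b for a, b in zip(vec, hv))  # Rayleigh quotient (vec has unit norm)
        norm = math.sqrt(sum(a * a for a in hv))
        vec = [a / norm for a in hv]
    return lam

def track_sharpness(grad_fn, theta0, c, steps):
    """Run gradient descent with eta = c / lambda_0 and record the sharpness at every step."""
    lam0 = sharpness(grad_fn, theta0)
    eta = c / lam0  # learning rate measured in units of 1 / (initial sharpness)
    theta = list(theta0)
    history = [lam0]
    for _ in range(steps):
        g = grad_fn(theta)
        theta = [t - eta * gi for t, gi in zip(theta, g)]
        history.append(sharpness(grad_fn, theta))
    return eta, history

theta0 = [2.0, 1.0]            # arbitrary initialization for the toy problem
lam0 = sharpness(grad, theta0)
eta, history = track_sharpness(grad, theta0, c=0.5, steps=5)
print("initial sharpness:", lam0)
print("learning rate:", eta)
print("sharpness trajectory:", history)
```

Expressing the learning rate through $c$ rather than $η$ directly makes runs with different depths and widths comparable, because each is scaled by its own initial sharpness. For a real network one would typically replace the finite-difference product with an automatic-differentiation Hessian-vector product and the toy gradient with a minibatch gradient.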

