LGJul 26, 2022

Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability

arXiv:2207.12678v261 citationsh-index: 30
Originality Incremental advance
AI Analysis

It addresses optimization behavior in deep learning for researchers, but is incremental as it builds on prior findings like the Edge of Stability.

This paper analyzes gradient descent dynamics in neural networks, focusing on the sharpness (maximum Hessian eigenvalue) and its progression through phases like progressive sharpening and the Edge of Stability, with theoretical proofs for two-layer linear networks.

Recent findings (e.g., arXiv:2103.00065) demonstrate that modern neural networks trained by full-batch gradient descent typically enter a regime called Edge of Stability (EOS). In this regime, the sharpness, i.e., the maximum Hessian eigenvalue, first increases to the value 2/(step size) (the progressive sharpening phase) and then oscillates around this value (the EOS phase). This paper aims to analyze the GD dynamics and the sharpness along the optimization trajectory. Our analysis naturally divides the GD trajectory into four phases depending on the change of the sharpness. We empirically identify the norm of output layer weight as an interesting indicator of sharpness dynamics. Based on this empirical observation, we attempt to theoretically and empirically explain the dynamics of various key quantities that lead to the change of sharpness in each phase of EOS. Moreover, based on certain assumptions, we provide a theoretical proof of the sharpness behavior in EOS regime in two-layer fully-connected linear neural networks. We also discuss some other empirical findings and the limitation of our theoretical results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes