LG, ML · Nov 16, 2018

The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size

arXiv:1811.07062v2 · 80 citations
Originality: Synthesis-oriented
AI Analysis

This work provides incremental insights into the optimization landscape of deep learning models, relevant for researchers in machine learning optimization.

The authors applied high-dimensional numerical linear algebra to approximate the Hessian spectrum of large-scale deep neural networks, confirming previous findings of spiked behavior with outliers and analyzing the dynamics of Hessian components during training and with varying sample sizes.

We apply state-of-the-art tools in modern high-dimensional numerical linear algebra to approximate efficiently the spectrum of the Hessian of modern deepnets, with tens of millions of parameters, trained on real data. Our results corroborate previous findings, based on small-scale networks, that the Hessian exhibits "spiked" behavior, with several outliers isolated from a continuous bulk. We decompose the Hessian into different components and study the dynamics with training and sample size of each term individually.
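As a rough illustration of how a spectrum can be probed without ever forming the matrix, the sketch below runs the Lanczos iteration using only matrix-vector products. The abstract does not name the specific algorithms the authors use, so this is a generic stand-in rather than their method. The toy matrix `H` (a random symmetric "bulk" plus three planted "spikes"), the function name `lanczos_ritz_values`, and all sizes are assumptions made for the example. For a real deepnet, `matvec` would be a Hessian-vector product computed by automatic differentiation.

```python
import numpy as np


def lanczos_ritz_values(matvec, dim, num_steps, seed=0):
    """Approximate the eigenvalues of a symmetric operator from matvecs only.

    Runs num_steps of Lanczos with full reorthogonalization and returns the
    eigenvalues (Ritz values) of the resulting tridiagonal matrix, ascending.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    basis = [q]
    alphas, betas = [], []
    for step in range(num_steps):
        w = matvec(basis[-1])
        alphas.append(basis[-1] @ w)
        # Remove components along all previous Lanczos vectors (done twice
        # for numerical stability).
        Q = np.stack(basis, axis=1)
        w = w - Q @ (Q.T @ w)
        w = w - Q @ (Q.T @ w)
        beta = np.linalg.norm(w)
        if beta < 1e-10 or step == num_steps - 1:
            break
        betas.append(beta)
        basis.append(w / beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.linalg.eigvalsh(T)


# Toy stand-in for a Hessian: a random symmetric bulk plus three spikes.
dim = 200
rng = np.random.default_rng(1)
G = rng.standard_normal((dim, dim))
bulk = (G + G.T) / np.sqrt(2 * dim)            # spectrum roughly in [-2, 2]
U, _ = np.linalg.qr(rng.standard_normal((dim, 3)))
H = bulk + U @ np.diag([10.0, 15.0, 20.0]) @ U.T

# Lanczos only ever touches H through matrix-vector products.
ritz = lanczos_ritz_values(lambda v: H @ v, dim, num_steps=60)
exact = np.linalg.eigvalsh(H)                  # feasible only at toy scale

print("three largest Ritz values: ", ritz[-3:])
print("three largest eigenvalues: ", exact[-3:])
```

In this toy setting the three largest Ritz values settle on the planted outliers well before the number of steps approaches the dimension, while the remaining Ritz values stay within the bulk. That is the qualitative "spiked" picture the abstract describes. The dense `eigvalsh` check is there only because the toy matrix is small.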

Code Implementations: 2 repos