LG · Nov 22, 2016

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

arXiv:1611.07476v2 · 32.3 · 278 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a problem relevant to researchers: understanding optimization dynamics in deep neural networks. It appears incremental, however, since it builds on existing studies of Hessian properties.

The paper studies the eigenvalues of the Hessian of the loss in deep learning and finds that their distribution has two parts: a bulk concentrated near zero and edges scattered away from zero. It presents empirical evidence linking the bulk to the degree of over-parametrization and the edges to the input data.

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.
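The bulk/edge split can be illustrated on a toy model. The sketch below is not the paper's experimental setup (the paper studies neural networks); it is a linear least-squares stand-in, assuming the loss L(w) = 1/(2n) ||Xw - y||^2, whose exact Hessian is X^T X / n. The helper names `hessian_least_squares` and `split_spectrum`, and the relative tolerance used to call an eigenvalue "zero", are hypothetical choices made for illustration. With more parameters than samples, at least (parameters minus samples) eigenvalues are exactly zero, mirroring the bulk, while the nonzero eigenvalues move when the input data are rescaled, mirroring the edges.

```python
import numpy as np


def hessian_least_squares(X: np.ndarray) -> np.ndarray:
    """Exact Hessian of L(w) = 1/(2n) * ||X w - y||^2 with respect to w.

    For this linear model the Hessian is X^T X / n. It depends only on the
    inputs X, not on w or the targets y, so unlike a deep network it does
    not change between the start and end of training.
    """
    n = X.shape[0]
    return X.T @ X / n


def split_spectrum(H: np.ndarray, tol: float = 1e-8):
    """Split eigenvalues into a near-zero 'bulk' and the remaining 'edges'.

    The cutoff tol is an assumption: eigenvalues whose magnitude is below
    tol times the largest magnitude (or tol, if that is larger) count as zero.
    """
    eigs = np.linalg.eigvalsh(H)  # symmetric matrix: real eigenvalues, ascending
    scale = max(1.0, float(np.max(np.abs(eigs))))
    bulk_mask = np.abs(eigs) <= tol * scale
    return eigs[bulk_mask], eigs[~bulk_mask]


rng = np.random.default_rng(0)
n_samples, n_params = 20, 100  # more parameters than samples: over-parametrized
X = rng.standard_normal((n_samples, n_params))

bulk, edges = split_spectrum(hessian_least_squares(X))
# rank(X^T X) <= n_samples, so at least n_params - n_samples eigenvalues are zero.
print(len(bulk), len(edges))

# Rescaling the inputs by 3 multiplies every eigenvalue by 9:
# the edges move, while the size of the bulk at zero is unchanged.
bulk2, edges2 = split_spectrum(hessian_least_squares(3.0 * X))
print(len(bulk2), np.allclose(edges2, 9.0 * edges))
```

In this linear case the size of the bulk is exactly the number of parameters minus the rank of X, which is why adding parameters without adding data only grows the zero bulk.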
