LG · NA · Dec 16, 2019

PyHessian: Neural Networks Through the Lens of the Hessian

arXiv:1912.07145v3 · 385 citations · Has Code
Originality: Incremental advance
AI Analysis

This work provides a tool for researchers to analyze neural network behavior and offers insights into model trainability. Its contribution is incremental: it refines existing claims about loss landscape smoothness rather than overturning them.

The authors tackle the problem of efficiently computing Hessian information for deep neural networks. They present PyHessian, a scalable framework for fast computation of the top Hessian eigenvalues, the Hessian trace, and the Hessian spectral density, and use it to analyze how residual connections and Batch Normalization affect loss landscape smoothness. They find that Batch Normalization does not necessarily smooth the landscape, especially in shallower networks.
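Top eigenvalues can be computed without ever forming the Hessian, because power iteration needs only Hessian-vector products H v. The sketch below is a minimal NumPy illustration of that idea under stated assumptions, not PyHessian's code or API: `top_hessian_eigenvalue`, `least_squares_hvp`, and the least-squares toy loss are hypothetical, chosen because that loss has Hessian X^T X / n with a closed-form product. In a deep network, H v would instead come from automatic differentiation (differentiating the gradient-vector product a second time). Plain power iteration returns the eigenvalue of largest magnitude, which coincides with the top eigenvalue here because the toy Hessian is positive semidefinite.

```python
import numpy as np


def top_hessian_eigenvalue(hvp, dim, max_iters=200, tol=1e-10, seed=0):
    """Estimate the dominant Hessian eigenvalue with power iteration.

    Only Hessian-vector products are used, so the Hessian is never formed.
    Power iteration converges to the eigenvalue of largest magnitude.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    eigenvalue = 0.0
    for _ in range(max_iters):
        hv = hvp(v)
        new_eigenvalue = float(v @ hv)  # Rayleigh quotient v^T H v (v has unit norm)
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            return 0.0, v  # H v = 0: v lies in the null space of H
        v = hv / norm
        converged = abs(new_eigenvalue - eigenvalue) <= tol * max(abs(new_eigenvalue), 1.0)
        eigenvalue = new_eigenvalue
        if converged:
            break
    return eigenvalue, v


# Hypothetical toy problem: least-squares loss L(w) = ||X w - y||^2 / (2 n),
# whose Hessian is X^T X / n. The product H v is computed without building H.
rng = np.random.default_rng(42)
n, d = 500, 20
scales = np.ones(d)
scales[0] = 3.0  # one stretched direction gives a clear spectral gap
X = rng.standard_normal((n, d)) * scales


def least_squares_hvp(v):
    return X.T @ (X @ v) / n


lam_max, _ = top_hessian_eigenvalue(least_squares_hvp, d)
lam_ref = np.linalg.eigvalsh(X.T @ X / n)[-1]  # dense check, feasible only for tiny d
print(lam_max, lam_ref)
```

The dense `eigvalsh` call is only a sanity check on this tiny problem. The point of the matrix-free loop is that each iteration costs one Hessian-vector product, which for a network is on the order of a gradient computation, so the approach scales to models whose Hessian could never be stored.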

We present PyHessian, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PyHessian enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information) to gain insight into the behavior of different models/optimizers. To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape smoother, thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our extensive analysis shows new finer-scale insights, demonstrating that, while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallower networks.
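The Hessian trace can likewise be estimated from Hessian-vector products alone, for example with Hutchinson's randomized estimator: for random vectors v with E[v v^T] = I, the expectation of v^T H v equals Tr(H). The following is a minimal NumPy sketch under that assumption, not PyHessian's implementation or API; `hutchinson_trace`, the Rademacher sampling, the sample count, and the random stand-in matrix `H` are hypothetical choices for illustration.

```python
import numpy as np


def hutchinson_trace(hvp, dim, num_samples=2000, seed=0):
    """Estimate Tr(H) as the mean of v^T H v over random Rademacher vectors v.

    Entries drawn uniformly from {-1, +1} give E[v v^T] = I, so
    E[v^T H v] = Tr(H E[v v^T]) = Tr(H).
    """
    rng = np.random.default_rng(seed)
    samples = np.empty(num_samples)
    for i in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        samples[i] = v @ hvp(v)  # one Hessian-vector product per sample
    estimate = samples.mean()
    std_error = samples.std(ddof=1) / np.sqrt(num_samples)  # Monte Carlo error bar
    return estimate, std_error


# Hypothetical stand-in for a Hessian: a random symmetric PSD matrix.
rng = np.random.default_rng(7)
d = 50
B = rng.standard_normal((d, d))
H = B @ B.T / d

trace_est, trace_se = hutchinson_trace(lambda v: H @ v, d)
print(trace_est, trace_se, np.trace(H))
```

Returning a standard error alongside the estimate makes it easy to decide when enough probe vectors have been drawn; in practice one would stop once the error bar is small relative to the estimate rather than fixing `num_samples` in advance.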

Code Implementations: 3 repos