LG | ML | Jun 11, 2019

Large Scale Structure of Neural Network Loss Landscapes

arXiv:1906.04724v1 | 104 citations
Originality: Incremental advance
AI Analysis

This work provides insights into optimization properties for deep learning researchers, though it is incremental in building on existing landscape studies.

The authors tackled the problem of understanding the large-scale structure of neural network loss landscapes by proposing a unified phenomenological model based on high-dimensional wedges, and they experimentally verified the existence of low-loss subspaces connecting multiple solutions.

There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High dimensionality plays a key role in our model. Our core idea is to model the loss landscape as a set of high-dimensional wedges that together form a large-scale, interconnected structure towards which optimization is drawn. We first show that hyperparameter choices such as learning rate, network width, and $L_2$ regularization affect the path the optimizer takes through the landscape in similar ways, influencing the large-scale curvature of the regions the optimizer explores. We then predict and demonstrate new counter-intuitive properties of the loss landscape. We show the existence of low-loss subspaces connecting a set (not only a pair) of solutions, and verify it experimentally. Finally, we analyze recently popular ensembling techniques for deep networks in light of our model.
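To make the idea of low-loss connections between solutions concrete, here is a minimal toy sketch. It is not taken from the paper: the loss function, the axis-aligned form of the wedges, and every name in it (`toy_wedge_loss`, `loss_along_path`, `junction`) are assumptions chosen purely for illustration. The toy loss is the squared distance to the nearest of a few coordinate subspaces. Two zero-loss points on different wedges are then joined by a straight line and by a path routed through a point the two wedges share.

```python
import numpy as np


def toy_wedge_loss(theta, wedges):
    """Squared distance from theta to the nearest wedge (hypothetical toy loss).

    Each wedge is a tuple of coordinate indices that are free to vary;
    every other coordinate must be zero for a point to lie on that wedge.
    """
    losses = []
    for free in wedges:
        constrained = np.ones(theta.shape[0], dtype=bool)
        constrained[list(free)] = False  # True marks coordinates pinned to zero
        losses.append(float(np.sum(theta[constrained] ** 2)))
    return min(losses)


def loss_along_path(points, wedges, steps=21):
    """Largest toy loss seen along the piecewise-linear path through `points`."""
    worst = 0.0
    for a, b in zip(points[:-1], points[1:]):
        for t in np.linspace(0.0, 1.0, steps):
            worst = max(worst, toy_wedge_loss((1.0 - t) * a + t * b, wedges))
    return worst


# Two 3-dimensional wedges in a 6-dimensional space; they share coordinate 2.
wedges = [(0, 1, 2), (2, 3, 4)]

theta_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # lies on the first wedge
theta_b = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])   # lies on the second wedge
junction = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # lies on both wedges

direct = loss_along_path([theta_a, theta_b], wedges)
via_junction = loss_along_path([theta_a, junction, theta_b], wedges)

print("max loss on straight line:", direct)
print("max loss via the shared junction:", via_junction)
```

In this toy setting the straight line between the two solutions leaves both wedges and crosses a loss barrier, while the path through the shared point stays on a wedge at every step. Real network landscapes are far less regular, so this should be read only as an intuition aid for the wedge picture described in the abstract.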

Code Implementations: 1 repo