LG | ML | May 30, 2021

Embedding Principle of Loss Landscape of Deep Neural Networks

arXiv:2105.14573v3 | 44 citations
Originality: Highly original
AI Analysis

This work provides a foundational framework for analyzing loss landscapes in deep learning, which could impact optimization and generalization theories across the field.

The authors address the structure of the loss landscape of deep neural networks by proving an embedding principle: the critical points of narrower networks are contained in the loss landscape of wider networks. They also report empirical evidence that wide networks are attracted to these embedded points, which they offer as an explanation for the easy optimization of wide networks and for an implicit low-complexity regularization during training.

Understanding the structure of loss landscape of deep neural networks (DNNs) is obviously important. In this work, we prove an embedding principle that the loss landscape of a DNN "contains" all the critical points of all the narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., local or global minima, of a narrower DNN can be embedded to a critical point/hyperplane of the target DNN with higher degeneracy and preserving the DNN output function. The embedding structure of critical points is independent of loss function and training data, showing a stark difference from other nonconvex problems such as protein-folding. Empirically, we find that a wide DNN is often attracted by highly-degenerate critical points that are embedded from narrow DNNs. The embedding principle provides an explanation for the general easy optimization of wide DNNs and unravels a potential implicit low-complexity regularization during the training. Overall, our work provides a skeleton for the study of loss landscape of DNNs and its implication, by which a more exact and comprehensive understanding can be anticipated in the near future.
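
The abstract does not spell out how a critical embedding is constructed. As a minimal sketch, assume a two-layer tanh network f(x) = a^T tanh(Wx + b) and embed it into a network with one extra hidden neuron by splitting neuron j: its input weights and bias are copied, and its output weight is divided into alpha * a_j and (1 - alpha) * a_j. The function names (`forward`, `loss_grad`, `split_neuron`), the squared-error loss, and the splitting construction itself are illustrative assumptions, not taken from the text above. The sketch checks that the output function is preserved and that every gradient component of the wider network is a copy or a scaled copy of a narrow-network gradient component, so a zero gradient in the narrow network implies a zero gradient in the wide one.

```python
import numpy as np

def forward(params, X):
    """Two-layer network: f(x) = a^T tanh(W x + b)."""
    W, b, a = params
    return np.tanh(X @ W.T + b) @ a

def loss_grad(params, X, y):
    """Analytic gradient of L = 0.5 * mean((f(x) - y)^2) w.r.t. (W, b, a)."""
    W, b, a = params
    H = np.tanh(X @ W.T + b)             # hidden activations, shape (n, m)
    r = (H @ a - y) / len(y)             # dL/df for each sample, shape (n,)
    grad_a = H.T @ r                     # shape (m,)
    D = r[:, None] * (1.0 - H**2) * a    # dL/d(pre-activation), shape (n, m)
    grad_W = D.T @ X                     # shape (m, d)
    grad_b = D.sum(axis=0)               # shape (m,)
    return grad_W, grad_b, grad_a

def split_neuron(params, j, alpha):
    """Hypothetical embedding: duplicate hidden neuron j and split its output weight."""
    W, b, a = params
    W_wide = np.vstack([W, W[j:j + 1]])          # copy input weights of neuron j
    b_wide = np.append(b, b[j])                  # copy bias of neuron j
    a_wide = np.append(a, (1.0 - alpha) * a[j])  # new neuron gets (1 - alpha) * a_j
    a_wide[j] = alpha * a[j]                     # original neuron keeps alpha * a_j
    return W_wide, b_wide, a_wide

rng = np.random.default_rng(0)
n, d, m = 32, 3, 4                     # samples, input dimension, narrow width
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
narrow = (rng.standard_normal((m, d)),
          rng.standard_normal(m),
          rng.standard_normal(m))

j, alpha = 1, 0.3
wide = split_neuron(narrow, j, alpha)

gW, gb, ga = loss_grad(narrow, X, y)
gW2, gb2, ga2 = loss_grad(wide, X, y)

# The embedded parameters represent the same function on the data.
print("output preserved:", np.allclose(forward(narrow, X), forward(wide, X)))
# Input-weight gradients of the two copies are scaled narrow gradients.
print("grad W scaled:", np.allclose(gW2[j], alpha * gW[j]),
      np.allclose(gW2[-1], (1.0 - alpha) * gW[j]))
# Output-weight gradients of both copies equal the narrow gradient for a_j.
print("grad a copied:", np.allclose(ga2[j], ga[j]), np.allclose(ga2[-1], ga[j]))
```

Because these identities hold for any data `X`, `y`, the check is consistent with the abstract's claim that the embedding structure does not depend on the training data. The random parameters used here are not a critical point; the sketch only verifies the gradient relation that would carry criticality from the narrow network to the wide one.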

