LG · ML · Dec 16, 2018

Non-attracting Regions of Local Minima in Deep and Wide Neural Networks

arXiv:1812.06486v4 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses a foundational issue in neural network theory for researchers and practitioners, offering insights into why deep networks avoid poor local minima, though it is incremental as it builds on prior theoretical studies.

The authors study suboptimal local minima in deep neural networks. They construct examples of such minima in fully connected networks with sigmoid activations and show that these minima, although they exist, can be escaped along non-increasing paths. They then prove that, in networks with an extremely wide layer followed by layers of decreasing width, every suboptimal local minimum belongs to such an escapable connected set, which partially explains the practical success of deep networks.

Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few, if any, suboptimal local optima? Or could all of them be equally good? We provide a construction to show that suboptimal local minima (i.e., non-global ones), even though degenerate, exist for fully connected neural networks with sigmoid activation functions. The local minima obtained by our construction belong to a connected set of local solutions that can be escaped from via a non-increasing path on the loss curve. For extremely wide neural networks of decreasing width after the wide layer, we prove that every suboptimal local minimum belongs to such a connected set. This provides a partial explanation for the successful application of deep neural networks. We also characterize the conditions under which the same construction leads to saddle points instead of local minima for deep neural networks.

