LG · ML · Nov 13, 2015

On the Quality of the Initial Basin in Overspecified Neural Networks

arXiv:1511.04210v3 · 130 citations
Originality: Incremental advance
AI Analysis

This work addresses a foundational theoretical problem in machine learning, offering insight into optimization challenges that matter to both researchers and practitioners. It is an incremental advance, since it builds on existing empirical and theoretical observations.

The paper tackles the problem of explaining why deep neural networks can be trained successfully despite their non-convex optimization landscape, by analyzing the geometric structure of the objective function for ReLU networks under random initialization. It finds that overspecified (larger) networks are more likely to initialize at a point from which a monotonically decreasing path leads to a global minimum, and in a basin with a small minimal objective value, consistent with recent empirical and theoretical observations.

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) high probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) high probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger ("overspecified") networks, which accords with some recent empirical and theoretical observations.
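Property (i) can be made concrete with a minimal numerical sketch. It rests on a simplifying assumption that is ours, not the paper's: the hidden ReLU weights stay frozen at their random initialization and only the linear output layer moves. Under that assumption the squared loss is convex in the output weights, so the straight segment from the initialization to the least-squares solution is a monotonically non-increasing path. When the network is overspecified (more hidden units `k` than training points `n`), the ReLU feature matrix has full row rank with high probability for random data, so that path ends at zero loss, which is a global minimum; with too few hidden units it ends at a strictly positive loss. The function name `segment_losses`, the sizes, and the seeds are hypothetical illustrative choices.

```python
import numpy as np


def segment_losses(n=20, d=10, k=200, steps=51, seed=0):
    """Loss along the straight segment from a random init to the best output layer.

    Simplifying assumption (not the paper's construction): the hidden ReLU
    weights W are frozen at their random initialization, so only the linear
    output weights v move, and the loss is convex in v.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))           # n training inputs
    y = rng.standard_normal(n)                # arbitrary real-valued targets
    W = rng.standard_normal((k, d))           # random hidden-layer weights (frozen)
    v0 = rng.standard_normal(k) / np.sqrt(k)  # random output-layer initialization

    H = np.maximum(X @ W.T, 0.0)              # n x k matrix of ReLU features

    def loss(v):
        return np.mean((H @ v - y) ** 2)

    # Least-squares minimizer over v (minimum-norm solution when k >= n).
    v_star = np.linalg.lstsq(H, y, rcond=None)[0]

    ts = np.linspace(0.0, 1.0, steps)
    return np.array([loss(v0 + t * (v_star - v0)) for t in ts])


wide = segment_losses(k=200)   # overspecified: more hidden units than samples
narrow = segment_losses(k=5)   # underspecified: fewer hidden units than samples

# A convex function restricted to a segment ending at its minimizer is non-increasing.
print("wide monotone:", np.all(np.diff(wide) <= 1e-12), "final loss:", wide[-1])
print("narrow monotone:", np.all(np.diff(narrow) <= 1e-12), "final loss:", narrow[-1])
```

A straight segment is only one candidate path, and freezing the hidden layer removes most of the non-convexity the paper studies; its results concern paths in the full parameter space, so this sketch should be read as intuition for why overspecification helps rather than as a reproduction of the analysis.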

