MLFeb 19, 2017

Exponentially vanishing sub-optimal local minima in multilayer neural networks

arXiv:1702.05777v5102 citations
Originality Incremental advance
AI Analysis

It addresses the problem of optimization guarantees for neural networks, providing theoretical and empirical evidence that local minima are not a major obstacle, which is incremental but clarifies prior assumptions.

The paper proves that sub-optimal local minima are exponentially rare in multilayer neural networks with piecewise linear units and quadratic loss, given high-dimensional input and realistic hidden layer sizes, achieving 0% training error on CIFAR with about 16 hidden neurons.

Background: Statistical mechanics results (Dauphin et al. (2014); Choromanska et al. (2015)) suggest that local minima with high error are exponentially rare in high dimensions. However, to prove low error guarantees for Multilayer Neural Networks (MNNs), previous works so far required either a heavily modified MNN model or training method, strong assumptions on the labels (e.g., "near" linear separability), or an unrealistic hidden layer with $Ω\left(N\right)$ units. Results: We examine a MNN with one hidden layer of piecewise linear units, a single output, and a quadratic loss. We prove that, with high probability in the limit of $N\rightarrow\infty$ datapoints, the volume of differentiable regions of the empiric loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume of global minima, given standard normal input of dimension $d_{0}=\tildeΩ\left(\sqrt{N}\right)$, and a more realistic number of $d_{1}=\tildeΩ\left(N/d_{0}\right)$ hidden units. We demonstrate our results numerically: for example, $0\%$ binary classification training error on CIFAR with only $N/d_{0}\approx 16$ hidden neurons.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes