LG · OC · ML · Dec 28, 2018

On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

arXiv:1812.11039v7 · 42 citations
Originality: Incremental advance
AI Analysis

This paper addresses the optimization landscape problem in neural network training by showing that sufficient width eliminates bad basins. The advance is incremental, but it provides rigorous theoretical insight.

The paper proves that wide neural networks have no sub-optimal basins in their loss surface, while narrow networks below a width threshold can have strict local minima that are not global, demonstrating a phase transition from narrow to wide networks.

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the difference between wide and narrow networks. In this work, we prove that from narrow to wide networks, there is a phase transition from having sub-optimal basins to no sub-optimal basins. Specifically, we prove two results: on the positive side, for any continuous activation functions, the loss surface of a class of wide networks has no sub-optimal basins, where "basin" is defined as the set-wise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. These two results together show the phase transition from narrow to wide networks.
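
The abstract describes a "basin" only informally as a set-wise strict local minimum. One way to formalize that notion is sketched below. The symbols (a loss f on a parameter domain S, a compact set X, a radius ε) are assumptions introduced here for illustration, and the paper's exact definition may differ in its details.

```latex
% A sketch of a set-wise strict local minimum; notation is assumed, not quoted.
% f : S \to \mathbb{R} is the loss, X \subset S is a compact set of parameters.
\textbf{Definition (sketch).}
A compact set $X \subset S$ is a \emph{set-wise strict local minimum} of $f$
if there exists $\varepsilon > 0$ such that
\[
  f(x) < f(y)
  \quad \text{for all } x \in X \text{ and all } y \in S \setminus X
  \text{ with } \operatorname{dist}(y, X) \le \varepsilon .
\]
% Such a set is a \emph{sub-optimal basin} if, in addition,
\[
  \min_{x \in X} f(x) \;>\; \inf_{\theta \in S} f(\theta).
\]
```

Under this reading, an ordinary strict local minimum is the special case where X is a single point, which is how the negative result (strict local minima that are not global) exhibits sub-optimal basins in narrow networks.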

