LG · ML · Dec 5, 2017

Deep linear neural networks with arbitrary loss: All local minima are global

arXiv:1712.01473v2 · 147 citations
AI Analysis

This paper addresses a fundamental theoretical question in deep learning by clarifying the optimization landscape of deep linear networks. Its contribution is incremental in that it builds on prior work on convexity and local minima.

The paper proves that, for deep linear neural networks with an arbitrary convex differentiable loss, all local minima are global minima whenever the hidden layers are at least as wide as the input layer or at least as wide as the output layer. It also shows that the differentiability assumption cannot be dropped: with a loss that is convex and Lipschitz but not differentiable, sub-optimal local minima can exist.

We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima.
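The width condition can be made concrete with a small sketch. When every hidden layer is at least as wide as the input or the output layer, the product of the weight matrices can equal any output-by-input matrix, so the network's global minimum coincides with the minimum of the convex loss over a single matrix. The Python/NumPy code below only illustrates this representability fact; it is not the paper's proof, and the function names `factor_through_layers` and `end_to_end` are hypothetical helpers introduced here.

```python
import numpy as np


def factor_through_layers(A, hidden_widths):
    """Return weights [W_1, ..., W_L] whose product W_L @ ... @ W_1 equals A.

    Hypothetical helper for illustration. A has shape (d_out, d_in). Every
    hidden width must be at least min(d_in, d_out), i.e. the thinnest layer
    is the input or the output layer.
    """
    d_out, d_in = A.shape
    if any(h < min(d_in, d_out) for h in hidden_widths):
        raise ValueError("a hidden layer is narrower than both input and output")

    dims = [d_in] + list(hidden_widths) + [d_out]
    n_layers = len(dims) - 1
    weights = []

    if d_in <= d_out:
        # Copy the input through the hidden layers, then apply A at the end.
        for k in range(n_layers - 1):
            weights.append(np.eye(dims[k + 1], dims[k]))
        last = np.zeros((d_out, dims[-2]))
        last[:, :d_in] = A
        weights.append(last)
    else:
        # Apply A in the first layer, then copy the result to the output.
        first = np.zeros((dims[1], d_in))
        first[:d_out, :] = A
        weights.append(first)
        for k in range(1, n_layers):
            weights.append(np.eye(dims[k + 1], dims[k]))
    return weights


def end_to_end(weights):
    """Multiply the layers in network order: W_L @ ... @ W_1."""
    product = weights[0]
    for W in weights[1:]:
        product = W @ product
    return product


# Example: 5 inputs, 3 outputs, hidden widths 4, 3, 6 (all >= 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
layers = factor_through_layers(A, [4, 3, 6])
print(np.allclose(end_to_end(layers), A))
```

If some hidden layer is narrower than both the input and the output, the product has rank at most that width, so a generic target matrix is no longer reachable and this construction raises an error.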

