On the loss landscape of a class of deep neural networks with no bad local valleys
This work addresses the problem of sub-optimal local minima in deep learning optimization, which is relevant to both researchers and practitioners. Its contribution is incremental in that the result covers only a specific class of networks.
The authors prove that a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss has no bad local valleys: from any point in parameter space there is a continuous path along which the loss is non-increasing and approaches zero. As a consequence, these networks have no sub-optimal strict local minima.
We identify a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss that provably have no bad local valleys, in the sense that from any point in parameter space there exists a continuous path on which the cross-entropy loss is non-increasing and gets arbitrarily close to zero. This implies that these networks have no sub-optimal strict local minima.
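
The path property stated above can be written out as a short formal sketch. The notation is an assumption introduced here for illustration and is not taken from the source: \Theta denotes the parameter space, L the cross-entropy loss, and \gamma the path.

```latex
% Illustrative notation (assumed, not from the source):
%   \Theta : parameter space of the network
%   L : \Theta \to [0, \infty) : cross-entropy loss on the training data
%   \gamma : continuous path in parameter space
% Requires amsmath for \text.
\[
\forall\, \theta_0 \in \Theta \;\; \exists\, \gamma : [0,1) \to \Theta
\text{ continuous with } \gamma(0) = \theta_0 ,
\]
\[
s \le t \;\Longrightarrow\; L(\gamma(t)) \le L(\gamma(s)),
\qquad
\lim_{t \to 1} L(\gamma(t)) = 0 .
\]
```

Under this reading, the final claim follows by a brief argument. Suppose \theta^* were a strict local minimum with L(\theta^*) > 0. Every point near \theta^* other than \theta^* itself has strictly larger loss, so a continuous path that starts at \theta^* with non-increasing loss can never leave \theta^*. The loss along that path then stays at L(\theta^*) > 0 and cannot approach zero, which contradicts the path property.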