Depth with Nonlinearity Creates No Bad Local Minima in ResNets
This paper addresses optimization challenges in deep learning for ResNets specifically; its result does not extend to other network architectures.
The authors prove that a type of arbitrarily deep ResNet with arbitrary nonlinear activations has no bad local minima: the value of every local minimum is no worse than the global minimum value of the corresponding classical machine-learning model, and is guaranteed to improve further via residual representations. This provides an affirmative answer to an open question stated in a NeurIPS 2018 paper.
In this paper, we prove that depth with nonlinearity creates no bad local minima in a type of arbitrarily deep ResNets with arbitrary nonlinear activation functions, in the sense that the values of all local minima are no worse than the global minimum value of corresponding classical machine-learning models, and are guaranteed to further improve via residual representations. As a result, this paper provides an affirmative answer to an open question stated in a paper at the Conference on Neural Information Processing Systems 2018. This paper advances the optimization theory of deep learning only for ResNets and not for other network architectures.
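As a reading aid, the central claim can be sketched as a single inequality. The notation here is an assumption introduced for illustration and is not taken from the paper: $L(\theta)$ denotes the training loss of the ResNet with parameters $\theta$, and $L_{\mathrm{classical}}(w)$ denotes the training loss of the corresponding classical machine-learning model with parameters $w$.

```latex
% Hypothetical notation (not the paper's): L is the ResNet training loss,
% L_classical is the training loss of the corresponding classical model.
\[
  L(\theta^{\star}) \;\le\; \inf_{w} \, L_{\mathrm{classical}}(w)
  \qquad \text{for every local minimum } \theta^{\star} \text{ of } L .
\]
```

Under this reading, "no bad local minima" means that no local minimum of the ResNet loss is worse than the best the classical model can achieve. The abstract further states that these values are guaranteed to improve via residual representations; the inequality above does not attempt to formalize that part.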