LG | OC | ML | Dec 31, 2019

Avoiding Spurious Local Minima in Deep Quadratic Networks

arXiv:2001.00098v2 | 5 citations
Originality: Incremental advance
AI Analysis

The paper offers deep learning practitioners theoretical insight into training dynamics, though the advance is incremental because the analysis is limited to one specific activation function.

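The paper tackles the problem of spurious local minima in neural networks by analyzing the mean squared error loss landscape of networks with quadratic activation functions. It proves that spurious local minima and saddle points exist but can be escaped with probability one when the number of neurons is at least the input dimension and the norm of the training samples is used as a regressor, and it shows empirically that training converges to a global minimum.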

Despite their practical success, a theoretical understanding of the loss landscape of neural networks has proven challenging due to the high-dimensional, non-convex, and highly nonlinear structure of such models. In this paper, we characterize the training landscape of the mean squared error loss for neural networks with quadratic activation functions. We prove existence of spurious local minima and saddle points which can be escaped easily with probability one when the number of neurons is greater than or equal to the input dimension and the norm of the training samples is used as a regressor. We prove that deep overparameterized neural networks with quadratic activations benefit from similar nice landscape properties. Our theoretical results are independent of data distribution and fill the existing gap in theory for two-layer quadratic neural networks. Finally, we empirically demonstrate convergence to a global minimum for these problems.
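To make the setting in the abstract concrete, here is a minimal NumPy sketch of a two-layer network with quadratic activations trained by plain gradient descent on the mean squared error. It is an illustration under stated assumptions, not the authors' implementation: the model form f(x) = sum_j (w_j . x)^2 + c * ||x||^2 is one possible reading of "the norm of the training samples is used as a regressor", and the names `predict`, `mse_loss`, `gradients`, `train`, and `make_data`, as well as the step size, step count, and synthetic data, are all hypothetical choices made for this example.

```python
import numpy as np

def predict(W, c, X):
    """Assumed model: f(x) = sum_j (w_j . x)^2 + c * ||x||^2 for each row x of X."""
    hidden = (X @ W.T) ** 2              # quadratic activation, shape (n, k)
    sq_norm = (X ** 2).sum(axis=1)       # squared sample norm as an extra regressor (assumption)
    return hidden.sum(axis=1) + c * sq_norm

def mse_loss(W, c, X, y):
    """Half the mean squared error between predictions and targets."""
    residual = predict(W, c, X) - y
    return 0.5 * np.mean(residual ** 2)

def gradients(W, c, X, y):
    """Analytic gradients of mse_loss with respect to W and c."""
    residual = predict(W, c, X) - y
    # M = (1/n) * sum_i residual_i * x_i x_i^T, so dL/dW = 2 * W @ M
    M = (X * residual[:, None]).T @ X / X.shape[0]
    grad_W = 2.0 * W @ M
    grad_c = np.mean(residual * (X ** 2).sum(axis=1))
    return grad_W, grad_c

def train(X, y, k, steps=3000, lr=0.002, seed=0):
    """Plain gradient descent from a small random initialization (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, X.shape[1]))
    c = 0.0
    for _ in range(steps):
        grad_W, grad_c = gradients(W, c, X, y)
        W -= lr * grad_W
        c -= lr * grad_c
    return W, c

def make_data(n=50, d=3, seed=1):
    """Synthetic targets generated by a teacher of the same assumed form."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    W_true = 0.5 * rng.standard_normal((d, d))
    y = predict(W_true, -0.5, X)
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    # k equals the input dimension, matching the "neurons >= input dimension" condition
    W, c = train(X, y, k=X.shape[1])
    print("loss of the all-zero predictor:", 0.5 * np.mean(y ** 2))
    print("loss after training:", mse_loss(W, c, X, y))
```

Setting the hidden width `k` to at least the input dimension mirrors the condition stated in the abstract. The sketch only illustrates the objective and its gradients; it does not verify the paper's landscape guarantees.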

Code Implementations: 1 repo
