LGOCMLFeb 26, 2017

Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

arXiv:1702.07966v1316 citations
Originality Highly original
AI Analysis

This provides a theoretical guarantee for training convolutional neural networks, addressing a foundational issue in machine learning optimization, though it is specific to a simplified architecture and input distribution.

The paper tackles the problem of proving when gradient descent can succeed in non-convex optimization for deep learning, showing that for a one-hidden-layer convolutional neural net with ReLU activations and Gaussian inputs, gradient descent converges to the global optimum in polynomial time, while it is NP-complete in general cases.

Deep learning models are often successfully trained using gradient descent, despite the worst case hardness of the underlying non-convex optimization problem. The key question is then under what conditions can one prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer and a convolutional structure with no overlap and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes