LGITMLJul 4, 2019

Learning One-hidden-layer neural networks via Provable Gradient Descent with Random Initialization

arXiv:1907.06594v2
Originality Highly original
AI Analysis

This work addresses the challenge of understanding the mathematical principles behind neural network training, providing theoretical guarantees for a specific non-convex optimization problem, though it is incremental as it focuses on a simplified network architecture.

The paper tackles the problem of learning a one-hidden-layer neural network with quadratic activations in the under-parameterized regime, proposing a provable gradient descent method with random initialization that converges to a globally optimal model at a linear rate with near-optimal sample complexity.

Although deep learning has shown its powerful performance in many applications, the mathematical principles behind neural networks are still mysterious. In this paper, we consider the problem of learning a one-hidden-layer neural network with quadratic activations. We focus on the under-parameterized regime where the number of hidden units is smaller than the dimension of the inputs. We shall propose to solve the problem via a provable gradient-based method with random initialization. For the non-convex neural networks training problem we reveal that the gradient descent iterates are able to enter a local region that enjoys strong convexity and smoothness within a few iterations, and then provably converges to a globally optimal model at a linear rate with near-optimal sample complexity. We further corroborate our theoretical findings via various experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes