LGDSOCMLNov 1, 2017

Learning One-hidden-layer Neural Networks with Landscape Design

arXiv:1711.00501v232.5267 citations
Originality Highly original
AI Analysis

This provides a theoretical guarantee for efficient learning of specific neural networks, addressing a fundamental optimization challenge in machine learning.

The paper tackles the problem of learning a one-hidden-layer neural network with nonnegative weights and Gaussian input, by designing a non-convex objective function with guaranteed benign landscape properties, enabling stochastic gradient descent to provably converge to the global minimum and recover ground-truth parameters with proven sample complexity.

We consider the problem of learning a one-hidden-layer neural network: we assume the input $x\in \mathbb{R}^d$ is from Gaussian distribution and the label $y = a^\top σ(Bx) + ξ$, where $a$ is a nonnegative vector in $\mathbb{R}^m$ with $m\le d$, $B\in \mathbb{R}^{m\times d}$ is a full-rank weight matrix, and $ξ$ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function $G(\cdot)$ whose landscape is guaranteed to have the following properties: 1. All local minima of $G$ are also global minima. 2. All global minima of $G$ correspond to the ground truth parameters. 3. The value and gradient of $G$ can be estimated using samples. With these properties, stochastic gradient descent on $G$ provably converges to the global minimum and learn the ground-truth parameters. We also prove finite sample complexity result and validate the results by simulations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes