LGMLOct 9, 2019

Nearly Minimal Over-Parametrization of Shallow Neural Networks

arXiv:1910.03948v23 citations
Originality Incremental advance
AI Analysis

This reduces computational and memory costs for training shallow networks, though it is incremental as it builds on existing over-parametrization theory.

The paper tackles the problem of over-parametrization needed for shallow neural networks to perfectly fit training data, showing that linear over-parametrization is sufficient using a simple gradient descent variant, compared to previous quadratic requirements.

A recent line of work has shown that an overparametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario, the existing theory requires quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear overparametrization is sufficient to fit the training data, using a simple variant of the (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the framework developed in this work for over-parametrization is applicable to a variety of learning problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes