OC · LG · ML · Sep 30, 2019

On the convergence of gradient descent for two layer neural networks

arXiv:1909.13671v3
Originality: Synthesis-oriented
AI Analysis

This work provides theoretical guarantees for gradient descent on two-layer neural networks. The contribution is incremental: it combines ideas from existing works to address convergence and approximation in the over-parametrized setting.

The paper studies gradient descent training of two-layer neural networks to approximate continuous target functions. It shows that gradient descent converges at an exponential rate with a network width that is independent of the training data size, which implies strong approximation ability without the curse of dimensionality.

It has been shown that gradient descent can yield zero training loss in the over-parametrized regime (the width of the neural network is much larger than the number of data points). In this work, combining the ideas of some existing works, we investigate the gradient descent method for training two-layer neural networks to approximate some target continuous functions. By making use of the generic chaining technique from probability theory, we show that gradient descent can yield an exponential convergence rate, while the width of the neural networks needed is independent of the size of the training data. The result also implies some strong approximation ability of two-layer neural networks without the curse of dimensionality.
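To make the setting concrete, here is a minimal sketch of full-batch gradient descent on a two-layer ReLU network fitted to a continuous target. It is not the paper's construction: the 1/sqrt(m) scaling, the fixed random output signs, the squared loss, the target function, and all names and hyperparameters (`train_two_layer`, `width`, `lr`, `steps`) are assumptions chosen for illustration.

```python
import numpy as np

def train_two_layer(X, y, width=512, lr=0.5, steps=300, seed=0):
    """Full-batch gradient descent on a two-layer ReLU network.

    Illustrative model (an assumption, not taken from the paper):
        f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)
    Only the hidden weights W are trained; the output signs a stay fixed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(width, d))            # hidden-layer weights, shape (m, d)
    a = rng.choice([-1.0, 1.0], size=width)    # fixed output weights, shape (m,)
    losses = []
    for _ in range(steps):
        pre = X @ W.T                          # pre-activations, shape (n, m)
        pred = np.maximum(pre, 0.0) @ a / np.sqrt(width)
        resid = pred - y                       # residuals, shape (n,)
        losses.append(0.5 * np.mean(resid ** 2))
        # dL/dw_r = (1/n) * sum_i resid_i * a_r/sqrt(m) * 1[pre_ir > 0] * x_i
        grad = ((resid[:, None] * (pre > 0)) * a / np.sqrt(width)).T @ X / n
        W -= lr * grad
    return W, a, losses

# Hypothetical data: 50 points on the unit sphere in 5 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(X @ np.ones(5))                     # a continuous target function
W, a, losses = train_two_layer(X, y)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Plotting `losses` on a log scale is one way to check empirically whether the decay looks roughly linear, as an exponential rate would suggest. A small experiment like this only illustrates the setup and says nothing about the width bounds proved in the paper.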

