LG · May 19, 2024

Approximation and Gradient Descent Training with Neural Networks

arXiv:2405.11696v1 · 2 citations · h-index: 3 · Sampling Theory, Signal Processing, and Data Analysis
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap relevant to machine learning researchers by providing rigorous approximation bounds for networks trained by gradient descent. It is incremental because it builds on prior results for gradient flow.

The paper tackles the apparent incompatibility between neural network approximation theory and training theory, which typically relies on over-parametrized regimes. It extends a neural tangent kernel (NTK) argument to under-parametrized regimes and establishes approximation bounds for networks trained by gradient descent, a practical method, rather than by the idealized gradient flow.
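In NTK-style analyses, training dynamics are governed by the Gram matrix of parameter gradients, K_ij = <grad f(x_i), grad f(x_j)>. The following is a minimal sketch of that empirical kernel, assuming a two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r relu(w_r . x) in which only the inner weights W are trained. This textbook setup and the helper names `network`, `jacobian`, and `empirical_ntk` are illustrative assumptions, not the paper's architecture or code.

```python
import numpy as np

def network(W, a, X):
    """Two-layer ReLU net f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x); W: (m, d), X: (n, d)."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def jacobian(W, a, X):
    """Gradients of f(x_i) with respect to the inner weights W, flattened to shape (n, m*d)."""
    m = W.shape[0]
    active = (X @ W.T > 0).astype(float)                        # (n, m): ReLU derivative
    # d f(x_i) / d w_r = a_r * 1[w_r . x_i > 0] * x_i / sqrt(m)
    J = (active * a)[:, :, None] * X[:, None, :] / np.sqrt(m)   # (n, m, d)
    return J.reshape(X.shape[0], -1)                            # row-major, matches W.ravel()

def empirical_ntk(W, a, X):
    """Empirical NTK Gram matrix K_ij = <grad_W f(x_i), grad_W f(x_j)>."""
    J = jacobian(W, a, X)
    return J @ J.T

rng = np.random.default_rng(0)
n, m, d = 8, 50, 3                     # samples, hidden neurons, input dimension (arbitrary)
X = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)    # outer weights held fixed (assumption)
K = empirical_ntk(W, a, X)
print(K.shape, np.linalg.eigvalsh(K).min())
```

K is symmetric and positive semidefinite by construction. For the squared loss, gradient flow on the linearized model drives the residual u = f(X) - y according to du/dt = -K u, which is why lower bounds on the eigenvalues of K matter in these arguments.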

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training error, these two theories are not immediately compatible. Recent work uses the smoothness that is required for approximation results to extend a neural tangent kernel (NTK) optimization argument to an under-parametrized regime and show direct approximation bounds for networks trained by gradient flow. Since gradient flow is only an idealization of a practical method, this paper establishes analogous results for networks trained by gradient descent.
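The distinction between gradient flow and gradient descent can be seen in the linearized (kernel) regime, where gradient descent with step size eta is the forward Euler discretization of gradient flow: u_{k+1} = (I - eta K) u_k versus du/dt = -K u. The sketch below compares the two at a fixed time T = k * eta, assuming a fixed, randomly generated symmetric PSD matrix as a stand-in for an NTK Gram matrix. It only illustrates the discretization gap; it is not the paper's analysis, which handles genuinely nonlinear networks. The function names are hypothetical.

```python
import numpy as np

def gradient_flow_residual(K, u0, T):
    """Exact solution u(T) = exp(-K T) u0 of du/dt = -K u, for symmetric PSD K."""
    evals, evecs = np.linalg.eigh(K)
    return evecs @ (np.exp(-evals * T) * (evecs.T @ u0))

def gradient_descent_residual(K, u0, eta, steps):
    """Iterate u_{k+1} = (I - eta K) u_k, i.e. forward Euler with step size eta."""
    u = u0.copy()
    for _ in range(steps):
        u = u - eta * (K @ u)
    return u

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
K = A @ A.T
K = K / np.linalg.eigvalsh(K).max()   # stand-in kernel, scaled so its largest eigenvalue is 1
u0 = rng.standard_normal(6)           # initial residual
T = 1.0

gaps = []
for eta in (0.1, 0.01, 0.001):
    steps = round(T / eta)
    u_gd = gradient_descent_residual(K, u0, eta, steps)
    u_gf = gradient_flow_residual(K, u0, T)
    gaps.append(np.linalg.norm(u_gd - u_gf))
    print(f"eta={eta:g}  steps={steps}  ||u_GD - u_GF|| = {gaps[-1]:.2e}")
```

Because eta * lambda_max < 1 for each step size here, every eigencomponent of the gradient descent residual approaches its gradient flow value as eta shrinks, so the gap decreases roughly in proportion to eta. Bounding such discretization effects for nonlinear training is the kind of step a gradient descent analysis must handle on top of a gradient flow one.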
