LG
May 22, 2023

Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

arXiv:2305.13471v2
5 citations
Originality: Incremental advance
AI Analysis

This work studies the convergence and generalization of normalized gradient descent (GD) when training two-layer neural networks on separable data. Its advance is incremental: it extends results already known for linear models to two-layer networks.

The paper tackles slow convergence when training two-layer neural networks on separable data with exponentially-tailed loss functions. It applies normalized gradient descent and proves that the training loss converges to the global optimum at a linear rate, provided the iterates find an interpolating model. For convex objectives, it also establishes finite-time generalization bounds, which show that normalized GD does not overfit during training.

Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which include exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum if the iterates find an interpolating model. This is made possible by showing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.
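The linear-classifier baseline that the abstract starts from can be illustrated with a small sketch. It is not the paper's code, and it covers only the linear case, not the two-layer networks the paper analyzes. It assumes one common form of normalization, in which the step size is divided by the current loss value, and it uses the exponential loss on a hypothetical six-point separable dataset. The names X, Y, loss_and_grad and train, and the values of eta and steps, are all illustrative choices.

```python
import math

# Toy linearly separable data in 2D (hypothetical, for illustration only).
X = [(2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0)]
Y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]


def loss_and_grad(w):
    """Exponential loss F(w) = mean(exp(-y * <w, x>)) and its gradient."""
    n = len(X)
    loss = 0.0
    grad = [0.0, 0.0]
    for (x1, x2), y in zip(X, Y):
        e = math.exp(-y * (w[0] * x1 + w[1] * x2))
        loss += e / n
        grad[0] -= y * x1 * e / n
        grad[1] -= y * x2 * e / n
    return loss, grad


def train(steps, eta, normalized):
    """Run plain or normalized GD from w = 0 and return the loss history."""
    w = [0.0, 0.0]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(w)
        history.append(loss)
        # Assumed normalization: scale the step size by 1 / (current loss).
        step = eta / loss if normalized else eta
        w = [w[0] - step * grad[0], w[1] - step * grad[1]]
    history.append(loss_and_grad(w)[0])
    return history


gd_history = train(steps=100, eta=0.05, normalized=False)
ngd_history = train(steps=100, eta=0.05, normalized=True)
print("GD final loss: ", gd_history[-1])
print("NGD final loss:", ngd_history[-1])
```

On this toy problem the normalized run shrinks the loss by a roughly constant factor at every step, which is what a linear rate means, while plain GD with the same eta slows down as the loss and its gradient get small. Extending the sketch to a two-layer network would mean replacing the linear score with the network output and its gradient. The conditions under which the linear rate then still holds are the subject of the paper and are not checked here.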

