ML · LG · OC · Dec 4, 2020

When does gradient descent with logistic loss find interpolating two-layer networks?

Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

arXiv:2012.02409v4 · 14.3 · 17 citations

Originality: Incremental advance

AI Analysis

This work addresses a theoretical question of interest to researchers studying the convergence of neural network training, focusing specifically on the conditions under which training with the logistic loss reaches interpolation.

This paper investigates when gradient descent with the logistic loss successfully trains finite-width two-layer smoothed ReLU networks for binary classification, showing that it drives the training loss to zero if the initial loss is sufficiently small. The authors further show that, under specific cluster and separation conditions on the data and with a sufficiently wide network, a single step of gradient descent reduces the loss enough to meet this initial condition.
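A minimal formalization of the objective, for orientation only. The notation (n, m, a_j, h, \alpha) and the specific choices of a Huberized ReLU as the smoothed activation and a fixed, untrained second layer are assumptions made for illustration; they are not necessarily the paper's exact setup. The last line is a standard observation about the logistic loss rather than a result of the paper: once the average loss drops below (\log 2)/n, every training point has a positive margin, so the network interpolates the data.

```latex
% Assumed notation for illustration: Huberized ReLU with width h,
% fixed second-layer weights a_j, step size \alpha.
\begin{aligned}
f(x; W) &= \sum_{j=1}^{m} a_j\, \phi(w_j \cdot x), \qquad a_j \text{ fixed}, \\
\phi(z) &= \begin{cases} 0, & z < 0, \\ z^2/(2h), & 0 \le z \le h, \\ z - h/2, & z > h, \end{cases} \\
L(W) &= \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i f(x_i; W)}\bigr), \qquad y_i \in \{-1, +1\}, \\
W^{(t+1)} &= W^{(t)} - \alpha\, \nabla L\bigl(W^{(t)}\bigr), \\
L(W) < \tfrac{\log 2}{n} &\;\Longrightarrow\; \log\bigl(1 + e^{-y_i f(x_i; W)}\bigr) < \log 2 \ \ \forall i
\;\Longrightarrow\; y_i f(x_i; W) > 0 \ \ \forall i.
\end{aligned}
```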

We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain cluster and separation conditions and the network is wide enough, we show that one step of gradient descent reduces the loss sufficiently that the first result applies.
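The setting described in the abstract can be sketched in plain, dependency-free Python. Everything concrete here is a hypothetical choice for illustration, not the authors' construction: the Huberized ReLU with width H = 1, the fixed second layer a_j = ±1/sqrt(m), the toy two-cluster data produced by the made-up helper make_clustered_data, the width m = 50, the step size lr = 1.0, and the 400 steps. The sketch trains only the first layer with full-batch gradient descent and records the logistic loss after every step. It does not check the paper's cluster and separation conditions or its one-step bound.

```python
import math
import random

H = 1.0  # smoothing width of the Huberized ReLU (assumed value)


def smoothed_relu(z):
    # Huberized ReLU: 0 for z < 0, z^2 / (2H) on [0, H], z - H/2 beyond H.
    if z < 0:
        return 0.0
    return z * z / (2 * H) if z <= H else z - H / 2


def smoothed_relu_grad(z):
    # Its derivative is continuous and (1/H)-Lipschitz, unlike the plain ReLU's.
    if z < 0:
        return 0.0
    return z / H if z <= H else 1.0


def logistic(margin):
    # Numerically stable log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sigmoid(t):
    # Numerically stable 1 / (1 + exp(-t)).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def predict(W, a, x):
    # Two-layer network f(x) = sum_j a_j * phi(w_j . x).
    return sum(a_j * smoothed_relu(dot(w_j, x)) for w_j, a_j in zip(W, a))


def loss(W, a, X, y):
    # Empirical logistic loss averaged over the training set.
    return sum(logistic(y_i * predict(W, a, x_i)) for x_i, y_i in zip(X, y)) / len(X)


def gd_step(W, a, X, y, lr):
    # One full-batch gradient step on the first-layer weights W; a stays fixed.
    n, d = len(X), len(X[0])
    grad = [[0.0] * d for _ in W]
    for x_i, y_i in zip(X, y):
        pre = [dot(w_j, x_i) for w_j in W]
        f = sum(a_j * smoothed_relu(p) for a_j, p in zip(a, pre))
        # d/df log(1 + exp(-y f)) = -y * sigmoid(-y f)
        g = -y_i * sigmoid(-y_i * f)
        for j, p in enumerate(pre):
            coeff = g * a[j] * smoothed_relu_grad(p) / n
            for k in range(d):
                grad[j][k] += coeff * x_i[k]
    return [[w - lr * gk for w, gk in zip(w_j, g_j)] for w_j, g_j in zip(W, grad)]


def make_clustered_data(n=20, d=5, noise=0.1, seed=0):
    # Hypothetical toy data: label +1 clusters near +mu, label -1 near -mu.
    rng = random.Random(seed)
    mu = [1.0] + [0.0] * (d - 1)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label * m_k + rng.gauss(0.0, noise) for m_k in mu])
        y.append(label)
    return X, y


def train(m=50, steps=400, lr=1.0, seed=0):
    # Returns the training loss at initialization and after every gradient step.
    X, y = make_clustered_data(seed=seed)
    rng = random.Random(seed + 1)
    W = [[rng.gauss(0.0, 1.0) for _ in range(len(X[0]))] for _ in range(m)]
    # Fixed second layer with alternating signs, scaled by 1/sqrt(m) (an assumption).
    a = [(1.0 if j % 2 == 0 else -1.0) / math.sqrt(m) for j in range(m)]
    history = [loss(W, a, X, y)]
    for _ in range(steps):
        W = gd_step(W, a, X, y, lr)
        history.append(loss(W, a, X, y))
    return history


if __name__ == "__main__":
    history = train()
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Because the smoothed activation has a Lipschitz derivative, this loss is a smooth function of W. Full-batch gradient descent with a small enough step size therefore decreases it at every step. That smoothness is one reason a smoothed ReLU is convenient in this kind of analysis; with the plain ReLU the gradient jumps wherever a pre-activation crosses zero.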

