LG OC ML · May 30, 2019

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

arXiv:1905.13210v336.9432 citations

Originality: Incremental advance

AI Analysis

This paper provides theoretical generalization guarantees for deep learning, a fundamental open problem for researchers and practitioners, though it builds incrementally on existing neural tangent kernel work.

The paper addresses generalization in over-parameterized deep neural networks. It shows that the expected 0-1 loss of a wide ReLU network trained with SGD can be bounded by the training loss of a neural tangent random feature (NTRF) model. For data distributions that the NTRF model classifies with sufficiently small error, this yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width.

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., the number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
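
To make the idea of a "random feature model induced by the network gradient at initialization" concrete, here is a minimal sketch in pure Python. It is an illustration under simplifying assumptions, not the paper's construction: it uses a two-layer ReLU network rather than a deep one, the initialization scaling is chosen for illustration only, and all function names (`init_params`, `forward`, `ntrf_features`, `ntrf_predict`) are hypothetical. The feature vector of an input is the gradient of the network output with respect to all parameters at the random initialization, and the model is the first-order expansion of the network around that initialization.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def init_params(d, m, seed=0):
    """Gaussian random initialization of a two-layer ReLU network.

    The variances used here are an illustrative assumption, not the
    scaling used in the paper.
    """
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, (2.0 / m) ** 0.5) for _ in range(d)] for _ in range(m)]
    w2 = [rng.gauss(0.0, (1.0 / m) ** 0.5) for _ in range(m)]
    return W1, w2


def flatten(params):
    """Stack all parameters into one flat list: rows of W1, then w2."""
    W1, w2 = params
    return [w for row in W1 for w in row] + list(w2)


def forward(params, x):
    """Network output f(x) = w2 . relu(W1 x)."""
    W1, w2 = params
    pre = [sum(w * xj for w, xj in zip(row, x)) for row in W1]
    return sum(a * relu(z) for a, z in zip(w2, pre))


def ntrf_features(params, x):
    """Gradient of f(x) w.r.t. all parameters, in the order used by flatten."""
    W1, w2 = params
    pre = [sum(w * xj for w, xj in zip(row, x)) for row in W1]
    # d f / d W1[i][j] = w2[i] * 1[pre_i > 0] * x[j]
    grad_W1 = [(a * xj if z > 0.0 else 0.0) for a, z in zip(w2, pre) for xj in x]
    # d f / d w2[i] = relu(pre_i)
    grad_w2 = [relu(z) for z in pre]
    return grad_W1 + grad_w2


def ntrf_predict(params0, theta, x):
    """Linearized model: f0(x) + <grad f0(x), theta - theta0>."""
    theta0 = flatten(params0)
    phi = ntrf_features(params0, x)
    return forward(params0, x) + sum(
        p * (t - t0) for p, t, t0 in zip(phi, theta, theta0)
    )
```

In this sketch the features are fixed once the initialization is drawn, so fitting `theta` is a linear problem in the features; the abstract's bound compares the trained network's expected 0-1 loss to the training loss achievable by such a model.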
