Generalization Error Bounds for Deep Neural Networks Trained by SGD
This work provides theoretical guarantees on the generalization of deep neural networks trained by SGD, a fundamental question for researchers and practitioners who rely on SGD-based training.
The authors derive generalization error bounds for deep neural networks trained by SGD by combining dynamical control of a parameter norm with norm-based Rademacher complexity estimates. The resulting bounds depend explicitly on the loss along the training trajectory and apply to a range of architectures without requiring $L$-smoothness of the loss function. Numerical results show that the bounds are non-vacuous and robust to changes in the optimizer and network hyperparameters.
Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining a dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms. The bounds depend explicitly on the loss along the training trajectory and hold for a wide range of network architectures, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). Compared with other algorithm-dependent generalization estimates such as uniform stability-based bounds, our bounds do not require $L$-smoothness of the nonconvex loss function and apply directly to SGD rather than to stochastic gradient Langevin dynamics (SGLD). Numerical results show that our bounds are non-vacuous and robust to changes in the optimizer and network hyperparameters.
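To make the two ingredients more concrete, here is a minimal sketch. It is not the authors' construction. It trains a small bias-free two-layer ReLU network with plain minibatch SGD on synthetic teacher-generated data. At every iterate it logs the full-batch training loss together with a norm-based complexity proxy, $\max_i \|x_i\| \, \|W_1\|_F \, \|W_2\|_F / \sqrt{n}$. The proxy (with constants dropped), the teacher data, and all names (`train_sgd`, `norm_proxy`, `full_loss`) are illustrative assumptions. The paper's bounds use their own parameter norm and their own trajectory-dependent control of it, and neither is reproduced here.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(W1, W2, X):
    """Bias-free two-layer ReLU network f(x) = W2 relu(W1 x); returns (hidden, output)."""
    H = relu(X @ W1.T)  # hidden activations, shape (n, width)
    return H, H @ W2.T  # output, shape (n, 1)


def full_loss(W1, W2, X, y):
    """Full-batch squared loss 0.5 * mean((f(x) - y)^2)."""
    _, out = forward(W1, W2, X)
    return 0.5 * float(np.mean((out[:, 0] - y) ** 2))


def norm_proxy(W1, W2, X):
    """Hypothetical norm-based complexity proxy (constants omitted):
    max_i ||x_i|| * ||W1||_F * ||W2||_F / sqrt(n)."""
    n = X.shape[0]
    B = np.max(np.linalg.norm(X, axis=1))
    return B * np.linalg.norm(W1) * np.linalg.norm(W2) / np.sqrt(n)


def train_sgd(n=256, d=5, width=16, lr=0.1, batch=16, steps=500, seed=0):
    """Plain minibatch SGD; records loss and norm proxy at every iterate."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d)) / np.sqrt(d)  # inputs with norm close to 1
    # Assumption: labels come from a random teacher of the same shape (realizable data).
    T1 = rng.normal(size=(width, d)) / np.sqrt(d)
    T2 = rng.normal(size=(1, width)) / np.sqrt(width)
    y = forward(T1, T2, X)[1][:, 0]

    W1 = rng.normal(size=(width, d)) / np.sqrt(d)
    W2 = rng.normal(size=(1, width)) / np.sqrt(width)

    trajectory = []
    for t in range(steps + 1):
        # Log the training loss and the norm proxy along the whole trajectory.
        trajectory.append({"step": t,
                           "loss": full_loss(W1, W2, X, y),
                           "proxy": norm_proxy(W1, W2, X)})
        if t == steps:
            break
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        H, out = forward(W1, W2, Xb)
        r = out[:, 0] - yb  # residuals, shape (batch,)
        # Gradients of 0.5 * mean(r^2); the ReLU derivative is the mask H > 0.
        gW2 = (r[:, None] * H).mean(axis=0, keepdims=True)  # shape (1, width)
        G = (r[:, None] * W2) * (H > 0)                     # shape (batch, width)
        gW1 = G.T @ Xb / batch                              # shape (width, d)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return trajectory


traj = train_sgd()
print(traj[0]["loss"], traj[-1]["loss"], max(p["proxy"] for p in traj))
```

Because the proxy is recorded at every iterate rather than only at the end, its running maximum can be read off next to the training loss. This loosely mirrors the idea of controlling a parameter norm dynamically during training instead of assuming a fixed norm budget in advance.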