Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
This work addresses overfitting in deep learning, a practical concern for practitioners, though the contribution is incremental because it builds on prior regularization techniques.
The paper aims to improve generalization in deep neural networks by introducing a regularization method that penalizes the trace of the Hessian, motivated by generalization error bounds. In the reported experiments, the method outperforms existing approaches such as Jacobian regularization and Mixup.
In this paper, we develop a novel regularization method for deep neural networks by penalizing the trace of the Hessian. This regularizer is motivated by a recent bound on the generalization error. We explain its benefits in finding flat minima and avoiding Lyapunov stability in dynamical systems. We adopt the Hutchinson method, a classical unbiased estimator for the trace of a matrix, and further accelerate its calculation using a dropout scheme. Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods, such as Jacobian, Confidence Penalty, Label Smoothing, Cutout, and Mixup.
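To make the Hutchinson method mentioned in the abstract concrete, here is a minimal sketch of the estimator on a small explicit matrix. It relies on the standard identity that E[v^T A v] = tr(A) whenever the random probe v satisfies E[v v^T] = I, so the trace can be estimated from matrix-vector products alone. The function names (`hutchinson_trace`, `make_matvec`), the choice of Rademacher probes, and the toy matrix are assumptions for illustration, not taken from the paper. In the paper's setting the matrix-vector product would be a Hessian-vector product of the training loss, and the dropout-based acceleration is not shown here.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=100, rng=None):
    """Estimate tr(A) using only matrix-vector products v -> A v.

    `matvec` is a hypothetical callable standing in for a
    Hessian-vector product; here it wraps an explicit matrix.
    """
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability,
        # which gives E[v v^T] = I.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        # Accumulate the quadratic form v^T A v.
        total += sum(vi * avi for vi, avi in zip(v, av))
    # Average over probes to reduce the variance of the estimate.
    return total / num_probes


def make_matvec(matrix):
    """Wrap a dense matrix (list of rows) as a matvec callable."""
    def matvec(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec


# Toy symmetric matrix (illustrative only); its exact trace is 2 + 3 + 1 = 6.
A = [
    [2.0, 0.5, 0.0],
    [0.5, 3.0, -1.0],
    [0.0, -1.0, 1.0],
]
estimate = hutchinson_trace(
    make_matvec(A), dim=3, num_probes=2000, rng=random.Random(0)
)
exact = sum(A[i][i] for i in range(3))
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The estimate only approaches the exact trace as the number of probes grows; the off-diagonal entries are what introduce variance, which is why a diagonal matrix would be recovered exactly by Rademacher probes. When the estimate is used as a training penalty, each probe costs one Hessian-vector product, so the number of probes trades accuracy against compute.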