Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
This work addresses the challenge of improving generalization in deep learning for practitioners, though it is incremental as it builds on existing noise-based regularization methods.
The paper tackles the problem of overfitting in deep neural networks by interpreting noise injection regularization as optimizing a lower bound and proposing a technique using multiple noise samples per example to tighten this bound, demonstrating effectiveness in computer vision applications.
Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is known as a successful regularizer, but it is still not clear enough why such training techniques work well in practice and how we can maximize their benefit in the presence of two conflicting objectives---optimizing to true data distribution and preventing overfitting by regularization. This paper addresses the above issues by 1) interpreting that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and 2) proposing a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration. We demonstrate the effectiveness of our idea in several computer vision applications.