LG · NE · Aug 28, 2015

Parallel Dither and Dropout for Regularising Deep Neural Networks

arXiv:1508.07130v16.16 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck for researchers and practitioners who train deep neural networks: it enables effective regularization without batch averaging. The contribution appears incremental, since it builds on existing dither and dropout concepts.

The authors tackled the problem of regularizing deep neural networks without batch averaging, showing that existing methods like dither and dropout fail in this setting, and introduced a new parallel regularization method that achieves substantially better results than batch-SGD.

Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch-SGD are substantially better than what is possible with batch-SGD. Furthermore, our results demonstrate that dither and dropout are complementary.
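
The abstract does not spell out the mechanics of the parallel method, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that a single training example is replicated into several copies, that each copy is perturbed independently with dither (additive zero-mean noise) and dropout (random zeroing with rescaling), and that the gradients of those copies are averaged into one update. The linear model, the squared-error loss, the function name `parallel_regularised_step` and all parameter names and defaults are hypothetical choices made for brevity.

```python
import random


def parallel_regularised_step(w, x, y, rng, n_copies=8,
                              dither_std=0.1, keep_prob=0.5, lr=0.01):
    """One non-batch SGD step on a single example (x, y).

    Assumed setup (not taken from the paper): a linear model
    y_hat = sum(w[j] * x[j]) trained with squared error.
    """
    grad = [0.0] * len(w)
    for _ in range(n_copies):
        # Dither: add independent zero-mean Gaussian noise to each input.
        noisy = [xi + rng.gauss(0.0, dither_std) for xi in x]
        # Dropout (inverted form): keep each input with probability
        # keep_prob and rescale the kept values by 1 / keep_prob.
        noisy = [xi / keep_prob if rng.random() < keep_prob else 0.0
                 for xi in noisy]
        # Gradient of 0.5 * (y_hat - y)**2 with respect to w for this copy.
        error = sum(wi * xi for wi, xi in zip(w, noisy)) - y
        for j, xi in enumerate(noisy):
            # Accumulate the average over the parallel copies.
            grad[j] += error * xi / n_copies
    # Single weight update from the averaged gradient.
    return [wi - lr * gi for wi, gi in zip(w, grad)]


rng = random.Random(0)
w = parallel_regularised_step([0.5, -0.2], [1.0, 2.0], 1.0, rng)
```

In this sketch the averaging happens over noisy copies of one example rather than over different examples, which is the sense in which it avoids batch averaging. With `dither_std=0.0` and `keep_prob=1.0` every copy is identical to the original input and the step reduces to plain per-example SGD.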
