ML, LG, NE · Oct 5, 2015

Batch Normalized Recurrent Neural Networks

César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio

arXiv:1510.01378v1 25.5221 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses inefficient RNN training, a problem relevant to both researchers and practitioners, but its contribution is incremental: batch normalization yields more limited benefits in RNNs than in feedforward networks.

The paper tackles the challenge of applying batch normalization to recurrent neural networks (RNNs) to improve training efficiency. It finds that batch normalization does not help when applied to hidden-to-hidden transitions. Applied to input-to-hidden transitions, it speeds up convergence of the training criterion but does not improve generalization on the language modeling and speech recognition tasks.

Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feedforward neural networks. In particular, batch normalization, which uses mini-batch statistics to standardize features, was shown to significantly reduce training time. In this paper, we show that applying batch normalization to the hidden-to-hidden transitions of our RNNs doesn't help the training procedure. We also show that when applied to the input-to-hidden transitions, batch normalization can lead to a faster convergence of the training criterion but doesn't seem to improve the generalization performance on either our language modelling or speech recognition tasks. All in all, applying batch normalization to RNNs turns out to be more challenging than applying it to feedforward networks, but certain variants of it can still be beneficial.
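
To make the input-to-hidden variant described in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes a vanilla tanh RNN, training-mode normalization only (no running averages for inference), and statistics computed separately at each time step. The names `batch_norm`, `rnn_forward`, `W_x`, `W_h`, `gamma`, and `beta` are hypothetical and chosen for this example.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Standardize each feature with mini-batch statistics (batch is axis 0)."""
    mean = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

def rnn_forward(x, W_x, W_h, b, gamma, beta):
    """Vanilla tanh RNN with batch norm on the input-to-hidden term only.

    x has shape (time, batch, n_in); returns states of shape (time, batch, n_hid).
    Assumption: statistics are computed independently at every time step.
    """
    T, B, _ = x.shape
    n_hid = W_h.shape[0]
    h = np.zeros((B, n_hid))
    states = []
    for t in range(T):
        # Input-to-hidden contribution, normalized over the mini-batch.
        a_in = batch_norm(x[t] @ W_x, gamma, beta)
        # Hidden-to-hidden contribution is left unnormalized.
        h = np.tanh(a_in + h @ W_h + b)
        states.append(h)
    return np.stack(states)

# Toy usage with random data (sizes are arbitrary).
rng = np.random.default_rng(0)
T, B, n_in, n_hid = 5, 8, 3, 4
x = rng.normal(size=(T, B, n_in))
W_x = rng.normal(scale=0.5, size=(n_in, n_hid))
W_h = rng.normal(scale=0.5, size=(n_hid, n_hid))
H = rnn_forward(x, W_x, W_h, np.zeros(n_hid), np.ones(n_hid), np.zeros(n_hid))
print(H.shape)
```

Normalizing only `x[t] @ W_x` keeps the recurrent term `h @ W_h` untouched, which mirrors the abstract's distinction between input-to-hidden and hidden-to-hidden transitions.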
