CL LG NE · Jul 25, 2017

Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks

Fréderic Godin, Jonas Degrave, Joni Dambre, Wesley De Neve

arXiv:1707.08214v22.16 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of training deeper recurrent architectures for natural language processing tasks. It is an incremental advance, since it builds on the existing QRNN framework.

The authors address vanishing gradients and limited depth in Quasi-Recurrent Neural Networks (QRNNs) by introducing Dual Rectified Linear Units (DReLUs) as a replacement for tanh activations. They report improved performance in sentiment classification and character-level language modeling, and they stack up to eight QRNN layers to surpass state-of-the-art results obtained with shallow LSTM-based architectures.

In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. (2017) and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state-of-the-art in character-level language modeling over shallow architectures based on LSTMs.
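The abstract does not spell out the activation itself. The following is a minimal sketch, assuming a DReLU is the difference of two ReLUs, DReLU(a, b) = max(0, a) - max(0, b), and assuming the QRNN recurrent step is the f-pooling update h_t = f_t * h_{t-1} + (1 - f_t) * z_t of Bradbury et al. (2017). The function names (`drelu`, `qrnn_f_pooling`) and the scalar, single-unit setting are illustrative choices, not the authors' code.

```python
from typing import List, Sequence


def relu(x: float) -> float:
    """Standard rectified linear unit."""
    return max(0.0, x)


def drelu(a: float, b: float) -> float:
    """Assumed DReLU definition: difference of two ReLUs.

    The output is unbounded in both directions (like a candidate that
    replaces tanh) and is exactly zero when both inputs are non-positive.
    """
    return relu(a) - relu(b)


def qrnn_f_pooling(
    a_seq: Sequence[float],
    b_seq: Sequence[float],
    f_seq: Sequence[float],
    h0: float = 0.0,
) -> List[float]:
    """Hypothetical single-unit QRNN recurrence with a DReLU candidate.

    a_seq and b_seq stand in for the two pre-activations that a QRNN layer
    would compute with two separate convolutions; f_seq holds forget-gate
    values in [0, 1] (already passed through a sigmoid).
    """
    h = h0
    states = []
    for a, b, f in zip(a_seq, b_seq, f_seq):
        z = drelu(a, b)              # replaces z = tanh(...)
        h = f * h + (1.0 - f) * z    # f-pooling update
        states.append(h)
    return states


if __name__ == "__main__":
    print(drelu(2.0, 0.5))    # positive output
    print(drelu(-1.0, 3.0))   # negative output
    print(drelu(-1.0, -2.0))  # sparse (zero) output
    print(qrnn_f_pooling([2.0, -1.0], [0.0, 3.0], [0.5, 0.5]))
```

Under these assumptions, the unit needs two pre-activations where a tanh unit needs one, which is why the sketch takes both `a_seq` and `b_seq`. The zero output for two non-positive inputs is the source of the sparse activations mentioned in the abstract.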
