LG, CL, CV, AS · Jul 12, 2019

R-Transformer: Recurrent Neural Network Enhanced Transformer

arXiv:1907.05572v1 · 116 citations · Has Code
Originality: Highly original
AI Analysis

This work targets sequence modeling tasks that require both local and global dependencies. Its hybrid approach integrates key components of RNNs and Transformers rather than incrementally extending either one.

The paper addresses complementary limitations of existing sequence models: RNNs struggle to capture very long-term dependencies and cannot be parallelized, while Transformers lack components for modeling local structure and rely heavily on position embeddings. It proposes R-Transformer, which combines the advantages of both without using position embeddings, and reports outperforming state-of-the-art methods by a large margin on most tasks.

Recurrent Neural Networks have long been the dominant choice for sequence modeling. However, they suffer severely from two issues: they are weak at capturing very long-term dependencies and cannot parallelize the sequential computation procedure. Therefore, many non-recurrent sequence models built on convolution and attention operations have been proposed recently. Notably, models with multi-head attention such as the Transformer have proven highly effective at capturing long-term dependencies in a variety of sequence modeling tasks. Despite their success, however, these models lack the components necessary to model local structures in sequences and rely heavily on position embeddings, which have limited effect and require considerable design effort. In this paper, we propose the R-Transformer, which enjoys the advantages of both RNNs and the multi-head attention mechanism while avoiding their respective drawbacks. The proposed model can effectively capture both local structures and global long-term dependencies in sequences without any use of position embeddings. We evaluate R-Transformer through extensive experiments with data from a wide range of domains, and the empirical results show that R-Transformer outperforms the state-of-the-art methods by a large margin in most of the tasks. We have made the code publicly available at https://github.com/DSE-MSU/R-transformer.
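The abstract describes the idea only at a high level, so the following is a minimal sketch of how a local recurrent step and a global attention step might be combined, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names (`local_rnn`, `causal_attention`, `r_transformer_layer`), the plain tanh RNN cell, the window size, the single attention head without learned projections, the causal mask, and the residual sum. The point it illustrates is that a short RNN run over each local window makes the representation order-aware, so the attention step needs no position embeddings.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def local_rnn(xs, W_in, W_rec, window):
    """For each position t, run a tanh RNN over the `window` inputs ending at t.

    The final hidden state of each short run represents position t.
    Positions before the start of the sequence are zero-padded.
    """
    d_hidden = len(W_rec)
    pad = [[0.0] * len(xs[0])] * (window - 1)
    padded = pad + xs
    out = []
    for t in range(len(xs)):
        h = [0.0] * d_hidden
        for x in padded[t:t + window]:  # original positions t-window+1 .. t
            a = matvec(W_in, x)
            b = matvec(W_rec, h)
            h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        out.append(h)
    return out


def causal_attention(hs):
    """Single-head scaled dot-product attention; position t sees positions <= t.

    Queries, keys, and values are the inputs themselves (no learned
    projections), which keeps the sketch short.
    """
    d = len(hs[0])
    out = []
    for t, q in enumerate(hs):
        visible = hs[:t + 1]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * v[j] for w, v in zip(weights, visible))
                    for j in range(d)])
    return out


def r_transformer_layer(xs, W_in, W_rec, window=3):
    """Local RNN followed by global attention, joined by a residual sum."""
    local = local_rnn(xs, W_in, W_rec, window)
    glob = causal_attention(local)
    return [[l + g for l, g in zip(lv, gv)] for lv, gv in zip(local, glob)]


if __name__ == "__main__":
    random.seed(0)
    d_in, d_hidden, length = 3, 4, 5
    W_in = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    W_rec = [[random.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_hidden)]
    xs = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(length)]
    ys = r_transformer_layer(xs, W_in, W_rec)
    print(len(ys), len(ys[0]))  # 5 4
```

Because each window is processed independently of the others, the local step could in principle run in parallel across positions, which is consistent with the parallelization concern raised in the abstract. A practical version would use a tensor library, multiple heads, and learned projections.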

Code Implementations: 2 repos
