CL · Mar 6, 2018

Self-Attention with Relative Position Representations

Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

arXiv:1803.02155v2 · 40.4 · 2879 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the need for efficient position modeling in sequence-to-sequence tasks such as machine translation, offering a novel, non-incremental method that improves translation quality over absolute position representations.

The paper tackled the problem of modeling position information in Transformers by extending self-attention to incorporate relative position representations, resulting in improvements of 1.3 BLEU on English-to-German and 0.3 BLEU on English-to-French translation tasks over absolute position representations.
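As we read the full paper (these definitions do not appear on this page, so treat this as a hedged summary rather than a quotation), relative positions enter self-attention through learned vectors $a^K_{ij}$ and $a^V_{ij}$ that are added to the keys and values, indexed by the distance $j - i$ clipped to a maximum $k$. Here $x_i$ is the input at position $i$, $W^Q, W^K, W^V$ are the projection matrices, $d_z$ is the head dimension, and $w^K, w^V$ are the learned tables of $2k + 1$ relative-position vectors:

```latex
e_{ij} = \frac{(x_i W^Q)\,(x_j W^K + a^K_{ij})^{\top}}{\sqrt{d_z}}, \qquad
\alpha_{ij} = \frac{\exp e_{ij}}{\sum_{l=1}^{n} \exp e_{il}}, \qquad
z_i = \sum_{j=1}^{n} \alpha_{ij}\,\bigl(x_j W^V + a^V_{ij}\bigr)

a^K_{ij} = w^K_{\mathrm{clip}(j-i,\,k)}, \qquad
a^V_{ij} = w^V_{\mathrm{clip}(j-i,\,k)}, \qquad
\mathrm{clip}(x, k) = \max\bigl(-k, \min(k, x)\bigr)
```

Setting every $a^K_{ij}$ and $a^V_{ij}$ to zero recovers standard scaled dot-product self-attention, which by itself carries no position information.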

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
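A minimal pure-Python sketch of a single attention head with relative position representations, assuming the clipped-distance formulation we understand from the full paper: a learned vector chosen by clip(j − i, k) is added to each key when computing attention logits and to each value when forming the output. All function and variable names, the toy sizes and the random weights are hypothetical. This loop-based, unbatched version only illustrates the idea; it is not the efficient batched implementation the abstract refers to.

```python
import math
import random

# Sketch only: names and sizes are illustrative, assuming clipped relative
# distances with learned key/value vectors per distance in [-k, k].


def clip(d, k):
    """Clip a relative distance d = j - i into the range [-k, k]."""
    return max(-k, min(k, d))


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_self_attention(x, wq, wk, wv, rel_k, rel_v, k):
    """One attention head with clipped relative position vectors.

    rel_k and rel_v each hold 2k+1 vectors of length d_z; the vector for
    the pair (i, j) is looked up at index clip(j - i, k) + k.
    """
    q, keys, vals = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    n, d_z = len(x), len(q[0])
    out = []
    for i in range(n):
        # Logits: e_ij = q_i . (key_j + a^K_ij) / sqrt(d_z)
        logits = []
        for j in range(n):
            a_k = rel_k[clip(j - i, k) + k]
            score = sum(qc * (kc + ac) for qc, kc, ac in zip(q[i], keys[j], a_k))
            logits.append(score / math.sqrt(d_z))
        alpha = softmax(logits)
        # Output: z_i = sum_j alpha_ij * (value_j + a^V_ij)
        z = [0.0] * d_z
        for j in range(n):
            a_v = rel_v[clip(j - i, k) + k]
            for c in range(d_z):
                z[c] += alpha[j] * (vals[j][c] + a_v[c])
        out.append(z)
    return out


def random_matrix(rows, cols, rng):
    """Gaussian random matrix used as stand-in (untrained) weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 5 tokens, input dim 4, head dim 3, clipping distance 2.
rng = random.Random(0)
n, d_x, d_z, k = 5, 4, 3, 2
x = random_matrix(n, d_x, rng)
wq, wk, wv = (random_matrix(d_x, d_z, rng) for _ in range(3))
rel_k = random_matrix(2 * k + 1, d_z, rng)
rel_v = random_matrix(2 * k + 1, d_z, rng)
z = relative_self_attention(x, wq, wk, wv, rel_k, rel_v, k)
```

With `rel_k` and `rel_v` set to zero vectors, the head reduces to ordinary attention, which is permutation-equivariant and therefore blind to word order; the relative vectors are what let it tell, for example, the token just before position i from the token just after it. Because only the 2k + 1 clipped distances get their own vectors, the size of these tables does not grow with sequence length.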
