CL · AI · LG · Apr 20, 2021

RoFormer: Enhanced Transformer with Rotary Position Embedding

arXiv:2104.09864v5 · 5399 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of modeling positional dependencies in transformers for tasks such as long text classification. It is an incremental advance in position encoding methods.

The paper tackles the problem of integrating positional information into transformer-based language models by proposing Rotary Position Embedding (RoPE), which encodes absolute positions with rotation matrices and incorporates relative position dependencies, leading to consistent improvements over alternatives on long text classification benchmarks.

Position encoding has recently been shown to be effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and, at the same time, incorporates the explicit relative position dependency in the self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets. Our experiments show that it consistently outperforms its alternatives. Furthermore, we provide a theoretical analysis to explain some experimental results. RoFormer is already integrated into Huggingface: https://huggingface.co/docs/transformers/model_doc/roformer.
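The abstract's central claim, that rotating by absolute position yields attention scores depending only on relative position, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `rope` and `score`, the pairing of adjacent feature dimensions, and the frequency base of 10000 are assumptions chosen for illustration.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles.

    x: array of shape (seq_len, d), d even (hypothetical layout).
    positions: array-like of shape (seq_len,) holding absolute positions.
    """
    x = np.asarray(x, dtype=np.float64)
    positions = np.asarray(positions, dtype=np.float64)
    d = x.shape[-1]
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    # One frequency per 2-D pair: theta_i = base^(-2i/d), i = 0 .. d/2 - 1.
    theta = base ** (-np.arange(0, d, 2) / d)
    angles = positions[:, None] * theta[None, :]  # shape (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Standard 2-D rotation applied to each (even, odd) pair.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def score(q, k, m, n):
    """Dot-product logit between a query at position m and a key at position n."""
    q_rot = rope(q[None, :], [m])[0]
    k_rot = rope(k[None, :], [n])[0]
    return float(q_rot @ k_rot)

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Both position pairs have offset m - n = 2, so the two scores should agree.
same_offset = (score(q, k, 3, 1), score(q, k, 10, 8))
```

Because each pair is rotated rather than shifted, the vector norm is unchanged, and the product of the query rotation (transposed) with the key rotation collapses to a single rotation by the offset m - n. That is why the two entries of `same_offset` agree up to floating-point error.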

Code Implementations: 20 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
