RoPE (Long-context / context-window extension): heavily superseded, a standard baseline that newer methods routinely beat. 17 papers critique it and 8 beat it on benchmarks, which ranks it #1 of 53 most-superseded methods. Sub-problem: cluster led by RoPE. Newer alternatives in the same sub-problem include Mask Prior Suppression and Monotonic RoPE Scaling, CRoPE, C^2RoPE, Imaginary Extension of Rotary Position Embeddings, and Selective RoPE.

Heavily superseded · #1 of 53 most-superseded

RoPE

RoFormer: Enhanced Transformer with Rotary Position Embedding

Long-context / context-window extension · first seen Apr 20, 2021

heavily superseded — a standard baseline that newer methods routinely beat

17 papers critique it · 8 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites RoPE as a baseline.

“RoPE-based language models have poor length generalization.”
— DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
“However, RoPE can only operates on global angles, rendering relative angles implicit and inaccessible. Thus, RoPE struggles with periodic angular relations essential in trajectory prediction since it fails to address modular transformations”
— DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling
“Although effective, RoPE still relies on predefined static frequency patterns that are uniform across different inputs and attention heads. As a result, it remains position-dependent but not token- or context-dependent, limiting its expressiveness in modeling more nuanced sequence structures.”
— Context-aware Rotary Position Embedding
“the inherent flaw of rotary position embedding (RoPE) being used”
— Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective
“However, RoPE's 1D design, effective for text, overlooks the spatiotemporal structure of video data, limiting its suitability for Video-LLMs.”
— VRoPE: Rotary Position Embedding for Video Large Language Models
“However, RoPE exhibits oscillatory attention patterns due to its trigonometric periodicity, which can destabilize long-distance dependency modeling (Barbero et al., 2024).”
— HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
“positional attention collapse, induced by the inherent locality bias of RoPE”
— Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models
“However, Barbero et al. (2024) later provided a mathematical analysis showing that this claim is flawed: attention weights under RoPE do not necessarily decay proportionally with relative query-key distances.”
— Context-aware Biases for Length Extrapolation
“it does not decouple content and position semantically”
— Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
“RoPE suffers from long-term decay, as shown in the correlation figure (panel c), implying that as the relative distance increases, the relative upper bound on token correlations at modeled relative positions will continuously decrease.”
— 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
“We hypothesize that, for long distance attention, the way that RoPE rotates the query and the key vectors may prevent the model from utilizing the dimensions that it rotates significantly.”
— The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval
“The essential components (i.e., the RoPE matrices) of previous RoPE approaches rely on 2D rotation groups, which simplify computations but consequently restrict their feature projection capabilities, especially in high-dimensional spaces”
— ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
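
Several of the critiques above refer to the same mechanics: fixed per-dimension frequencies, rotation of 2D feature pairs, and an attention score that depends on the relative offset between query and key. The sketch below is a minimal NumPy illustration of those mechanics, not code from any of the cited papers. The function names (rope_angles, apply_rope, rope_score) are hypothetical, the interleaved pairing of features is one of two common conventions, and the base of 10000 is the default from the RoFormer paper.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    """Return rotation angles of shape (num_positions, head_dim // 2)."""
    # theta_i = base^(-2i / d): frequencies are fixed in advance and do not
    # depend on the token content or on the attention head.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return np.outer(positions, inv_freq)

def apply_rope(x, positions, base=10000.0):
    """Rotate each feature pair (x[2i], x[2i+1]) by its position-dependent angle."""
    angles = rope_angles(positions, x.shape[-1], base)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin  # 2D rotation, first component
    out[..., 1::2] = x_even * sin + x_odd * cos  # 2D rotation, second component
    return out

def rope_score(q, k, m, n):
    """Unnormalized attention score between q at position m and k at position n."""
    q_rot = apply_rope(q[None, :], np.array([m]))[0]
    k_rot = apply_rope(k[None, :], np.array([n]))[0]
    return float(q_rot @ k_rot)

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal(64)

# The score depends only on the offset m - n, not on the absolute positions.
same_offset = np.isclose(rope_score(q, k, 5, 2), rope_score(q, k, 105, 102))

# Score versus distance for an all-ones query and key (a toy probe, chosen
# here only for illustration).
ones = np.ones(64)
scores = [rope_score(ones, ones, d, 0) for d in range(256)]
increases = sum(b > a for a, b in zip(scores, scores[1:]))
print(same_offset, increases)
```

Plotting scores against distance, or checking whether increases is nonzero, is one way to see whether the score falls off monotonically with distance for this toy input. Real queries and keys are learned, so the curve for a trained model may look different.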

Beaten on benchmarks

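Head-to-head results where a newer method reports beating RoPE. Each entry lists the newer method's value first and RoPE's value second; lower is better for perplexity, higher is better for accuracy and average scores. Values are copied from the source paper's tables; verify them against the cited paper.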

DAPE-Kerple beats RoPE · perplexity (mean) [training_length_512_eval_8192]
3.8642 vs 265.4545
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
DAPE-Kerple beats RoPE · perplexity (mean) [training_length_512_eval_2048]
4.0505 vs 134.1615
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
CARoPE beats RoPE · Perplexity [GPT-Small models]
21.23 vs 21.31
Context-aware Rotary Position Embedding
CARoPE beats RoPE · Perplexity [GPT-Tiny models]
28.99 vs 29.33
Context-aware Rotary Position Embedding
VRoPE beats RoPE · Avg. [Video-Vicuna-7B]
44.48 vs 43.35
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE beats RoPE · Avg. [Video-Qwen2-1.5B]
49.96 vs 48.90
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE beats RoPE · Avg. [Video-Qwen2-7B]
56.35 vs 54.92
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE beats RoPE · Accuracy [256-512]
98.28 vs 94.84
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE beats RoPE · Accuracy [512-768]
95.16 vs 87.03
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE beats RoPE · Accuracy [768-1024]
90.31 vs 73.28
VRoPE: Rotary Position Embedding for Video Large Language Models
VRoPE beats RoPE · Accuracy [1024-1216]
87.03 vs 54.84
VRoPE: Rotary Position Embedding for Video Large Language Models
HoPE beats RoPE · perplexity [2048 sequence length]
16.46 vs 25.80
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
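
To make the four VRoPE accuracy entries above easier to compare, the short sketch below computes the accuracy gap for each bracketed condition. The numbers are copied from the entries above. Reading the bracketed labels as increasing length ranges is an assumption, since the page does not state their unit.

```python
# (condition label, VRoPE accuracy, RoPE accuracy), copied from the entries above.
rows = [
    ("256-512", 98.28, 94.84),
    ("512-768", 95.16, 87.03),
    ("768-1024", 90.31, 73.28),
    ("1024-1216", 87.03, 54.84),
]

# Absolute gap in accuracy points, rounded to two decimals.
gaps = [(label, round(vrope - rope, 2)) for label, vrope, rope in rows]

for label, gap in gaps:
    print(f"{label}: VRoPE leads by {gap:.2f} points")
```

The gap grows from one condition to the next in the order listed, which fits the length-generalization critiques quoted earlier. These are figures reported by a single paper.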

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.