3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
This work addresses long-context modeling in NLP with a new position encoding that improves position resolution and gives control over long-term decay, though the contribution appears incremental, since it is presented as an advanced version of RoPE.
The paper tackles long-context modeling in language models by proposing 3D Rotary Position Encoding (3D-RPE), which improves on the standard RoPE method in long-context Natural Language Understanding and long-sequence Language Modeling tasks.
Inspired by the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows long-term decay to be regulated within the chunk size, ensuring that relative positional information is still modeled between tokens at distant relative positions. For improved position resolution, 3D-RPE can mitigate the degradation of position resolution caused by position interpolation on RoPE. We conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. The experimental results show that 3D-RPE achieves performance improvements over RoPE, especially in long-context NLU tasks.
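For context on the two properties the abstract names, the following is a minimal sketch of the standard 2D RoPE baseline, not of 3D-RPE itself, whose construction is not given in this summary. The function names, the base of 10000, and the `scale` parameter used to mimic position interpolation are illustrative assumptions rather than anything taken from the paper.

```python
import math


def rope_angles(position, dim, base=10000.0):
    """Rotation angles of standard RoPE: position * base^(-2i/dim) per 2D pair.

    `base=10000.0` is the commonly used default, assumed here for illustration.
    """
    return [position * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate each consecutive pair of `vec` by its RoPE angle.

    `scale` < 1 is a hypothetical knob that mimics position interpolation:
    positions are squeezed into a shorter range before the rotation.
    """
    out = []
    for i, theta in enumerate(rope_angles(position * scale, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


def attention_score(q, k, m, n, scale=1.0):
    """Dot product of a query at position m and a key at position n after RoPE."""
    q_rot = apply_rope(q, m, scale=scale)
    k_rot = apply_rope(k, n, scale=scale)
    return sum(a * b for a, b in zip(q_rot, k_rot))


if __name__ == "__main__":
    ones = [1.0] * 8
    # The score depends only on the relative distance m - n; printing it for
    # growing distances shows how it varies as tokens move apart.
    for distance in (0, 1, 4, 16, 64, 256):
        print(distance, attention_score(ones, ones, distance, 0))

    # With interpolation (scale=0.25), adjacent positions differ by a quarter
    # of the original angle, which is the loss of position resolution.
    print(rope_angles(1, 8)[0], rope_angles(1 * 0.25, 8)[0])
```

In this sketch the score between two tokens depends only on their relative distance, and scaling positions down shrinks the angular gap between neighbouring tokens. These are the baseline behaviours that the abstract says 3D-RPE is designed to control and mitigate.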