The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval
This work addresses a potential bottleneck in long-context modeling for LLM users, but it is incremental because it focuses on analyzing an existing method.
The paper examines how Rotary Position Embedding (RoPE) can cause dimension inefficiency in attention heads during long-distance retrieval in large language models. Its controlled experiments show that RoPE leads to low utility of certain dimensions, and its analyses indicate that these dimensions do not aid long-context question answering.
The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLMs). It rotates dimensions in the query and key vectors by different angles according to their positions in the input sequence. In long-context modeling, the range of positions can be very large, so RoPE rotates some dimensions through a wide range of angles. We hypothesize that this wide range of rotation angles may prevent LLMs from utilizing those dimensions. To validate this hypothesis, we present a controlled experiment showing that applying RoPE causes low utility of certain dimensions. Our analyses of three LLMs also indicate that these dimensions do not help LLMs perform long-context question answering.
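
To make the rotation concrete, the following is a minimal pure-Python sketch of RoPE, not the authors' code. It assumes the commonly used frequency schedule theta_i = base^(-2i/d) with base = 10000 and the adjacent-pair convention (dimensions 2i and 2i+1 rotate together); neither detail is stated in the abstract, and some implementations pair dimensions differently. The helper names `rope_frequencies`, `apply_rope`, and `full_turns` are hypothetical.

```python
import math


def rope_frequencies(head_dim, base=10000.0):
    """Angular frequency of each dimension pair: theta_i = base^(-2i/d).

    The base of 10000 is an assumption (a common default), not a value
    taken from the abstract.
    """
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * theta_i."""
    out = []
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def full_turns(head_dim, max_distance, base=10000.0):
    """Full 2*pi rotations each pair completes across max_distance positions."""
    return [
        max_distance * theta / (2.0 * math.pi)
        for theta in rope_frequencies(head_dim, base)
    ]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Illustrative values only: an 8-dimensional head and a 4096-token span.
    for pair, turns in enumerate(full_turns(head_dim=8, max_distance=4096)):
        print(f"pair {pair}: {turns:.3f} full turns over 4096 positions")
```

Under these assumptions, the lowest-index pairs complete many full turns over a long context while the highest-index pairs barely rotate, which is the spread of rotation angles the abstract refers to. The attention score between a rotated query and a rotated key depends only on their relative distance, so for distant tokens the fast-rotating pairs contribute a term whose sign and magnitude swing rapidly with distance.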