CL LG Jan 1, 2025

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding

Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

arXiv:2501.00712v29.66 citations h-index: 23 Has Code ICML

Originality: Incremental advance

AI Analysis

This paper addresses a foundational bottleneck in the transformer architecture and offers a novel method with broad applicability, though it builds incrementally on existing positional encoding techniques.

The paper tackles the problem of rigid positional encodings in transformers, which limit long-range dependency modeling and task adaptation. It proposes TAPE, which introduces dynamic, context-aware positional encodings to enhance reasoning ability. The results show superior performance in language modeling, arithmetic reasoning, and long-context retrieval tasks compared to existing methods.

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce rigid patterns in attention maps, limiting the ability to model long-range dependencies and adapt to diverse tasks. Additionally, most positional encodings are learned as general biases, lacking the specialization required for different instances within a dataset. To address this, we propose conTextualized equivariAnt Position Encoding (TAPE), a novel framework that enhances positional embeddings by incorporating sequence content across layers. TAPE introduces dynamic, context-aware positional encodings, overcoming the constraints of traditional fixed patterns. We show that TAPE can provably facilitate LLM reasoning ability by emulating a broader class of algorithms. By enforcing permutation and orthogonal equivariance, TAPE ensures the stability of positional encodings during updates, improving long-context ability. Our method can be easily integrated into pre-trained transformers, offering parameter-efficient fine-tuning with minimal overhead. Extensive experiments show that TAPE achieves superior performance in language modeling, arithmetic reasoning, and long-context retrieval tasks compared to existing positional embedding techniques. Code is available at https://github.com/VITA-Group/TAPE.
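The two symmetries named in the abstract can be illustrated with a toy check. The sketch below is not the TAPE implementation. It assumes, purely for illustration, that positional encodings are 2-D vectors and that position-based scores are plain inner products between them. The names `rotate`, `pos_scores`, and `close`, and the values in `E`, are hypothetical.

```python
import math

def rotate(v, theta):
    """Apply a 2-D rotation (one example of an orthogonal map) to vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def pos_scores(encodings):
    """Pairwise inner products between positional encodings."""
    return [[a[0] * b[0] + a[1] * b[1] for b in encodings] for a in encodings]

def close(m1, m2, tol=1e-9):
    """Element-wise comparison of two score matrices."""
    return all(abs(x - y) <= tol for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

# Toy positional encodings for a 3-token sequence (hypothetical values).
E = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.25)]
base = pos_scores(E)

# Orthogonal transform: rotating every encoding by the same angle
# leaves all pairwise inner products unchanged.
rotated = pos_scores([rotate(e, 0.7) for e in E])
orthogonal_ok = close(base, rotated)

# Permutation: reordering the tokens reorders the rows and columns
# of the score matrix in the same way.
perm = [2, 0, 1]
permuted = pos_scores([E[i] for i in perm])
expected = [[base[i][j] for j in perm] for i in perm]
permutation_ok = close(permuted, expected)

print(orthogonal_ok, permutation_ok)
```

Under these assumptions, scores that depend on the encodings only through inner products are unaffected by a shared orthogonal change of basis, which is one way to read the stability claim in the abstract.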
