LG AI CC CLNov 12, 2024

Circuit Complexity Bounds for RoPE-based Transformer Architecture

Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song

arXiv:2411.07602v225.733 citationsh-index: 21

Originality Incremental advance

AI Analysis

This work addresses a foundational problem for the machine learning and theoretical computer science communities by revealing fundamental limitations in the expressivity of a widely used architecture, despite its empirical success.

The paper tackles the problem of characterizing the expressivity of RoPE-based Transformer architectures by establishing a circuit complexity bound, showing that unless TC^0 = NC^1, such Transformers with poly(n)-precision, O(1) layers, and hidden dimension d ≤ O(n) cannot solve the Arithmetic formula evaluation or Boolean formula value problems.

Characterizing the express power of the Transformer architecture is critical to understanding its capacity limits and scaling law. Recent works provide the circuit complexity bounds to Transformer-like architecture. On the other hand, Rotary Position Embedding ($\mathsf{RoPE}$) has emerged as a crucial technique in modern large language models, offering superior performance in capturing positional information compared to traditional position embeddings, which shows great potential in application prospects, particularly for the long context scenario. Empirical evidence also suggests that $\mathsf{RoPE}$-based Transformer architectures demonstrate greater generalization capabilities compared to conventional Transformer models. In this work, we establish a circuit complexity bound for Transformers with $\mathsf{RoPE}$ attention. Our key contribution is that we show that unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathsf{RoPE}$-based Transformer with $\mathrm{poly}(n)$-precision, $O(1)$ layers, hidden dimension $d \leq O(n)$ cannot solve the Arithmetic formula evaluation problem or the Boolean formula value problem. This result significantly demonstrates the fundamental limitation of the expressivity of the $\mathsf{RoPE}$-based Transformer architecture, although it achieves giant empirical success. Our theoretical result not only establishes the complexity bound but also may instruct further work on the $\mathsf{RoPE}$-based Transformer.

View on arXiv PDF

Similar