LG · Feb 12

SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion

Chengting Yu, Xiaobo Shu, Yadao Wang, Yizhen Zhang, Haoyi Wu, You Wu, Rujiao Long, Ziheng Chen, Yuchi Xu, Wenbo Su, Bo Zheng

arXiv:2602.11698v1 · 5.85 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses efficiency and performance issues in recursive (looped) Transformers, offering an incremental improvement over existing looped architectures. It is aimed at machine learning researchers.

The paper tackles the tendency of looped Transformers to underperform, which it links to their fixed full-token resolution. It proposes SpiralFormer, which uses multi-resolution recursion to learn hierarchical dependencies and achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B.

Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement and latent reasoning. However, early looped Transformers often underperform non-recursive baselines of equal compute. While recent literature has introduced more effective recursion mechanisms to mitigate this gap, existing architectures still operate at a fixed, full-token resolution, neglecting the potential efficiency of computing over compressed latent representations. In this paper, we propose SpiralFormer, a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We provide probing evidence that multi-resolution recursion enables the model to learn hierarchical dependencies by inducing iteration-wise functional specialization across different scales. Empirically, SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B, establishing sequence resolution as a potential axis for scaling recursive architectures.
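The idea of a multi-resolution recursion schedule can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the compression or expansion operators, the schedule, or how causality is preserved. The sketch below assumes mean pooling for compression, token repetition for expansion, a coarse-to-fine schedule, and a toy causal prefix mean standing in for the shared Transformer layers. All function names (`downsample`, `upsample`, `shared_block`, `multi_resolution_loop`, `positions_per_iteration`) are hypothetical, and the pooling step ignores the causal leakage a real autoregressive model would have to handle.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def downsample(tokens: Sequence[Vector], factor: int) -> List[Vector]:
    """Mean-pool consecutive groups of `factor` tokens (assumed compression operator)."""
    pooled = []
    for start in range(0, len(tokens), factor):
        group = tokens[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def upsample(tokens: Sequence[Vector], factor: int, length: int) -> List[Vector]:
    """Repeat each coarse token `factor` times, then trim to `length` (assumed expansion)."""
    expanded = [list(t) for t in tokens for _ in range(factor)]
    return expanded[:length]


def shared_block(tokens: Sequence[Vector]) -> List[Vector]:
    """Toy stand-in for the shared Transformer layers: a causal prefix mean."""
    out = []
    running = [0.0] * len(tokens[0])
    for count, token in enumerate(tokens, start=1):
        running = [r + x for r, x in zip(running, token)]
        out.append([r / count for r in running])
    return out


def multi_resolution_loop(
    tokens: Sequence[Vector],
    schedule: Sequence[int],
    block: Callable[[Sequence[Vector]], List[Vector]] = shared_block,
) -> List[Vector]:
    """Apply the same block once per schedule entry, each time at a different resolution."""
    hidden = [list(t) for t in tokens]
    for factor in schedule:
        coarse = downsample(hidden, factor)                    # compress the sequence
        update = upsample(block(coarse), factor, len(hidden))  # shared weights, then expand
        hidden = [[h + u for h, u in zip(hv, uv)]              # residual update
                  for hv, uv in zip(hidden, update)]
    return hidden


def positions_per_iteration(seq_len: int, schedule: Sequence[int]) -> List[int]:
    """Number of token positions the shared block processes in each iteration."""
    return [-(-seq_len // factor) for factor in schedule]  # ceiling division


if __name__ == "__main__":
    toy_tokens = [[1.0], [2.0], [3.0], [4.0]]
    print(multi_resolution_loop(toy_tokens, schedule=[4, 2, 1]))
    print(positions_per_iteration(8, [4, 2, 1]))
```

Under these assumptions, an 8-token sequence with the schedule `[4, 2, 1]` has the shared block process 2, 4, and 8 positions across the three iterations, compared with 8 positions per iteration for a loop that always runs at full resolution. That difference is the kind of compute saving the abstract refers to when it describes sequence resolution as a scaling axis.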

