LG · AI · ML · Oct 11, 2025

What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)

arXiv:2510.10089v2 · 3 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap for deep learning researchers and offers insight into why some architectures hold an advantage on complex reasoning tasks. It is an incremental advance, however, because it builds on existing loss-landscape models.

The paper examines the theoretical basis for why looped transformers outperform standard transformers on complex reasoning tasks. Analyzing loss-landscape geometry, it argues that looped transformers induce a V-shaped valley landscape, which leads to better loss convergence and to the learning of more complex patterns. It also proposes a staged training framework that speeds up training while achieving comparable performance.
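To make the architectural distinction concrete, here is a minimal NumPy sketch. It assumes that a looped transformer means one weight-tied block applied repeatedly, and it approximates the non-recursive baseline by a single pass through that block. The function names `single_attn` and `looped_attn`, the single-head attention block without layer norm or MLP, and all sizes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attn_block(x, wq, wk, wv):
    """One single-head self-attention block with a residual connection
    (no layer norm or MLP, to keep the sketch short)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return x + softmax(scores) @ v


def single_attn(x, params):
    """Non-recursive baseline (assumed): the block is applied once."""
    return attn_block(x, *params)


def looped_attn(x, params, n_loops):
    """Looped model (assumed): the SAME weights are reapplied n_loops times."""
    for _ in range(n_loops):
        x = attn_block(x, *params)
    return x


rng = np.random.default_rng(0)
d, seq_len = 8, 5
params = tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
x = rng.normal(size=(seq_len, d))

y_single = single_attn(x, params)
y_looped = looped_attn(x, params, n_loops=4)
n_params = sum(w.size for w in params)
print(y_single.shape, y_looped.shape)  # (5, 8) (5, 8)
print(n_params)  # 192: looping adds depth, not parameters
```

Because the loop reuses the same three matrices, `looped_attn` gains depth (and compute) without adding parameters. A deeper non-looped stack would instead need fresh weights for every layer.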

While looped transformers (termed Looped-Attn) often outperform standard transformers (termed Single-Attn) on complex reasoning tasks, the theoretical basis for this advantage remains underexplored. In this paper, we explain this phenomenon through the lens of loss landscape geometry, inspired by empirical observations of their distinct dynamics at both the sample and Hessian levels. To formalize this, we extend the River-Valley landscape model by distinguishing between U-shaped valleys (flat) and V-shaped valleys (steep). Based on empirical observations, we conjecture that the recursive architecture of Looped-Attn induces a landscape-level inductive bias towards River-V-Valley. Theoretical derivations based on this inductive bias guarantee better loss convergence along the river due to valley hopping, and further encourage the learning of complex patterns compared to the River-U-Valley induced by Single-Attn. Building on this insight, we propose SHIFT (Staged HIerarchical Framework for Progressive Training), a staged training framework that accelerates the training process of Looped-Attn while achieving comparable performance.
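The U/V distinction can be illustrated with a toy two-dimensional landscape. This sketch assumes the river runs along x with a gentle constant slope and the valley walls across y are quadratic, with a small curvature standing in for a flat U-shaped valley and a large one for a steep V-shaped valley. All constants (`RIVER_SLOPE`, `CURVATURE`, the learning rate) and helper names are hypothetical. The sketch shows only the geometry, namely the Hessian's largest eigenvalue (wall steepness) and how plain gradient descent bounces between the walls of the steep valley; it does not reproduce the paper's convergence guarantees.

```python
import numpy as np

RIVER_SLOPE = 0.1                  # assumed gentle downhill slope along the river (x)
CURVATURE = {"U": 0.5, "V": 15.0}  # assumed wall curvature across the valley (y)


def grad(p: np.ndarray, curv: float) -> np.ndarray:
    """Gradient of the toy loss L(x, y) = -RIVER_SLOPE * x + 0.5 * curv * y**2."""
    _, y = p
    return np.array([-RIVER_SLOPE, curv * y])


def hessian(p: np.ndarray, curv: float, h: float = 1e-5) -> np.ndarray:
    """Central finite differences of the gradient, one column per coordinate."""
    cols = []
    for j in range(p.size):
        e = np.zeros_like(p)
        e[j] = h
        cols.append((grad(p + e, curv) - grad(p - e, curv)) / (2 * h))
    return np.stack(cols, axis=1)


def run_gd(curv: float, lr: float = 0.1, steps: int = 6) -> list:
    """Plain gradient descent from (0, 1); returns the cross-valley (y) trajectory."""
    p = np.array([0.0, 1.0])
    ys = [p[1]]
    for _ in range(steps):
        p = p - lr * grad(p, curv)
        ys.append(p[1])
    return ys


for name, curv in CURVATURE.items():
    # Largest Hessian eigenvalue = steepness of the valley walls.
    top_eig = np.linalg.eigvalsh(hessian(np.zeros(2), curv)).max()
    ys = run_gd(curv)
    print(name, round(top_eig, 3), [round(y, 3) for y in ys])
```

With these assumed values, each step multiplies the cross-valley coordinate by `1 - lr * curv`: 0.95 in the U valley (a monotone slide toward the floor, y ≈ 1, 0.95, 0.9025, ...) and -0.5 in the V valley (the iterate jumps from one wall to the other while the oscillation shrinks, y ≈ 1, -0.5, 0.25, ...), a toy analogue of hopping between valley walls. Progress along x is identical in both cases here, since the toy loss is separable.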

