LG | Sep 27, 2025

Two-Scale Latent Dynamics for Recurrent-Depth Transformers

arXiv:2509.23314v2 | 6 citations | h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency for users of recurrent-depth transformers, offering an incremental improvement over prior exit strategies.

The paper tackles inefficient test-time compute in recurrent-depth transformers by analyzing the geometry of latent iterates. The analysis suggests a two-scale operational picture: small-scale refinements within a looped block and larger-scale drift across consecutive blocks. Building on this, the authors propose an early-exit mechanism based on second-order step-size differences, which they report to be superior in performance, stability, and time-efficiency to the KL-divergence exit strategy of Geiping et al. and its naive first-order counterpart.

Recurrent-depth transformers scale test-time compute by iterating latent computations before emitting tokens. We study the geometry of these iterates and argue for a simple, two-scale operational picture: (i) within a looped block, updates act as small-scale refinements; (ii) across consecutive blocks, states undergo a larger-scale drift. Across training, our measurements show that loop steps become smaller and increasingly orthogonal to one another, indicating better local modeling of fine structure rather than merely pushing in a single direction. These dynamics motivate an early-exit mechanism based on the model's second-order difference in step-size, which we show is superior in terms of performance, stability and time-efficiency, when compared to the KL-divergence exit strategy of Geiping et al. and its naive first-order counterpart.
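The quantities in the abstract can be illustrated with a minimal sketch. It assumes the latent iterates of one looped block are available as plain vectors, and it reads "second-order difference in step-size" as the change in step norm between consecutive loop iterations. The function names (`step_diagnostics`, `second_order_exit`), the threshold `tol`, and the toy update `toy_step` are hypothetical and not taken from the paper, whose exact criterion and normalization may differ.

```python
import numpy as np


def step_diagnostics(states):
    """Step norms and consecutive-step cosines for iterates s_0..s_T.

    `states` has shape (T + 1, d). This is an illustrative measurement,
    not the paper's exact protocol.
    """
    states = np.asarray(states, dtype=float)
    steps = np.diff(states, axis=0)            # delta_t = s_t - s_{t-1}
    norms = np.linalg.norm(steps, axis=1)      # step sizes, shape (T,)
    dots = np.sum(steps[1:] * steps[:-1], axis=1)
    denom = np.maximum(norms[1:] * norms[:-1], 1e-12)  # avoid divide-by-zero
    cosines = dots / denom                     # near 0 means orthogonal steps
    return norms, cosines


def second_order_exit(step_fn, s0, max_steps=32, tol=1e-3):
    """Iterate s <- step_fn(s); stop once the step size stops changing.

    Assumed exit rule: | ||delta_t|| - ||delta_{t-1}|| | < tol.
    Returns the final state and the number of loop steps taken.
    """
    s = np.asarray(s0, dtype=float)
    prev_norm = None
    for t in range(1, max_steps + 1):
        s_next = step_fn(s)
        norm = np.linalg.norm(s_next - s)      # first-order quantity
        s = s_next
        if prev_norm is not None and abs(norm - prev_norm) < tol:
            return s, t                        # second-order change is small
        prev_norm = norm
    return s, max_steps


def toy_step(s):
    """Hypothetical stand-in for a looped block: a simple contraction."""
    return 0.5 * s + 1.0


if __name__ == "__main__":
    final_state, steps_used = second_order_exit(toy_step, np.zeros(4))
    print("exited after", steps_used, "loop steps")
    norms, cosines = step_diagnostics([[0, 0], [1, 0], [1, 1]])
    print("step norms:", norms, "cosines:", cosines)
```

A first-order variant would compare `norm` itself against a threshold. The second-order rule instead waits for the step size to stabilize, which is one way to read the abstract's distinction between the proposed exit and its "naive first-order counterpart".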

