LG AI Feb 10

Step-resolved data attribution for looped transformers

arXiv:2602.10097v1 2 citations h-index: 18
Originality: Incremental advance
AI Analysis

This work gives researchers and practitioners working with looped (recurrent) transformer architectures a per-iteration view of training-data influence. It is an incremental advance because it builds on existing influence estimators such as TracIn.

The authors address training-data attribution in looped transformers. They introduce Step-Decomposed Influence (SDI), which decomposes a training example's influence into a per-loop-iteration trajectory. In their experiments, SDI scales to looped GPT-style models and matches full-gradient baselines with low error.
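The idea of a per-iteration influence trajectory can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a scalar looped block h <- tanh(w*h + b) with shared parameters (w, b), a squared-error loss, a single checkpoint, and it splits the training-example gradient by loop iteration (which side of the inner product the paper splits is an assumption here). Because the gradient of shared parameters is the sum of the contributions from each unrolled iteration, the per-step scores add up to the usual single-checkpoint TracIn score. All function names are hypothetical.

```python
import math

def forward(theta, x, tau):
    """Apply the shared block h <- tanh(w*h + b) for tau iterations; return all states."""
    w, b = theta
    hs = [x]
    for _ in range(tau):
        hs.append(math.tanh(w * hs[-1] + b))
    return hs

def loss(theta, x, y, tau):
    """Squared error on the final state (toy assumption)."""
    return 0.5 * (forward(theta, x, tau)[-1] - y) ** 2

def per_step_grads(theta, x, y, tau):
    """Backpropagate through the unrolled loop, keeping each iteration's
    contribution to the shared-parameter gradient separate."""
    w, _ = theta
    hs = forward(theta, x, tau)
    delta = hs[-1] - y                        # dL/dh_tau
    grads = [None] * tau
    for t in reversed(range(tau)):
        a = delta * (1.0 - hs[t + 1] ** 2)    # gradient at iteration t's pre-activation
        grads[t] = (a * hs[t], a)             # (dL/dw, dL/db) through iteration t only
        delta = a * w                         # dL/dh_t, passed to the previous iteration
    return grads

def total_grad(theta, x, y, tau):
    """Full gradient of the shared parameters: the sum over iterations."""
    gs = per_step_grads(theta, x, y, tau)
    return (sum(g[0] for g in gs), sum(g[1] for g in gs))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def influence_trajectory(theta, train, test, tau, lr):
    """Length-tau list: iteration t's share of the train/test gradient inner product."""
    g_test = total_grad(theta, test[0], test[1], tau)
    steps = per_step_grads(theta, train[0], train[1], tau)
    return [lr * dot(g_t, g_test) for g_t in steps]

def tracin_score(theta, train, test, tau, lr):
    """Single-checkpoint TracIn-style score: one scalar for all iterations."""
    g_train = total_grad(theta, train[0], train[1], tau)
    g_test = total_grad(theta, test[0], test[1], tau)
    return lr * dot(g_train, g_test)

theta = (0.8, 0.1)             # shared (w, b), arbitrary toy values
train_example = (0.5, 0.3)     # (input, target)
test_example = (-0.2, 0.4)
tau = 4
trajectory = influence_trajectory(theta, train_example, test_example, tau, lr=0.1)
score = tracin_score(theta, train_example, test_example, tau, lr=0.1)
print("per-step influence:", trajectory)
print("sum of steps:", sum(trajectory), "scalar score:", score)
```

The scalar score hides how the contributions are distributed across iterations; the trajectory exposes that distribution while still summing to the same total.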

We study how individual training examples shape the internal computation of looped transformers, where a shared block is applied for $\tau$ recurrent iterations to enable latent reasoning. Existing training-data influence estimators such as TracIn yield a single scalar score that aggregates over all loop iterations, obscuring when during the recurrent computation a training example matters. We introduce Step-Decomposed Influence (SDI), which decomposes TracIn into a length-$\tau$ influence trajectory by unrolling the recurrent computation graph and attributing influence to specific loop iterations. To make SDI practical at transformer scale, we propose a TensorSketch implementation that never materialises per-example gradients. Experiments on looped GPT-style models and algorithmic reasoning tasks show that SDI scales excellently, matches full-gradient baselines with low error and supports a broad range of data attribution and interpretability tasks with per-step insights into the latent reasoning process.
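The abstract does not say how TensorSketch is applied, so the following is only a sketch of the standard identity that makes such an approach plausible. For a single input vector, the weight gradient of a linear layer is the outer product of the backpropagated signal and the layer input. TensorSketch computes a Count Sketch of an outer product from the Count Sketches of its two factors, via circular convolution, so the outer product itself is never formed. The names, sizes, and the use of fully random hash tables below are assumptions for illustration; a practical version would use an FFT for the convolution rather than the quadratic loop used here to stay within the standard library.

```python
import random

def make_hash(dim, m, rng):
    """Random bucket index in [0, m) and random sign for each input coordinate."""
    buckets = [rng.randrange(m) for _ in range(dim)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return buckets, signs

def count_sketch(vec, hash_fn, m):
    """Count Sketch: add each signed coordinate into its bucket."""
    buckets, signs = hash_fn
    out = [0.0] * m
    for i, v in enumerate(vec):
        out[buckets[i]] += signs[i] * v
    return out

def circular_convolve(a, b):
    """Circular convolution of two length-m vectors (O(m^2); an FFT would be faster)."""
    m = len(a)
    out = [0.0] * m
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % m] += ai * bj
    return out

def tensor_sketch(u, v, hash_u, hash_v, m):
    """Sketch of the outer product of u and v, computed from u and v alone."""
    return circular_convolve(count_sketch(u, hash_u, m), count_sketch(v, hash_v, m))

def explicit_sketch(u, v, hash_u, hash_v, m):
    """Reference only: materialise every entry u[i]*v[j] and Count-Sketch it
    with the combined hash (bucket_u + bucket_v) mod m and sign sign_u * sign_v."""
    (bu, su), (bv, sv) = hash_u, hash_v
    out = [0.0] * m
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[(bu[i] + bv[j]) % m] += su[i] * sv[j] * ui * vj
    return out

rng = random.Random(0)
d_out, d_in, m = 5, 7, 16                              # toy sizes (assumed)
delta = [rng.gauss(0.0, 1.0) for _ in range(d_out)]    # stand-in for a backpropagated signal
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]         # stand-in for a layer input
hash_out = make_hash(d_out, m, rng)
hash_in = make_hash(d_in, m, rng)

fast = tensor_sketch(delta, x, hash_out, hash_in, m)
slow = explicit_sketch(delta, x, hash_out, hash_in, m)
max_gap = max(abs(a - b) for a, b in zip(fast, slow))
print("largest difference between the two sketches:", max_gap)
```

The two sketches agree up to floating-point rounding. Inner products between such sketches can then serve as estimates of inner products between the corresponding gradients, which is the quantity that TracIn-style scores need.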

