CL, AI, LG · Mar 4, 2025

(How) Do Language Models Track State?

Meta AI, MIT
arXiv:2503.02854v3 · 23 citations · h-index: 14 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses how LMs track state and shows that they can learn interpretable mechanisms. It is an incremental advance because it builds on prior theoretical work.

The paper investigates how transformer language models track unobserved state, using permutation composition as a model task. It finds that they learn one of two mechanisms: one resembling an associative scan, which is associated with better generalization and faster convergence, and another that combines a parity heuristic with a scan. The authors also show how to steer models toward either mechanism.

Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that seem to require tracking the unobserved state of an evolving world. How do they do this? We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general. We show that LMs consistently learn one of two state tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, and then refines this with an associative scan. LMs that learn the former algorithm tend to generalize better and converge faster, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics. Our results demonstrate that transformer LMs, whether pre-trained or fine-tuned, can learn to implement efficient and interpretable state-tracking mechanisms, and the emergence of these mechanisms can be predicted and controlled.
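
To make the task concrete, the following is a minimal Python sketch of permutation composition, an associative scan over a sequence of swaps, and the parity feature mentioned above. It illustrates the underlying algorithms only; it is not the authors' code and does not model how a transformer implements them. All function names (compose, swap, sequential_prefixes, scan_prefixes, parity) and the tuple encoding of permutations are assumptions made for this example.

```python
# Illustrative sketch only: names and encoding are assumptions, not the paper's code.
# A permutation on n objects is a tuple p where p[i] is the image of i.

def compose(p, q):
    """Return the permutation that applies p first, then q."""
    return tuple(q[i] for i in p)

def swap(n, i, j):
    """Permutation on n objects that exchanges i and j."""
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def sequential_prefixes(perms):
    """Track state step by step: T sequential compositions."""
    state = tuple(range(len(perms[0])))
    out = []
    for p in perms:
        state = compose(state, p)
        out.append(state)
    return out

def scan_prefixes(perms):
    """Inclusive associative scan (Hillis-Steele style).

    Uses about log2(T) rounds; within a round every position can be
    updated independently, which is what makes the scan parallelizable.
    """
    out = list(perms)
    step = 1
    while step < len(out):
        out = [
            out[t] if t < step else compose(out[t - step], out[t])
            for t in range(len(out))
        ]
        step *= 2
    return out

def parity(p):
    """0 for an even permutation, 1 for an odd one (inversion count mod 2)."""
    n = len(p)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])
    return inversions % 2

# Example: five objects, four swaps.
swaps = [swap(5, 0, 1), swap(5, 1, 2), swap(5, 3, 4), swap(5, 0, 4)]
assert scan_prefixes(swaps) == sequential_prefixes(swaps)

# Each swap is odd, so the parity of the final state is the number of swaps mod 2.
final_state = sequential_prefixes(swaps)[-1]
assert parity(final_state) == len(swaps) % 2
```

Because composition is associative, the scan produces the same prefix states as the step-by-step loop while needing only logarithmically many rounds. Parity is cheap to compute and halves the set of candidate final states, but it does not identify the state on its own, which is consistent with the abstract's description of parity as a partial pruning step that is then refined by a scan.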
