NCCLLG, Jul 21, 2025

Dissociating model architectures from inference computations

arXiv:2507.15776v1 | 2 citations | h-index: 2 | Cogn Neurosci
Originality: Incremental advance
AI Analysis

This work addresses a foundational question for researchers and practitioners working with sequence models, though its contribution appears incremental, building directly on prior work.

The paper tackles the problem of separating model architectures from inference computations in sequence modelling. It demonstrates that autoregressive models can mimic deep temporal computations through structured context access during iterative inference, maintaining predictive capacity while using fewer computations.

Parr et al. (2025) examine how autoregressive and deep temporal models differ in their treatment of non-Markovian sequence modelling. Building on this, we highlight the need to dissociate model architectures, i.e., how the predictive distribution factorises, from the computations invoked at inference. We demonstrate that autoregressive models can mimic deep temporal computations by structuring context access during iterative inference. Using a transformer trained on next-token prediction, we show that inducing hierarchical temporal factorisation during iterative inference maintains predictive capacity while instantiating fewer computations. This emphasises that the processes for constructing and refining predictions are not necessarily bound to their underlying model architectures.
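As a rough illustration of what "structuring context access" could mean in practice, the sketch below compares a full causal attention mask with a two-level mask in which each position sees its own block plus one summary position per earlier block. This is not the paper's method: the function names `causal_mask` and `two_level_mask`, the `block` parameter, and the choice of each block's last token as its summary are all assumptions made purely for illustration.

```python
import numpy as np


def causal_mask(n):
    """Full autoregressive access: position i may read positions 0..i."""
    return np.tril(np.ones((n, n), dtype=bool))


def two_level_mask(n, block):
    """Hypothetical two-level access pattern (illustrative assumption).

    Position i may read:
      * earlier positions inside its own block (fast timescale), and
      * the last position of every earlier block (slow timescale),
    always subject to causality.
    """
    idx = np.arange(n)
    # True where query row and key column fall in the same block.
    same_block = (idx[:, None] // block) == (idx[None, :] // block)
    # True for key columns that close a block (stand-in for a block summary).
    block_end = (idx % block) == (block - 1)
    return (same_block | block_end[None, :]) & causal_mask(n)


full = causal_mask(16)
sparse = two_level_mask(16, block=4)

# Number of (query, key) pairs each scheme would evaluate.
print(int(full.sum()), int(sparse.sum()))  # prints: 136 64
```

With 16 positions and blocks of 4, the restricted mask evaluates 64 query-key pairs instead of 136, which conveys the sense in which a hierarchical access pattern can instantiate fewer computations. Whether predictive capacity is preserved under such a restriction is an empirical question that this sketch does not address.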

