LG SY PR Aug 27, 2025

What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture

arXiv:2508.20211v1 4.1 h-index: 6

Originality: Synthesis-oriented

AI Analysis

This work bridges classical nonlinear filtering theory with modern inference architectures, offering insights for probabilistic modeling, but it is incremental as it builds on existing transformer concepts without introducing new methods or broad applications.

The paper tackles the problem of interpreting transformer architectures through the lens of classical signals and systems theory, resulting in a probabilistic model that frames transformer signals as conditional measures and layer operations as fixed-point updates, with an explicit form derived for hidden Markov models.

In the 1940s, Wiener introduced a linear predictor, where the future prediction is computed by linearly combining the past data. A transformer generalizes this idea: it is a nonlinear predictor where the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilistic model that interprets transformer signals as surrogates of conditional measures, and layer operations as fixed-point updates. An explicit form of the fixed-point update is described for the special case when the probabilistic model is a hidden Markov model (HMM). In part, this paper is an attempt to bridge classical nonlinear filtering theory with modern inference architectures.
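To make the HMM special case concrete, here is a minimal sketch of next-token prediction with the textbook forward (nonlinear) filter for a finite-state HMM. It is not the fixed-point update derived in the paper; it only illustrates what "nonlinearly combining the past tokens" into a conditional measure can look like. The function name `hmm_next_token_probs` and the matrices `A`, `C`, and `pi0` are illustrative assumptions, not notation taken from the paper.

```python
import numpy as np

def hmm_next_token_probs(A, C, pi0, tokens):
    """Standard HMM forward filter (illustrative; names are hypothetical).

    A[i, j] = P(X_{t+1} = j | X_t = i)   hidden-state transition matrix
    C[i, k] = P(Z_t = k | X_t = i)       token emission matrix
    pi0[i]  = P(X_1 = i)                 prior over the first hidden state
    tokens  = observed token indices z_1, ..., z_T

    Returns the conditional measure P(Z_{T+1} = k | z_1, ..., z_T).
    """
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    belief = np.asarray(pi0, dtype=float)  # P(X_1)
    for z in tokens:
        belief = belief * C[:, z]          # Bayes update with the observed token
        belief = belief / belief.sum()     # normalization makes the map nonlinear in the data
        belief = belief @ A                # propagate to the next hidden state
    return belief @ C                      # predictive distribution over the next token

# Toy model with 2 hidden states and 2 tokens (made-up numbers).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
C = np.array([[0.8, 0.2], [0.3, 0.7]])
pi0 = np.array([0.5, 0.5])
next_probs = hmm_next_token_probs(A, C, pi0, [0, 0, 1])
```

Wiener's linear predictor would instead form the prediction as a fixed weighted sum of past observations; here the weights on the past depend on the observed tokens themselves through the normalization step.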
