LG · AI · CL · MATH-PH · May 21, 2025

Physical models realizing the transformer architecture of large language models

arXiv:2507.13354v2
Originality: Highly original
AI Analysis

This work addresses a foundational problem in AI theory by offering a novel physical perspective on transformers, which could influence future hardware and algorithm design, though it is incremental in bridging physics and ML.

The paper tackles the theoretical gap in understanding the transformer architecture by constructing physical models that realize large language models as open quantum systems in Fock space over token Hilbert spaces, providing a foundational physical interpretation.

The introduction of the transformer architecture in 2017 marked the most striking advancement in natural language processing. The transformer is a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is and how it works physically. From a physical perspective on modern chips, such as those fabricated below 28 nm, modern intelligent machines should be regarded as open quantum systems beyond conventional statistical systems. Therefore, in this paper, we construct physical models realizing large language models based on a transformer architecture as open quantum systems in the Fock space over the Hilbert space of tokens. Our physical models underlie the transformer architecture for large language models.
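
For readers unfamiliar with the term, the standard textbook definition of the full Fock space over a Hilbert space can be written as below. This is a general sketch, not the paper's own construction: the symbols $\mathcal{H}$ (standing for the Hilbert space of tokens) and $\mathcal{F}(\mathcal{H})$ are notation assumed here, and the abstract does not state whether the full, symmetric, or antisymmetric variant is used.

```latex
\mathcal{F}(\mathcal{H})
  \;=\; \bigoplus_{n=0}^{\infty} \mathcal{H}^{\otimes n}
  \;=\; \mathbb{C} \,\oplus\, \mathcal{H} \,\oplus\, (\mathcal{H} \otimes \mathcal{H}) \,\oplus\, \cdots
```

Under this reading, the $n$-th summand $\mathcal{H}^{\otimes n}$ would hold states built from token sequences of length $n$, which is one plausible reason a single space can accommodate inputs of varying length.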
