Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models
This work proposes an alternative to current transformer-based large language models, offering a different paradigm for generative tasks in ML/AI.
The paper addresses generative modeling for text and video by introducing Spectral Generative Flow Models (SGFMs), which replace transformer-based architectures with a physics-inspired continuous-field approach. The resulting framework emphasizes long-range coherence, multimodal generality, and computational efficiency.
We introduce Spectral Generative Flow Models (SGFMs), a physics-inspired alternative to transformer-based large language models. Instead of representing text or video as sequences of discrete tokens processed by attention, SGFMs treat generation as the evolution of a continuous field governed by constrained stochastic dynamics in a multiscale wavelet basis. This formulation replaces global attention with local operators, spectral projections, and Navier-Stokes-like transport, yielding a generative mechanism grounded in continuity, geometry, and physical structure. Our framework provides three key innovations: (i) a field-theoretic ontology in which text and video are unified as trajectories of a stochastic partial differential equation; (ii) a wavelet-domain representation that induces sparsity, scale separation, and computational efficiency; and (iii) a constrained stochastic flow that enforces stability, coherence, and uncertainty propagation. Together, these components define a generative architecture that departs fundamentally from autoregressive modeling and diffusion-based approaches. SGFMs offer a principled path toward long-range coherence, multimodal generality, and physically structured inductive bias in next-generation generative models.
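To make the idea of a stochastic flow in a multiscale wavelet basis concrete, the following is a minimal toy sketch and not the SGFM architecture itself. It assumes a 1D field, an orthonormal Haar basis, and a simple diffusion-like damping in place of the Navier-Stokes-like transport named in the abstract. The function names (`haar_forward`, `haar_inverse`, `flow_step`) and the parameters `dt`, `nu`, and `sigma` are hypothetical choices made for illustration only.

```python
import math
import random

INV_SQRT2 = 1.0 / math.sqrt(2.0)

def haar_forward(signal):
    """Orthonormal Haar decomposition of a list whose length is a power of two.

    Returns (approx, details): approx has one coefficient, and details is a
    list of levels ordered from coarsest (index 0) to finest.
    """
    n = len(signal)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    approx = list(signal)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) * INV_SQRT2 for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) * INV_SQRT2 for i in range(half)]
        details.insert(0, detail)  # keep coarsest level first
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the field from coarse to fine."""
    field = list(approx)
    for detail in details:
        finer = []
        for a, d in zip(field, detail):
            finer.append((a + d) * INV_SQRT2)
            finer.append((a - d) * INV_SQRT2)
        field = finer
    return field

def flow_step(approx, details, dt=0.05, nu=0.1, sigma=0.0, rng=None):
    """One Euler-Maruyama step applied per wavelet coefficient (toy dynamics).

    Assumption: each detail level is damped at a rate nu * 4**level, which
    mimics diffusion acting faster on fine scales. The approximation
    coefficient is left untouched, so the field mean is conserved; this
    stands in for a constraint and is not the paper's projection operator.
    """
    rng = rng or random.Random(0)
    new_details = []
    for level, detail in enumerate(details):
        rate = nu * 4.0 ** level
        new_details.append([
            c - dt * rate * c + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for c in detail
        ])
    return list(approx), new_details

# Usage: a smooth wave plus a step, evolved for one noisy step.
field = [math.sin(2 * math.pi * i / 16) + (1.0 if i >= 8 else 0.0) for i in range(16)]
approx, details = haar_forward(field)
approx, details = flow_step(approx, details, sigma=0.1, rng=random.Random(1))
evolved = haar_inverse(approx, details)
```

Every update in this sketch touches one coefficient at a time, so the cost per step is linear in the field size, and the scale-dependent rate gives an explicit handle on scale separation. A real model would presumably learn the operators and couple neighboring coefficients; nothing here reflects the authors' actual equations.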