LG · Mar 14

PDE-SSM: A Spectral State Space Approach to Spatial Mixing in Diffusion Transformers

arXiv:2603.13663 · 72.3 · h-index: 11
Predicted impact top 45% in LG · last 90 days
Originality: Highly original
AI Analysis

This work addresses the efficiency and inductive-bias limitations of vision transformers in generative modeling and offers a scalable alternative to attention.

The authors address the quadratic cost and weak spatial inductive bias of self-attention in vision transformers by proposing PDE-SSM, a spatial state-space block based on a learnable convection-diffusion-reaction PDE. Solving the PDE in the Fourier domain gives near-linear O(N log N) complexity. The resulting model, PDE-SSM-DiT, matches or exceeds the performance of state-of-the-art Diffusion Transformers while substantially reducing compute.

The success of vision transformers, especially for generative modeling, is limited by the quadratic cost and weak spatial inductive bias of self-attention. We propose PDE-SSM, a spatial state-space block that replaces attention with a learnable convection-diffusion-reaction partial differential equation. This operator encodes a strong spatial prior by modeling information flow via physically grounded dynamics rather than all-to-all token interactions. Solving the PDE in the Fourier domain yields global coupling with near-linear complexity of $O(N \log N)$, delivering a principled and scalable alternative to attention. We integrate PDE-SSM into a flow-matching generative model to obtain the PDE-based Diffusion Transformer, PDE-SSM-DiT. Empirically, PDE-SSM-DiT matches or exceeds the performance of state-of-the-art Diffusion Transformers while substantially reducing compute. Our results show that, analogous to 1D settings where SSMs supplant attention, multi-dimensional PDE operators provide an efficient, inductive-bias-rich foundation for next-generation vision models.
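To make the Fourier-domain idea concrete, here is a minimal NumPy sketch of a spectral solve for a convection-diffusion-reaction equation on a single 2D feature map. It is an illustration under simplifying assumptions, not the paper's implementation: the coefficients are constant scalars rather than learned, the boundary is periodic, and the reaction term is a linear decay, so the PDE reads $\partial_t u = \kappa \nabla^2 u - v \cdot \nabla u - r u$. The function name `pde_spectral_step` and all parameter names are hypothetical. Under these assumptions each Fourier mode evolves independently, so the whole update is one FFT, one elementwise multiply, and one inverse FFT, which is where the $O(N \log N)$ cost comes from.

```python
import numpy as np


def pde_spectral_step(u, kappa=0.1, velocity=(0.0, 0.0), reaction=0.0, t=1.0):
    """Evolve a 2D field for time t under du/dt = kappa*lap(u) - v.grad(u) - r*u.

    Assumptions (not from the paper): constant coefficients, periodic
    boundaries, linear reaction term. velocity = (vx, vy) in pixels per unit time.
    """
    H, W = u.shape
    # Angular wavenumbers (radians per pixel) along each axis.
    ky = 2.0 * np.pi * np.fft.fftfreq(H)
    kx = 2.0 * np.pi * np.fft.fftfreq(W)
    KY, KX = np.meshgrid(ky, kx, indexing="ij")

    # Fourier symbol of the spatial operator: each mode obeys d(u_hat)/dt = symbol * u_hat.
    symbol = (
        -kappa * (KX**2 + KY**2)                       # diffusion damps high frequencies
        - 1j * (velocity[0] * KX + velocity[1] * KY)   # convection shifts phase
        - reaction                                     # reaction decays every mode
    )

    u_hat = np.fft.fft2(u)
    u_hat_t = u_hat * np.exp(symbol * t)  # exact solution per mode
    # The input is real; discard the round-off-level imaginary part.
    return np.fft.ifft2(u_hat_t).real


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 16))  # one channel of a 16x16 token grid
mixed = pde_spectral_step(tokens, kappa=0.5, velocity=(1.0, 0.0), reaction=0.1, t=2.0)
print(mixed.shape)
```

Every output position depends on every input position through the spectral multiplier, which is the "global coupling" the abstract refers to, yet no pairwise token interaction matrix is ever formed.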

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
