LG CL CV ML
Oct 6, 2025

On Structured State-Space Duality

Jerry Yao-Chieh Hu, Xiwen Zhang, Weimin Wu, Han Liu

arXiv:2510.04944v14.11 citations h-index: 12

Originality: Incremental advance

AI Analysis

This work provides theoretical insights for designing efficient sequence models, but it is incremental as it builds on prior SSD concepts.

The paper formalizes and generalizes Structured State-Space Duality (SSD), showing that diagonal state-space models (SSMs) are equivalent to masked attention mechanisms with specific causal masks. This bridges recurrent SSMs and Transformers: diagonal SSMs match the training complexity lower bounds of the scalar case while supporting richer dynamics.

Structured State-Space Duality (SSD) [Dao & Gu, ICML 2024] is an equivalence between a simple Structured State-Space Model (SSM) and a masked attention mechanism. In particular, a state-space model with a scalar-times-identity state matrix is equivalent to masked self-attention with a $1$-semiseparable causal mask. Consequently, the same sequence transformation (model) has two algorithmic realizations: as a linear-time $O(T)$ recurrence or as a quadratic-time $O(T^2)$ attention. In this note, we formalize and generalize this duality: (i) we extend SSD from the scalar-identity case to general diagonal SSMs (diagonal state matrices); (ii) we show that these diagonal SSMs match the scalar case's training complexity lower bounds while supporting richer dynamics; (iii) we establish a necessary and sufficient condition under which an SSM is equivalent to $1$-semiseparable masked attention; and (iv) we show that such duality fails to extend to standard softmax attention due to rank explosion. Together, these results tighten the bridge between recurrent SSMs and Transformers, and widen the design space for expressive yet efficient sequence models.
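
The two realizations described in the abstract can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a single-channel input and the parameterization $h_t = a_t \odot h_{t-1} + B_t x_t$, $y_t = C_t^\top h_t$, where $a_t$ holds the diagonal of the state matrix. The function names (`ssm_recurrence`, `semiseparable_mask`, `ssm_attention`) and all shapes are hypothetical choices made for illustration. Under these assumptions the attention form is a sum of one $1$-semiseparable masked attention per state dimension, which collapses to a single mask in the scalar-identity case.

```python
import numpy as np

def ssm_recurrence(a, B, C, x):
    """Linear-time form. a, B, C have shape (T, N); x has shape (T,).
    a[t] is the diagonal of the state matrix at step t (assumed layout)."""
    T, N = B.shape
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        h = a[t] * h + B[t] * x[t]   # diagonal state update
        y[t] = C[t] @ h              # readout
    return y

def semiseparable_mask(a):
    """1-semiseparable causal mask for a scalar sequence a of shape (T,):
    L[t, s] = a[s+1] * ... * a[t] for t >= s (empty product = 1), else 0."""
    T = a.shape[0]
    L = np.zeros((T, T))
    for t in range(T):
        L[t, t] = 1.0
        for s in range(t - 1, -1, -1):
            L[t, s] = L[t, s + 1] * a[s + 1]
    return L

def ssm_attention(a, B, C, x):
    """Quadratic-time form: sum over state dimensions n of
    (L^n * outer(C[:, n], B[:, n])) applied to x."""
    T, N = B.shape
    M = np.zeros((T, T))
    for n in range(N):
        M += semiseparable_mask(a[:, n]) * np.outer(C[:, n], B[:, n])
    return M @ x

rng = np.random.default_rng(0)
T, N = 6, 3
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=T)

# Diagonal case: each state dimension has its own decay sequence.
a_diag = rng.uniform(0.5, 1.0, size=(T, N))
y_rec = ssm_recurrence(a_diag, B, C, x)
y_att = ssm_attention(a_diag, B, C, x)

# Scalar-identity case: one decay shared by all state dimensions,
# so a single mask L multiplies the full score matrix C B^T.
a_scalar = rng.uniform(0.5, 1.0, size=T)
a_shared = np.repeat(a_scalar[:, None], N, axis=1)
y_scalar_rec = ssm_recurrence(a_shared, B, C, x)
y_scalar_att = (semiseparable_mask(a_scalar) * (C @ B.T)) @ x

print(np.allclose(y_rec, y_att), np.allclose(y_scalar_rec, y_scalar_att))
```

The recurrence touches each time step once, while the attention form materializes a $T \times T$ matrix, which mirrors the $O(T)$ versus $O(T^2)$ contrast in the abstract. The sketch does not illustrate the complexity lower bounds or the softmax result.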
