LG · CL · Mar 27, 2022

Diagonal State Spaces are as Effective as Structured State Spaces

DeepMind · IBM
arXiv:2203.14343v3 · 490 citations · h-index: 59
Originality: Incremental advance
AI Analysis

This work provides a simpler alternative to S4 for long-range reasoning tasks in modalities such as text and audio, but it is incremental because it builds directly on S4 without adding major new capabilities.

The authors tackled the problem of modeling long-range dependencies in sequential data by showing that diagonal state matrices can match the performance of the more complex Structured State Space (S4) model, achieving comparable results on Long Range Arena tasks and speech classification.

Modeling long-range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice in modeling short-range interactions, their performance on tasks requiring long-range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture, delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal plus low-rank structure, which allows efficient computation. In this work, we show that one can match the performance of S4 even without the low-rank correction, that is, assuming the state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification with the Speech Commands dataset, while being conceptually simpler and straightforward to implement.
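To illustrate why a diagonal state matrix is "straightforward to implement", here is a minimal NumPy sketch. With a diagonal state matrix Λ, the length-L convolution kernel K[k] = Re(Σ_n C_n B̄_n Ā_n^k) reduces to a single Vandermonde-matrix product, and the same output can be produced by an elementwise recurrence. The sketch assumes a generic zero-order-hold (ZOH) discretization (Ā = exp(ΔΛ), B̄ = (Ā − 1)/Λ · B) and arbitrary stable complex eigenvalues. It is not the paper's exact DSS parameterization, which may normalize the kernel differently. The function names are hypothetical.

```python
import numpy as np

def zoh_discretize(lam, B, dt):
    """Zero-order-hold discretization of a diagonal SSM (assumed scheme)."""
    dA = np.exp(dt * lam)          # diagonal of the discrete state matrix
    dB = (dA - 1.0) / lam * B      # discrete input vector
    return dA, dB

def diagonal_ssm_kernel(lam, B, C, dt, L):
    """K[k] = Re(sum_n C[n] * dB[n] * dA[n]**k) for k = 0..L-1."""
    dA, dB = zoh_discretize(lam, B, dt)
    # Vandermonde matrix V[n, k] = dA[n] ** k; no matrix powers are needed
    V = dA[:, None] ** np.arange(L)[None, :]
    return np.real((C * dB) @ V)

def diagonal_ssm_recurrence(lam, B, C, dt, u):
    """Step-by-step recurrence x_k = dA * x_{k-1} + dB * u_k, y_k = Re(C . x_k)."""
    dA, dB = zoh_discretize(lam, B, dt)
    x = np.zeros_like(dA)
    ys = []
    for u_k in u:
        x = dA * x + dB * u_k      # elementwise update because the state matrix is diagonal
        ys.append(np.real(np.dot(C, x)))
    return np.array(ys)

def causal_conv(K, u):
    """Causal convolution y[k] = sum_{j<=k} K[k-j] * u[j], truncated to len(u)."""
    return np.convolve(u, K)[: len(u)]

# Usage: the convolutional and recurrent views give the same output.
rng = np.random.default_rng(0)
N, L, dt = 8, 64, 0.1
lam = -0.5 + 1j * np.pi * np.arange(N)   # arbitrary eigenvalues with negative real part
B = rng.standard_normal(N) + 1j * rng.standard_normal(N)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u = rng.standard_normal(L)

K = diagonal_ssm_kernel(lam, B, C, dt, L)
print(np.allclose(causal_conv(K, u), diagonal_ssm_recurrence(lam, B, C, dt, u)))
```

Because Λ is diagonal, computing the kernel costs O(N·L) elementwise work. S4's diagonal-plus-low-rank structure instead needs extra machinery to handle the low-rank correction term.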

Code Implementations: 2 repos