Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
This work addresses the need for more efficient and flexible SSM-based networks for audio processing tasks, offering a novel architectural approach that balances performance and computational efficiency.
The authors introduce Centaurus, a class of state-space model (SSM) networks that treats SSM operations as tensor contractions, so that the contraction order can be optimized for training efficiency. This permits flexible block designs inspired by convolutional networks, and a mixture of such blocks outperforms homogeneous designs on audio tasks such as keyword spotting, speech denoising, and automatic speech recognition (ASR). For ASR, Centaurus achieves competitive results without using LSTMs, CNNs, or attention mechanisms.
We introduce Centaurus, a class of networks composed of generalized state-space model (SSM) blocks, where the SSM operations can be treated as tensor contractions during training. The optimal order of tensor contractions can then be systematically determined for every SSM block to maximize training efficiency. This allows more flexibility in designing SSM blocks beyond the commonly implemented depthwise-separable configuration. The new design choices take inspiration from classical convolutional blocks, including group convolutions, full convolutions, and bottleneck blocks. We architect the Centaurus network with a mixture of these blocks to balance network size against performance, as well as memory and computational efficiency during both training and inference. We show that this heterogeneous network design outperforms its homogeneous counterparts in raw audio processing tasks including keyword spotting, speech denoising, and automatic speech recognition (ASR). For ASR, Centaurus is the first network with competitive performance that can be made fully state-space based, without using any nonlinear recurrence (LSTMs), explicit convolutions (CNNs), or (surrogate) attention mechanisms. The source code is available as supplementary material at https://openreview.net/forum?id=PkpNRmBZ32
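To make the idea of "SSM operations as tensor contractions" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a diagonal, real, stable state matrix and a fully connected (all inputs to all outputs) block, and it uses NumPy's generic `einsum_path` search as a stand-in for whatever contraction-order procedure the paper uses. All names and sizes (`a`, `B`, `C`, `N`, `I`, `O`, `L`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (assumptions): states, input channels, output channels, length.
N, I, O, L = 4, 3, 2, 16

a = rng.uniform(0.1, 0.9, size=N)   # diagonal state transition, assumed real and stable
B = rng.standard_normal((N, I))     # input projection
C = rng.standard_normal((O, N))     # output projection
u = rng.standard_normal((I, L))     # input signal, shape (channels, time)


def ssm_recurrent(a, B, C, u):
    """Reference: x_t = a * x_{t-1} + B u_t, y_t = C x_t, run step by step."""
    x = np.zeros(a.shape[0])
    ys = []
    for t in range(u.shape[1]):
        x = a * x + B @ u[:, t]
        ys.append(C @ x)
    return np.stack(ys, axis=1)     # shape (O, L)


# Convolutional view: the impulse response of state n at lag l is a[n] ** l.
powers = a[:, None] ** np.arange(L)[None, :]   # shape (N, L)

# Zero-pad to 2L so that the FFT product is a linear, not circular, convolution.
F = 2 * L
Lam = np.fft.rfft(powers, n=F, axis=1)         # shape (N, F // 2 + 1)
U = np.fft.rfft(u, n=F, axis=1)                # shape (I, F // 2 + 1)

# The whole block is one contraction over the state index n and input index i:
#   Y[o, f] = sum_{n, i} C[o, n] * Lam[n, f] * B[n, i] * U[i, f]
expr = "on,nf,ni,if->of"
path, report = np.einsum_path(expr, C, Lam, B, U, optimize="optimal")
print(report)                                  # pairwise order and estimated cost

Y = np.einsum(expr, C, Lam, B, U, optimize=path)
y_conv = np.fft.irfft(Y, n=F, axis=1)[:, :L]   # back to time domain, keep L steps

# One fixed alternative order: materialize the full (O, I, F) kernel first,
# then apply it to the input. Same result, different intermediate sizes.
K_hat = np.einsum("on,nf,ni->oif", C, Lam, B)
Y_kernel_first = np.einsum("oif,if->of", K_hat, U)
```

Both orders give the same output, but the kernel-first order creates an intermediate of size O x I x F, whereas projecting the input onto the states first keeps intermediates at size N x F. Which is cheaper depends on the relative sizes of N, I, O, and the sequence length, which is why the order is worth choosing per block.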