StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
This work addresses a fundamental bottleneck in state-space models used for machine learning applications such as language and image processing, although the contribution is incremental in that it builds on existing SSM frameworks.
The paper tackles the memory limitation of state-space models (SSMs). It proves that SSMs without reparameterization suffer from a 'curse of memory' similar to that of RNNs, and introduces a reparameterization technique, called StableSSM, that alleviates the issue. The technique improves both approximation capability and optimization stability, and is validated on synthetic datasets, language models, and image classification tasks.
In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponentially decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift their memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets, language models and image classification.
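To make the idea of reparameterizing the recurrent weights concrete, the following is a minimal sketch, not the paper's implementation. It assumes a diagonal, discrete-time linear SSM with state update h_{t+1} = lam * h_t + x_t and readout y_t = c . h_t, and it uses the mapping lam = exp(-exp(w)) as one illustrative choice of stable reparameterization. The function names (direct_param, exp_reparam, memory_kernel, is_stable) and the specific mapping are assumptions introduced here for illustration; the abstract does not specify them.

```python
import math


def direct_param(w):
    """No reparameterization: the trainable weight is the eigenvalue itself.

    Stability (|lam| < 1) is not guaranteed; a small update can cross the boundary.
    """
    return w


def exp_reparam(w):
    """Illustrative stable reparameterization (an assumption, not the paper's stated choice).

    In exact arithmetic, lam = exp(-exp(w)) lies in (0, 1) for every real w,
    so no gradient step on w can push lam across the stability boundary.
    """
    return math.exp(-math.exp(w))


def memory_kernel(lam, c, length):
    """Impulse response rho_t = sum_i c_i * lam_i**t of the diagonal linear SSM."""
    return [sum(ci * li ** t for ci, li in zip(c, lam)) for t in range(length)]


def is_stable(lam):
    """A diagonal discrete-time SSM is stable when every eigenvalue has modulus < 1."""
    return all(abs(l) < 1.0 for l in lam)


if __name__ == "__main__":
    step = 0.02  # hypothetical size of a parameter perturbation (e.g. one noisy update)

    # Direct parameterization: an eigenvalue near the boundary becomes unstable.
    lam_direct = direct_param(0.99 + step)
    print("direct:", lam_direct, "stable:", is_stable([lam_direct]))

    # Reparameterized: start from the same eigenvalue 0.99, perturb w by the same amount.
    w0 = math.log(-math.log(0.99))  # inverse of exp_reparam at lam = 0.99
    lam_reparam = exp_reparam(w0 + step)
    print("reparam:", lam_reparam, "stable:", is_stable([lam_reparam]))

    # Slow-decaying memory requires eigenvalues close to 1, i.e. close to the boundary.
    print(memory_kernel([0.99, 0.5], [1.0, 1.0], 5))
```

In this sketch, long memory requires eigenvalues close to 1, which is exactly where the direct parameterization is most fragile. Under the assumed mapping, the same perturbation of w keeps the eigenvalue strictly inside the stable region.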