Fixed-Point RNNs: Interpolating from Diagonal to Dense
The work addresses an expressivity bottleneck in sequence modeling by enabling more expressive RNNs at a fixed parameter count, though it appears incremental as an extension of existing diagonal methods.
The paper tackles the limited expressivity of current linear RNNs and state-space models, which rely on diagonal (channel-wise) sequence mixing, by developing fixed-point parameterizations that interpolate from diagonal to dense mixing. The resulting models achieve state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$ while matching performance on copying and other tasks.
Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax attention as sequence-mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e., diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed points of parallelizable diagonal linear RNNs. The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters and achieve state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$, while matching performance on copying and other tasks.
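To make the idea of "a dense linear RNN as a fixed point of diagonal linear RNNs" concrete, here is a minimal sketch in plain Python. It is not the paper's parameterization. It assumes a time-invariant dense recurrence $h_t = A h_{t-1} + u_t$ with $h_0 = 0$ and a simple Jacobi-style splitting of $A$ into its diagonal and off-diagonal parts. The function names (`diagonal_rnn`, `dense_rnn`, `fixed_point_dense_rnn`) are hypothetical. Each iteration runs only a channel-wise recurrence, fed with the off-diagonal contribution from the previous iterate, and the fixed point of that iteration is the dense recurrence.

```python
def diagonal_rnn(d, v):
    """Channel-wise recurrence h_t = d * h_{t-1} + v_t with h_0 = 0."""
    prev = [0.0] * len(d)
    out = []
    for v_t in v:
        prev = [d_i * p_i + v_i for d_i, p_i, v_i in zip(d, prev, v_t)]
        out.append(prev)
    return out


def dense_rnn(A, u):
    """Reference dense recurrence h_t = A h_{t-1} + u_t with h_0 = 0."""
    n = len(A)
    prev = [0.0] * n
    out = []
    for u_t in u:
        prev = [sum(A[i][j] * prev[j] for j in range(n)) + u_t[i]
                for i in range(n)]
        out.append(prev)
    return out


def fixed_point_dense_rnn(A, u, num_iters):
    """Approximate the dense recurrence by iterating a diagonal one.

    Assumed splitting (illustrative only): A = diag(d) + R.
    Iteration k+1 solves h_t = d * h_{t-1} + (R h^{(k)}_{t-1} + u_t),
    which only needs a diagonal RNN over the whole sequence.
    """
    n = len(A)
    d = [A[i][i] for i in range(n)]
    h = [[0.0] * n for _ in u]  # initial guess: all-zero states
    for _ in range(num_iters):
        # States of the previous iterate shifted by one step (h_0 = 0).
        shifted = [[0.0] * n] + h[:-1]
        # Off-diagonal mixing is applied to the previous iterate only.
        v = [
            [u_t[i] + sum(A[i][j] * h_prev[j] for j in range(n) if j != i)
             for i in range(n)]
            for u_t, h_prev in zip(u, shifted)
        ]
        h = diagonal_rnn(d, v)
    return h
```

In this toy splitting, the off-diagonal term only looks one step back, so after $k$ iterations the first $k$ states already agree with the dense recurrence, and a sequence of length $T$ is reproduced up to floating-point error after $T$ iterations. The iteration count therefore acts as a cost-versus-exactness knob in the sketch. The diagonal recurrence is written as a sequential loop for readability; it is the part that could be evaluated with a parallel scan.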