Comba: Improving Bilinear RNNs with Closed-loop Control
This work addresses efficiency and performance issues in sequence modeling for applications in language and vision, representing an incremental improvement over existing bilinear RNN methods.
The paper tackles the limitations of bilinear RNNs by proposing Comba, a variant based on closed-loop control theory with state and output feedback corrections. Models trained with up to 1.3B parameters demonstrate superior performance and efficiency in language and vision modeling.
Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising the recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, structurally resembling bilinear systems. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M and 1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
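To make the phrase "scalar-plus-low-rank state transition with state feedback and output feedback corrections" concrete, the following is a minimal NumPy sketch of one plausible recurrence of this kind. It is an illustration under stated assumptions, not the kernel described in the abstract: the function names `bilinear_rnn_step` and `run_sequence`, the scalar parameters `alpha`, `beta`, `beta_tilde`, and `d`, and the exact form of the output correction are all hypothetical, and the actual parameterization in the paper may differ. The state update multiplies the previous state by `alpha * I - beta_tilde * k k^T` (a scalar term plus a rank-one term), which is where the state interacts with the key and gives the system its bilinear character.

```python
import numpy as np

def bilinear_rnn_step(S, q, k, v, alpha, beta, beta_tilde, d):
    """One step of a hypothetical closed-loop bilinear RNN (illustrative only).

    S: (d_v, d_k) recurrent state; q, k: (d_k,) query and key; v: (d_v,) value.
    alpha, beta, beta_tilde, d: assumed scalar gates, not taken from the source.
    """
    # State feedback: the transition (alpha*I - beta_tilde * k k^T) is
    # scalar-plus-low-rank, and the correction depends on S @ k, i.e. on
    # what the memory currently returns for this key (Delta-rule style).
    S_new = alpha * S - beta_tilde * np.outer(S @ k, k) + beta * np.outer(v, k)
    # Output feedback (assumed form): correct the query with the key
    # before reading from the updated state.
    o = S_new @ (q - d * k)
    return S_new, o

def run_sequence(Q, K, V, alpha=0.95, beta=0.5, beta_tilde=0.5, d=0.1):
    """Sequential reference loop over T steps; returns final state and outputs."""
    T, d_k = K.shape
    d_v = V.shape[1]
    S = np.zeros((d_v, d_k))
    outputs = np.empty((T, d_v))
    for t in range(T):
        S, outputs[t] = bilinear_rnn_step(
            S, Q[t], K[t], V[t], alpha, beta, beta_tilde, d
        )
    return S, outputs
```

With `alpha = beta = beta_tilde = 1`, `d = 0`, and a unit-norm key, this reduces to the plain Delta rule: writing a new value under an existing key overwrites the old association instead of accumulating on top of it. A chunk-wise parallel kernel would compute the same recurrence over blocks of tokens rather than one step at a time; this loop is only a sequential reference.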