CL · Apr 11, 2024

HGRN2: Gated Linear RNNs with State Expansion

MIT
arXiv:2404.07904v2 · 116 citations · h-index: 18
AI Analysis

This work addresses a bottleneck in recurrent neural networks for language modeling: the small recurrent state of HGRN. It offers an incremental enhancement that improves expressiveness without adding parameters.

The authors address the limited expressiveness of HGRN, which stems from its small recurrent state, by introducing a parameter-free, outer product-based state expansion mechanism. The resulting model, HGRN2, consistently improves over HGRN and performs competitively with other recurrent models in language modeling.

Hierarchically gated linear RNN (HGRN; Qin et al., 2023) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments verify that HGRN2 consistently outperforms HGRN across different settings and is competitive with other recurrent models.
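The state expansion and its linear attention reading can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the abstract does not give the recurrence, so the update rule below (forget gate `f`, input `i`, with `1 - f` reused as the key-like vector), the readout vector `q`, and all function names are hypothetical choices made for illustration. The sketch shows that replacing an element-wise update with an outer product grows the state from `d` to `d * d` entries without adding weights, and that the resulting recurrence matches a decayed linear attention sum.

```python
import numpy as np

def elementwise_step(h, f, i):
    # Assumed HGRN-style update: the state h is a vector with d entries.
    return f * h + (1.0 - f) * i

def expanded_step(H, f, i):
    # Assumed outer-product expansion: the state H is a d x d matrix.
    # Only f and i are used, so no parameters are added.
    return f[:, None] * H + np.outer(1.0 - f, i)

def run_recurrent(F, I, Q):
    # Recurrent form: one state update and one readout per time step.
    T, d = F.shape
    H = np.zeros((d, d))
    out = np.empty((T, d))
    for t in range(T):
        H = expanded_step(H, F[t], I[t])
        out[t] = Q[t] @ H  # hypothetical readout with a query-like vector
    return out

def run_attention_form(F, I, Q):
    # Equivalent decayed linear attention: key = 1 - f, value = i,
    # with each past step decayed by the product of later forget gates.
    T, d = F.shape
    out = np.zeros((T, d))
    for t in range(T):
        for s in range(t + 1):
            decay = np.prod(F[s + 1 : t + 1], axis=0)  # all ones when s == t
            score = (Q[t] * decay) @ (1.0 - F[s])
            out[t] += score * I[s]
    return out

rng = np.random.default_rng(0)
T, d = 6, 4
F = rng.uniform(0.1, 0.9, size=(T, d))  # forget gates in (0, 1)
I = rng.normal(size=(T, d))
Q = rng.normal(size=(T, d))

recurrent_out = run_recurrent(F, I, Q)
attention_out = run_attention_form(F, I, Q)
```

The quadratic loop in `run_attention_form` is written for clarity only. Hardware-efficient training would rely on chunked or fused kernels, which this sketch does not attempt.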

Code Implementations: 4 repos
