LG · ML · Jul 9, 2024

RotRNN: Modelling Long Sequences with Rotations

arXiv:2407.07239v2 · 6 citations · h-index: 4
AI Analysis

This work addresses the complex initialization and normalization schemes of linear recurrent models used for long sequence modeling. The contribution is incremental, since it builds on existing linear recurrent models.

The paper tackles the drawbacks of linear recurrent neural networks, such as complex initialization and normalization, by proposing RotRNN, a model using rotation matrices, which achieves competitive performance on long sequence modeling datasets.

Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, their empirical performance is not well understood and they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN, a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation. RotRNN also achieves performance competitive with state-of-the-art linear recurrent models on several long sequence modelling datasets.
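To make the "convenient properties of rotation matrices" concrete, here is a minimal sketch of a linear recurrence whose state transition is a scaled 2x2 rotation. It is an illustration only, not the parameterisation used in the paper: the recurrence form h_t = gamma * R(theta) h_{t-1} + x_t, the function names, and the values of theta and gamma are all assumptions chosen for the example. The point it shows is that a rotation preserves the norm of the state, so the decay of the hidden state is governed by the scalar gamma alone.

```python
import math


def rotation(theta):
    """Return the 2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def norm(v):
    """Euclidean norm of a 2-vector."""
    return math.hypot(v[0], v[1])


def run_recurrence(xs, theta, gamma):
    """Hypothetical linear recurrence (not the paper's exact model):
    h_t = gamma * R(theta) @ h_{t-1} + x_t, starting from h_0 = 0.
    xs is a list of 2-vectors; returns the list of hidden states."""
    rot = rotation(theta)
    h = [0.0, 0.0]
    states = []
    for x in xs:
        rh = matvec(rot, h)  # rotate the previous state
        h = [gamma * rh[0] + x[0], gamma * rh[1] + x[1]]  # scale, add input
        states.append(h)
    return states


# A rotation leaves the norm unchanged.
v = [3.0, 4.0]
rotated = matvec(rotation(0.7), v)

# Impulse input followed by zeros: the state norm shrinks by gamma per step.
states = run_recurrence([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
                        theta=0.3, gamma=0.9)
```

In this sketch, `norm(rotated)` equals `norm(v)` up to floating-point error, and the norms of `states` follow 1, gamma, gamma squared, whatever the value of theta. That separation between rotation angle and decay magnitude is the property the example is meant to highlight.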

