SD AI LG AS

Oct 1, 2025

Linear RNNs for autoregressive generation of long music samples

Konrad Szewczyk, Daniel Gallo Fernández, James Townsend

arXiv:2510.02401v1 4.0

Originality: Incremental advance

AI Analysis

The paper addresses the problem of generating long music samples for audio synthesis, but the advance is incremental because it builds on existing linear RNN methods.

The paper tackles autoregressive audio waveform generation by pushing the boundaries of linear RNNs, and reports state-of-the-art log-likelihoods and perceptual metrics on small-scale datasets.

Directly learning to generate audio waveforms in an autoregressive manner is a challenging task, due to the length of the raw sequences and the existence of important structure on many different timescales. Traditional approaches based on recurrent neural networks, as well as causal convolutions and self-attention, have had only limited success on this task. However, recent work has shown that deep state space models, also referred to as linear RNNs, can be highly efficient in this context. In this work, we push the boundaries of linear RNNs applied to raw audio modeling, investigating the effects of different architectural choices and using context-parallelism to enable training on sequences up to one minute (1M tokens) in length. We present a model, HarmonicRNN, which attains state-of-the-art log-likelihoods and perceptual metrics on small-scale datasets.
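To make the idea concrete, here is a minimal sketch of why a linear recurrence lends itself to splitting a long sequence into chunks, which is the property context-parallel training relies on. It is not the HarmonicRNN architecture or the authors' implementation. It assumes a scalar, diagonal recurrence h_t = a_t * h_{t-1} + b_t, and the function names (linear_scan, chunk_summary, chunked_scan) are hypothetical, chosen only for illustration.

```python
def linear_scan(a, b, h0=0.0):
    """Sequentially compute h_t = a_t * h_{t-1} + b_t and return every h_t."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def chunk_summary(a, b):
    """Summarise a chunk as (A, B) such that h_end = A * h_start + B."""
    A, B = 1.0, 0.0
    for a_t, b_t in zip(a, b):
        # Composing two affine maps gives another affine map.
        A, B = a_t * A, a_t * B + b_t
    return A, B


def chunked_scan(a, b, num_chunks):
    """Compute the same states as linear_scan, organised chunk by chunk.

    Illustrative assumption: each chunk would live on its own device.
    Here everything runs in one process to keep the sketch self-contained.
    """
    n = len(a)
    size = max(1, -(-n // num_chunks))  # ceiling division, at least 1
    bounds = [(i, min(i + size, n)) for i in range(0, n, size)]

    # Step 1 (independent per chunk): summarise each chunk on its own.
    summaries = [chunk_summary(a[s:e], b[s:e]) for s, e in bounds]

    # Step 2 (one cheap pass over chunks): carry the state across boundaries.
    starts, h = [], 0.0
    for A, B in summaries:
        starts.append(h)
        h = A * h + B

    # Step 3 (independent per chunk): rerun each chunk from its start state.
    out = []
    for (s, e), h_start in zip(bounds, starts):
        out.extend(linear_scan(a[s:e], b[s:e], h_start))
    return out
```

Because each chunk's effect on the state is an affine map, only two numbers per chunk need to cross chunk boundaries in this scalar case. A nonlinear RNN has no such compact summary, which is one reason long-context training is harder for it.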
