SD AI MM | Aug 28, 2025

Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music

Hongju Su, Ke Li, Lan Yang, Honggang Zhang, Yi-Zhe Song

arXiv:2508.20665v14.0 | h-index: 4 | Has Code

Originality: Highly original

AI Analysis

This paper addresses efficient, high-quality symbolic music generation for AI music applications. It proposes a new architecture rather than an incremental improvement to existing ones.

The paper challenges the assumption that the attributes of a musical note have strict temporal dependencies. It proposes Amadeus, which combines an autoregressive model for note sequences with a bidirectional discrete diffusion model for note attributes. The authors report that the model significantly outperforms state-of-the-art methods across multiple metrics while achieving at least a 4× speed-up.

Existing state-of-the-art symbolic music generation models predominantly adopt autoregressive or hierarchical autoregressive architectures, modelling symbolic music as a sequence of attribute tokens with unidirectional temporal dependencies, under the assumption of a fixed, strict dependency structure among these attributes. However, we observe that using different attributes as the initial token in these models leads to comparable performance. This suggests that the attributes of a musical note are, in essence, a concurrent and unordered set, rather than a temporally dependent sequence. Based on this insight, we introduce Amadeus, a novel symbolic music generation framework. Amadeus adopts a two-level architecture: an autoregressive model for note sequences and a bidirectional discrete diffusion model for attributes. To enhance performance, we propose the Music Latent Space Discriminability Enhancement Strategy (MLSDES), which incorporates contrastive learning constraints that amplify the discriminability of intermediate music representations. The Conditional Information Enhancement Module (CIEM) simultaneously strengthens the note latent vector representation via attention mechanisms, enabling more precise note decoding. We conduct extensive experiments on unconditional and text-conditioned generation tasks. Amadeus significantly outperforms SOTA models across multiple metrics while achieving at least a 4× speed-up. Furthermore, we demonstrate that training-free, fine-grained control of note attributes is feasible with our model. To explore the upper performance bound of the Amadeus architecture, we compile the largest open-source symbolic music dataset to date, AMD (Amadeus MIDI Dataset), supporting both pre-training and fine-tuning.
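
The two-level idea in the abstract (notes generated left to right, attributes of each note filled in as an unordered set) can be illustrated with a toy sketch. This is not the authors' implementation: the attribute names, vocabulary size, the `toy_attribute_logits` scorer, and the context update are all hypothetical stand-ins, and the inner loop uses greedy, confidence-ordered unmasking as a deterministic simplification of how masked discrete diffusion models are commonly sampled.

```python
import random

MASK = None  # marks an attribute that has not been decoded yet
ATTRIBUTES = ["pitch", "duration", "velocity"]  # hypothetical attribute set
VOCAB_SIZE = 8  # hypothetical per-attribute vocabulary size


def toy_attribute_logits(context, partial):
    """Stand-in for a bidirectional denoiser (assumption, not the paper's model).

    Scores every value of every attribute given the note context and ALL
    attributes decoded so far, regardless of the order they were decoded in.
    """
    known = sum(v for v in partial.values() if v is not MASK)
    scores = {}
    for i, name in enumerate(ATTRIBUTES):
        rng = random.Random(context * 1000 + known * 10 + i)
        scores[name] = [rng.random() for _ in range(VOCAB_SIZE)]
    return scores


def decode_note(context):
    """Inner level: fill one note's attributes by iterative unmasking.

    Each step commits the masked attribute whose best value scores highest,
    so the decoding order is chosen by the model rather than fixed in advance.
    """
    note = {name: MASK for name in ATTRIBUTES}
    order = []
    while any(v is MASK for v in note.values()):
        scores = toy_attribute_logits(context, note)
        masked = [n for n in ATTRIBUTES if note[n] is MASK]
        best = max(masked, key=lambda n: max(scores[n]))
        note[best] = scores[best].index(max(scores[best]))
        order.append(best)
    return note, order


def generate(num_notes, seed=0):
    """Outer level: autoregressive over notes, one note per step."""
    context = seed
    notes = []
    for _ in range(num_notes):
        note, _ = decode_note(context)
        notes.append(note)
        # Toy context update: fold the finished note into the running state.
        context = (context * 31 + sum(note.values())) % 10007
    return notes


if __name__ == "__main__":
    for n in generate(4):
        print(n)
```

In this sketch the order in which attributes are committed can differ from note to note, which is the property the abstract argues for; a real system would replace the toy scorer with a trained network and would typically sample rather than take the arg-max.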
