AI · MM · SD · AS · Nov 13, 2024

PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation

arXiv:2411.08307v3 · 1 citation · h-index: 10 · IEEE Transactions on Audio, Speech, and Language Processing
Originality: Incremental advance
AI Analysis

This work targets high-quality symbolic music generation for AI-based music creation. It appears to be an incremental advance, since it builds on existing Perceiver architectures.

The paper tackles the challenge of generating symbolic music that is both long-structured and expressive. It proposes PerceiverS, which combines effective segmentation with multi-scale attention mechanisms, and reports improved coherence and diversity in generated music when evaluated on the Maestro dataset.

AI-based music generation has made significant progress in recent years. However, generating symbolic music that is both long-structured and expressive remains a significant challenge. In this paper, we propose PerceiverS (Segmentation and Scale), a novel architecture designed to address this issue by leveraging both Effective Segmentation and Multi-Scale attention mechanisms. Our approach enhances symbolic music generation by simultaneously learning long-term structural dependencies and short-term expressive details. By combining cross-attention and self-attention in a Multi-Scale setting, PerceiverS captures long-range musical structure while preserving performance nuances. The proposed model has been evaluated using the Maestro dataset and has demonstrated improvements in generating coherent and diverse music, characterized by both structural consistency and expressive variation. The project demos and the generated music samples can be accessed through the link: https://perceivers.github.io.
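To make the combination of cross-attention and self-attention in the abstract more concrete, here is a minimal sketch of a generic Perceiver-style block in plain Python. It is not the authors' implementation. The function names (`attend`, `multi_scale_latents`), the `window` parameter, the reading of "multi-scale" as one full-context pass plus one short-context pass, and the summation used to merge the two are all assumptions made for illustration. Learned projection weights are omitted entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on lists of vectors (no learned weights)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def multi_scale_latents(inputs, latents, window):
    """Hypothetical multi-scale block (an assumption, not the paper's design).

    Long scale: latents cross-attend to the whole input sequence.
    Short scale: latents cross-attend to the last `window` tokens only.
    The two contexts are summed with a residual, then the latents
    self-attend to each other.
    """
    long_ctx = attend(latents, inputs, inputs)
    recent = inputs[-window:]
    short_ctx = attend(latents, recent, recent)
    mixed = [[l + a + b for l, a, b in zip(lat, lo, sh)]
             for lat, lo, sh in zip(latents, long_ctx, short_ctx)]
    return attend(mixed, mixed, mixed)


random.seed(0)
dim, seq_len, num_latents = 4, 64, 8
inputs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]

out = multi_scale_latents(inputs, latents, window=16)
print(len(out), len(out[0]))  # 8 4
```

In this sketch the cross-attention cost grows with `num_latents * seq_len` rather than `seq_len ** 2`, which is the general reason Perceiver-style models can attend over long token sequences.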

