PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
This work addresses the problem of generating high-quality symbolic music for applications in AI-based music creation, though it appears incremental by building on existing Perceiver architectures.
The paper tackles the challenge of generating long-structured and expressive symbolic music by proposing PerceiverS, which uses effective segmentation and multi-scale attention mechanisms, resulting in improved coherence and diversity in generated music as evaluated on the Maestro dataset.
AI-based music generation has made significant progress in recent years. However, generating symbolic music that is both long-structured and expressive remains a significant challenge. In this paper, we propose PerceiverS (Segmentation and Scale), a novel architecture designed to address this issue by leveraging both Effective Segmentation and Multi-Scale attention mechanisms. Our approach enhances symbolic music generation by simultaneously learning long-term structural dependencies and short-term expressive details. By combining cross-attention and self-attention in a Multi-Scale setting, PerceiverS captures long-range musical structure while preserving performance nuances. The proposed model has been evaluated using the Maestro dataset and has demonstrated improvements in generating coherent and diverse music, characterized by both structural consistency and expressive variation. The project demos and the generated music samples can be accessed through the link: https://perceivers.github.io.