SD AS | May 5

MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline

Fang-Duo Tsai, Yi-An Lai, Fei-Yueh Chen, Hsueh-Wei Fu, Wei-Jaw Lee, Hao-Chung Cheng, Yi-Hsuan Yang

arXiv:2602.22029 | 75.91 citations | h-index: 10 | Has Code

AI Analysis

For professional songwriters, MIDI-SAG provides a score-to-song pipeline that lets them retain authorship over the core melody while enabling long-form generation.

MIDI-SAG uses symbolic timing and chord information from vocal MIDI to generate coherent long-form singing accompaniment, addressing the bottleneck of maintaining coherence in vocal-silent sections like intros and bridges.

While end-to-end lyrics-to-song models offer convenience for casual users, professional songwriters require score-to-song systems that allow them to retain authorship over the core melody. However, existing score-to-song methods are limited to short-form snippets and fail to maintain coherence in long-form generation, particularly during vocal-silent sections like intros and bridges. To address this long-form bottleneck, we propose MIDI-informed singing accompaniment generation (MIDI-SAG). Unlike conventional audio-only models, MIDI-SAG utilizes symbolic timing and chord information derived from the vocal MIDI to provide a stable musical roadmap. By incorporating structure planning, which defines temporal boundaries and semantic labels, our framework facilitates consistent generation across both vocal and non-vocal sections. We demonstrate the feasibility of this compositional pipeline by leveraging specialized pre-trained modules, enabling data-efficient training on a single GPU. Our experiments show the potential of this approach for both professional score-to-song and general lyrics-to-song tasks. While an early exploration, MIDI-SAG suggests a promising direction for structured, long-form music synthesis. Audio demos are available, and the code will be open-sourced at https://composerflow.github.io/web_revealed/.
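
To make the idea of structure planning more concrete, the following is a minimal sketch of how a plan with temporal boundaries and semantic labels could be checked against vocal MIDI notes to find vocal-silent sections. It is not the authors' implementation: the `Note` and `Section` types, the `has_vocals` and `annotate_plan` helpers, the label names, and all timings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """A vocal MIDI note (hypothetical container, times in seconds)."""
    start: float
    end: float
    pitch: int  # MIDI pitch number


@dataclass(frozen=True)
class Section:
    """One entry of a structure plan: temporal boundaries plus a semantic label."""
    start: float
    end: float
    label: str  # assumed label vocabulary, e.g. "intro", "verse", "bridge"


def has_vocals(section: Section, notes: list[Note]) -> bool:
    """Return True if any vocal note overlaps the section's time span."""
    return any(n.start < section.end and n.end > section.start for n in notes)


def annotate_plan(plan: list[Section], notes: list[Note]) -> list[tuple]:
    """Attach a vocal / vocal-silent flag to every section of the plan."""
    return [(s.label, s.start, s.end, has_vocals(s, notes)) for s in plan]


# Illustrative data only: three vocal notes and a four-section plan.
notes = [Note(8.0, 8.5, 60), Note(8.5, 9.5, 62), Note(24.0, 25.0, 64)]
plan = [
    Section(0.0, 8.0, "intro"),
    Section(8.0, 16.0, "verse"),
    Section(16.0, 24.0, "bridge"),
    Section(24.0, 32.0, "chorus"),
]

annotated = annotate_plan(plan, notes)
for label, start, end, vocal in annotated:
    kind = "vocal" if vocal else "vocal-silent"
    print(f"{label:>6}: {start:5.1f}-{end:5.1f} s  ({kind})")
```

In this toy plan the intro and bridge come out as vocal-silent, which are the sections where, according to the abstract, an accompaniment generator cannot rely on the vocal track and needs the plan's boundaries and labels instead.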

