SDAIA · Aug 2, 2025

Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation

arXiv:2508.01394v1 · 2 citations · h-index: 7
Originality: Highly original
AI Analysis

This paper addresses the problem of generating controllable, high-quality long songs for music AIGC applications. Its approach represents a paradigm shift rather than an incremental improvement.

The paper tackles the challenge of generating long, high-quality songs by introducing a model that works with human-editable symbolic scores instead of raw audio. It achieves state-of-the-art efficiency, duration, and perceptual quality, and it surpasses commercial solutions such as Suno in human evaluations.

Song generation is regarded as the most challenging problem in music AIGC; nonetheless, existing approaches have yet to fully overcome four persistent limitations: controllability, generalizability, perceptual quality, and duration. We argue that these shortcomings stem primarily from the prevailing paradigm of attempting to learn music theory directly from raw audio, a task that remains prohibitively difficult for current models. To address this, we present Bar-level AI Composing Helper (BACH), the first model explicitly designed for song generation through human-editable symbolic scores. BACH introduces a tokenization strategy and a symbolic generative procedure tailored to hierarchical song structure. Consequently, it achieves substantial gains in the efficiency, duration, and perceptual quality of song generation. Experiments demonstrate that BACH, with a small model size, establishes a new SOTA among all publicly reported song generation systems, even surpassing commercial solutions such as Suno. Human evaluations further confirm its superiority across multiple subjective metrics.
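The summary does not specify how BACH actually tokenizes a score, so the following is only a minimal sketch, assuming a hypothetical scheme in which a song is a list of sections, each section a list of bars, and each bar an optional chord label plus notes. It flattens that hierarchy into a token sequence with explicit section and bar boundaries, which is one way a bar-level symbolic score can remain both readable by a model and editable by a person. All names and token formats here (`Note`, `Bar`, `Section`, `tokenize_song`, `split_bars`, `note:C4:1/2`) are hypothetical illustrations, not the paper's tokenizer or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


# Hypothetical score hierarchy (not BACH's format): song -> sections -> bars -> notes.
@dataclass
class Note:
    pitch: str     # e.g. "C4"; "R" marks a rest (hypothetical convention)
    duration: str  # fraction of a whole note, e.g. "1/4"


@dataclass
class Bar:
    notes: List[Note]
    chord: Optional[str] = None  # optional chord label for the bar


@dataclass
class Section:
    name: str  # structural label, e.g. "verse" or "chorus"
    bars: List[Bar]


def tokenize_song(sections: List[Section]) -> List[str]:
    """Flatten the hierarchy into tokens with explicit section and bar boundaries."""
    tokens = ["<song>"]
    for section in sections:
        tokens.append(f"<section:{section.name}>")
        for bar in section.bars:
            tokens.append("<bar>")
            if bar.chord is not None:
                tokens.append(f"chord:{bar.chord}")
            for note in bar.notes:
                tokens.append(f"note:{note.pitch}:{note.duration}")
            tokens.append("</bar>")
        tokens.append("</section>")
    tokens.append("</song>")
    return tokens


def split_bars(tokens: List[str]) -> List[List[str]]:
    """Recover the per-bar token groups, e.g. to inspect or edit one bar at a time."""
    bars: List[List[str]] = []
    current: Optional[List[str]] = None
    for tok in tokens:
        if tok == "<bar>":
            current = []
        elif tok == "</bar>":
            if current is not None:
                bars.append(current)
            current = None
        elif current is not None:
            current.append(tok)
    return bars


# Usage: a two-section toy song with three bars in total.
song = [
    Section("verse", [
        Bar([Note("C4", "1/2"), Note("E4", "1/2")], chord="C"),
        Bar([Note("G4", "1/1")], chord="G"),
    ]),
    Section("chorus", [
        Bar([Note("A4", "1/4"), Note("R", "3/4")], chord="Am"),
    ]),
]
tokens = tokenize_song(song)
bars = split_bars(tokens)
print(len(tokens), "tokens,", len(bars), "bars")  # prints "20 tokens, 3 bars"
```

Because every bar is delimited, a single bar's tokens can be located and rewritten, by a person or by a model, without touching the rest of the song. This illustrates the intuition behind bar-level control; it is not a description of BACH's actual generative procedure.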

