SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment
For researchers and developers of text-to-song generation systems, SongBench provides a fine-grained diagnostic benchmark to evaluate and improve musical coherence and professional quality.
The authors propose SongBench, a multi-aspect benchmark for song quality assessment across seven dimensions, and construct an expert-annotated database of 11,717 samples. SongBench scores correlate highly with expert ratings, and the benchmark reveals fine-grained performance gaps in current text-to-song models.
Recent advances in text-to-song generation have enabled realistic musical content production, yet existing evaluation benchmarks lack the professional granularity needed to capture multi-dimensional aesthetic nuances. In this paper, we propose SongBench, a specialized framework for fine-grained song assessment across seven key dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality. Using this framework, we construct an expert-annotated database of 11,717 samples from state-of-the-art models, labeled by music professionals. Extensive experiments demonstrate that SongBench achieves high correlation with expert ratings. By revealing fine-grained performance gaps in current state-of-the-art models, SongBench serves as a diagnostic benchmark that can steer development toward more professional and musically coherent song generation.
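
As an illustration of what "correlation with expert ratings" could mean per dimension, the following is a minimal sketch, not the authors' evaluation code. The abstract does not state which correlation coefficient is used; Spearman rank correlation is assumed here, and the function names (`average_ranks`, `spearman`, `per_dimension_agreement`) and the toy scores are hypothetical.

```python
from statistics import mean

# The seven dimensions named in the abstract.
DIMENSIONS = ["Vocal", "Instrument", "Melody", "Structure",
              "Arrangement", "Mixing", "Musicality"]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_dimension_agreement(predicted, expert):
    """Map each dimension to the rank correlation of predicted vs. expert scores."""
    return {dim: spearman(predicted[dim], expert[dim]) for dim in predicted}


if __name__ == "__main__":
    # Made-up scores for four songs on two dimensions, for illustration only.
    predicted = {"Vocal": [3.1, 4.2, 2.0, 4.8], "Mixing": [2.5, 3.9, 3.0, 4.1]}
    expert = {"Vocal": [3.0, 4.0, 2.5, 5.0], "Mixing": [2.0, 4.0, 3.5, 3.0]}
    for dim, rho in per_dimension_agreement(predicted, expert).items():
        print(f"{dim}: {rho:.3f}")
```

Reporting one coefficient per dimension, rather than a single pooled value, matches the diagnostic intent described above: a model can rank well on Vocal while diverging from experts on Mixing.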