SD AI LG AS Mar 18, 2024

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

arXiv:2403.11780v328.933 citations h-index: 29 Has Code NAACL

Originality: Incremental advance

AI Analysis

The paper addresses limited user control in singing synthesis for applications such as music production, though the advance is incremental because it builds on existing SVS methods.

The paper tackles the lack of explicit style control in singing-voice-synthesis by proposing Prompt-Singer, a method that enables attribute control for singer gender, vocal range, and volume using natural language prompts, achieving favorable controlling ability and audio quality.

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at http://prompt-singer.github.io.
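The range-melody decoupled pitch representation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the vocal range is summarized by the mean F0 of voiced frames and the melody is the contour rescaled to a fixed reference mean. The function names, the 230 Hz reference value, and the use of 0.0 to mark unvoiced frames are all assumptions made for illustration.

```python
from statistics import fmean

# Hypothetical reference mean (Hz); an assumption for this sketch.
REF_MEAN_HZ = 230.0


def decouple_pitch(f0_hz, ref_mean_hz=REF_MEAN_HZ):
    """Split an F0 contour into (vocal-range factor, range-normalized melody).

    Unvoiced frames are assumed to be marked with 0.0 and are left unchanged.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return 0.0, list(f0_hz)
    mean_f0 = fmean(voiced)          # range factor: average voiced pitch
    scale = ref_mean_hz / mean_f0    # multiplicative shift to the reference mean
    melody = [f * scale if f > 0 else 0.0 for f in f0_hz]
    return mean_f0, melody


def recouple_pitch(mean_f0, melody, ref_mean_hz=REF_MEAN_HZ):
    """Rebuild an F0 contour from a range factor and a normalized melody."""
    scale = mean_f0 / ref_mean_hz
    return [f * scale if f > 0 else 0.0 for f in melody]


contour = [0.0, 110.0, 220.0, 0.0, 330.0]
mean_f0, melody = decouple_pitch(contour)
# Doubling the range factor shifts the whole contour up one octave
# while the relative intervals between notes stay the same.
shifted = recouple_pitch(mean_f0 * 2, melody)
```

Under these assumptions, a text prompt would only need to influence the scalar range factor, while the normalized melody carries the note-to-note contour independently of how high or low the singer's voice sits.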
