SD AI LG AS · Mar 18, 2024

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

arXiv:2403.11780v3 · 33 citations · h-index: 29 · NAACL
Originality: Incremental advance
AI Analysis

The paper addresses limited user control in singing synthesis for applications such as music production; its originality is rated incremental because it builds on existing SVS methods.

The paper tackles the lack of explicit style control in singing-voice synthesis by proposing Prompt-Singer, a method that controls singer gender, vocal range, and volume through natural-language prompts; the authors report favorable control ability and audio quality.

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables control of the singer gender, vocal range and volume attributes with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experimental settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable control ability and audio quality. Audio samples are available at http://prompt-singer.github.io.
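The abstract does not spell out how the range-melody decoupled pitch representation is built. As a hedged illustration of the general idea only, the sketch below assumes the vocal range is a single per-utterance value (the mean log2-F0 over voiced frames) and the melody is each frame's offset from that value in semitones. The function names `decouple_pitch` and `recompose_pitch` are hypothetical and are not taken from the paper or its repository. Because the melody is stored relative to the range, changing only the range value transposes the whole contour while every interval stays the same, which is the property that lets range be controlled separately from melodic accuracy.

```python
import math
from typing import List, Optional, Tuple


def decouple_pitch(f0_hz: List[float]) -> Tuple[float, List[Optional[float]]]:
    """Split an F0 contour (Hz, 0.0 = unvoiced) into (range, melody).

    Assumption for illustration: range = mean log2-F0 over voiced frames;
    melody = per-frame offset from that mean in semitones (None if unvoiced).
    """
    voiced_log = [math.log2(f) for f in f0_hz if f > 0]
    if not voiced_log:
        raise ValueError("contour has no voiced frames")
    vocal_range = sum(voiced_log) / len(voiced_log)
    melody = [
        12.0 * (math.log2(f) - vocal_range) if f > 0 else None
        for f in f0_hz
    ]
    return vocal_range, melody


def recompose_pitch(vocal_range: float, melody: List[Optional[float]]) -> List[float]:
    """Rebuild F0 in Hz from a range value and semitone offsets.

    Unvoiced frames (None) come back as 0.0.
    """
    return [
        2.0 ** (vocal_range + m / 12.0) if m is not None else 0.0
        for m in melody
    ]
```

For example, `decouple_pitch([220.0, 0.0, 440.0])` gives melody offsets of about -6 and +6 semitones around the mean, with `None` for the unvoiced frame; recomposing with the range raised by 1.0 (one octave in log2 units) yields roughly 440 Hz and 880 Hz, so the interval between the two notes is unchanged.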

Code Implementations: 1 repo
