Score and Lyrics-Free Singing Voice Generation
This work addresses a novel challenge in singing voice generation: producing singing voice for applications where scores and lyrics are unavailable. It represents an incremental advancement beyond traditional score- and lyrics-driven synthesis methods.
The paper tackles the problem of generating singing voices without musical scores or lyrics at either training or inference time. It outlines three generation schemes, proposes a pipeline to address them, and implements the resulting models with generative adversarial networks (GANs). The models are evaluated both objectively and subjectively, although the abstract reports no quantitative results.
Generative models for singing voice have been mostly concerned with the task of "singing voice synthesis," i.e., producing singing voice waveforms given musical scores and text lyrics. In this work, we explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, at both training and inference time. In particular, we outline three such generation schemes and propose a pipeline to tackle these new tasks. Moreover, we implement these models using generative adversarial networks and evaluate them both objectively and subjectively.
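The abstract does not say which GAN loss the authors used. As generic background only, here is a minimal sketch of the standard GAN objective in its common non-saturating form, computed from raw discriminator logits in pure Python. The function names `discriminator_loss` and `generator_loss` are hypothetical, and this is not the paper's implementation.

```python
import math


def _log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)); avoids overflow in exp() and log(0).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def discriminator_loss(real_logits, fake_logits):
    # Hypothetical helper. The discriminator D maximizes
    #   E[log D(x)] + E[log(1 - D(G(z)))],
    # so we return the negated mean as a loss to minimize.
    # Note: log(1 - sigmoid(l)) == log(sigmoid(-l)).
    real = sum(_log_sigmoid(l) for l in real_logits) / len(real_logits)
    fake = sum(_log_sigmoid(-l) for l in fake_logits) / len(fake_logits)
    return -(real + fake)


def generator_loss(fake_logits):
    # Hypothetical helper. Non-saturating generator loss: maximize
    # E[log D(G(z))], i.e. minimize its negation.
    return -sum(_log_sigmoid(l) for l in fake_logits) / len(fake_logits)


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake (logit 0 -> p = 0.5).
    print(discriminator_loss([0.0], [0.0]), generator_loss([0.0]))
```

Working with logits rather than probabilities keeps the log terms finite when the discriminator becomes confident; the non-saturating generator loss is a common choice because it gives stronger gradients early in training than minimizing log(1 - D(G(z))) directly.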