AS SD Oct 25, 2021

Controllable and Interpretable Singing Voice Decomposition via Assem-VC

arXiv:2110.12676v1

Originality Synthesis-oriented

AI Analysis

This paper addresses controllable and interpretable singing voice synthesis for music production or entertainment applications, but it appears incremental because it builds on existing voice conversion methods.

The paper tackles singing voice decomposition by encoding time-aligned linguistic content, pitch, and source speaker identity with Assem-VC. Combining the speaker-independent components with a target speaker's embedding enables synthesis of that speaker's singing voice. The result is a perfectly synced duet between the user's voice and the converted target singer's voice.

We propose a singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC. With decomposed speaker-independent information and the target speaker's embedding, we could synthesize the singing voice of the target speaker. In conclusion, we made a perfectly synced duet with the user's singing voice and the target singer's converted singing voice.
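The decompose-then-recombine idea in the abstract can be illustrated with a toy data structure. This is a minimal sketch, not the Assem-VC implementation: `DecomposedVoice`, `swap_speaker`, and every value below are hypothetical names and placeholder data, and no real encoding or synthesis is performed. It only shows why keeping the frame-level linguistic content and pitch while replacing the speaker embedding leaves the timing unchanged, which is what allows a synced duet.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple


@dataclass(frozen=True)
class DecomposedVoice:
    """Hypothetical container for the three decomposed components."""
    linguistic: Tuple[str, ...]   # time-aligned content label, one per frame
    pitch_hz: Tuple[float, ...]   # per-frame pitch; 0.0 marks an unvoiced frame
    speaker: Tuple[float, ...]    # speaker embedding (identity only, no timing)

    def __post_init__(self) -> None:
        # Content and pitch must describe the same frames.
        if len(self.linguistic) != len(self.pitch_hz):
            raise ValueError("linguistic and pitch tracks must have equal length")


def swap_speaker(source: DecomposedVoice,
                 target_embedding: Sequence[float]) -> DecomposedVoice:
    """Keep the speaker-independent tracks, replace only the speaker embedding."""
    return replace(source, speaker=tuple(target_embedding))


# Placeholder data standing in for a user's decomposed singing voice.
user = DecomposedVoice(
    linguistic=("l", "a", "a", "a", "sil"),
    pitch_hz=(220.0, 220.0, 246.9, 246.9, 0.0),
    speaker=(0.1, 0.9),
)

# Placeholder embedding standing in for the target singer.
converted = swap_speaker(user, target_embedding=(0.8, 0.2))

# Same frames, same content, same pitch: only the identity differs.
same_timing = (converted.linguistic == user.linguistic
               and converted.pitch_hz == user.pitch_hz)
```

In this sketch the converted voice shares every frame boundary with the user's voice by construction, so the two tracks line up without any further alignment step.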
