ASSD Oct 25, 2021

Controllable and Interpretable Singing Voice Decomposition via Assem-VC

arXiv:2110.12676v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses controllable and interpretable singing voice synthesis for music production and entertainment applications, but it appears incremental because it builds on existing voice conversion methods.

The paper tackles singing voice decomposition by encoding linguistic content, pitch, and speaker identity with Assem-VC, which enables synthesis of a target speaker's singing voice from the decomposed components. The reported result is a perfectly synced duet between a user's voice and the converted target singer's voice.

We propose a singing decomposition system that encodes time-aligned linguistic content, pitch, and source speaker identity via Assem-VC. With decomposed speaker-independent information and the target speaker's embedding, we could synthesize the singing voice of the target speaker. In conclusion, we made a perfectly synced duet with the user's singing voice and the target singer's converted singing voice.
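The decomposition described above can be illustrated with a toy sketch. It is not the authors' implementation and does not use Assem-VC itself: `DecomposedSinging`, `swap_speaker`, and `is_frame_synced` are hypothetical names, and the frame labels, pitch values, and two-dimensional embeddings are made-up placeholders. It only shows why keeping the time-aligned content and pitch while replacing the speaker embedding would leave the converted voice frame-aligned with the original, which is what a synced duet needs.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple


@dataclass(frozen=True)
class DecomposedSinging:
    """Hypothetical container for the three factors named in the abstract."""
    linguistic: Tuple[str, ...]   # one time-aligned content label per frame
    pitch_hz: Tuple[float, ...]   # one F0 value per frame (0.0 marks unvoiced)
    speaker: Tuple[float, ...]    # fixed-size speaker embedding (toy values)

    def __post_init__(self) -> None:
        # Content and pitch are both frame-level tracks, so they must line up.
        if len(self.linguistic) != len(self.pitch_hz):
            raise ValueError("linguistic and pitch tracks must share a frame count")


def swap_speaker(source: DecomposedSinging,
                 target_speaker: Sequence[float]) -> DecomposedSinging:
    """Keep the speaker-independent tracks and replace only the speaker embedding."""
    return replace(source, speaker=tuple(target_speaker))


def is_frame_synced(a: DecomposedSinging, b: DecomposedSinging) -> bool:
    """True when both voices carry the same content on the same frames."""
    return a.linguistic == b.linguistic and len(a.pitch_hz) == len(b.pitch_hz)


# Toy input: four frames of a user's singing (all values are placeholders).
user = DecomposedSinging(
    linguistic=("l", "a", "a", "sil"),
    pitch_hz=(220.0, 220.0, 246.9, 0.0),
    speaker=(0.1, 0.9),
)

# Conversion in this sketch is just an embedding swap; a real system would
# pass these factors to a decoder and vocoder to produce audio.
converted = swap_speaker(user, target_speaker=(0.8, 0.2))
```

Because only the `speaker` field changes, `is_frame_synced(user, converted)` holds by construction; in a real system the same property would depend on the decoder preserving the input frame timing.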

Code Implementations: 1 repo
