SD · May 12

Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling

arXiv:2605.1231051.1
Predicted impact: top 55% in SD · last 90 days
Originality: Highly original
AI Analysis

For SVC researchers, Poly-SVC addresses the underexplored problem of processing residual harmonies in polyphonic recordings, enabling more realistic conversions.

Poly-SVC introduces a zero-shot, cross-lingual singing voice conversion system that handles residual harmonies from accompanied recordings, outperforming baselines in naturalness, timbre similarity, and harmony reconstruction.

Singing Voice Conversion (SVC) aims to transform a source singing voice into that of a target singer while preserving lyrics and melody. Most existing SVC methods depend on F0 extractors to capture the lead melody from clean vocals. However, no existing method can reliably extract clean vocals from accompanied recordings without leaving residual harmonies behind. In this paper, we propose Poly-SVC, a zero-shot, cross-lingual singing voice conversion system designed to process residual harmonies. Poly-SVC is composed of three key components: a Constant-Q Transform (CQT)-based pitch extractor that preserves both the lead melody and the residual harmony, a random sampler that reduces interfering information from the CQT, and a diffusion decoder based on Conditional Flow Matching (CFM) that fuses pitch, content, and timbre features into natural-sounding polyphonic outputs. Experiments demonstrate that Poly-SVC surpasses the baseline models in naturalness, timbre similarity, and harmony reconstruction across both harmony-rich and single-melody recordings.
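
To illustrate why a CQT-style representation can keep a residual harmony that a single-F0 extractor would discard, the following is a minimal sketch and not the paper's extractor. It computes a naive constant-Q transform of a synthetic two-tone signal and picks spectral peaks. The function names, the 8 kHz sample rate, the 12-bins-per-octave resolution, the Hann window, and the peak threshold are all assumptions chosen for the illustration; the abstract does not specify any of them.

```python
import cmath
import math


def cqt_frequencies(f_min, n_bins, bins_per_octave=12):
    """Geometrically spaced centre frequencies (one bin per semitone by default)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_magnitudes(signal, sample_rate, freqs, bins_per_octave=12):
    """Naive constant-Q transform of one frame (hypothetical helper).

    Each bin uses a Hann window whose length is inversely proportional to
    its centre frequency, so the ratio frequency / bandwidth (Q) is constant.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    mags = []
    for f in freqs:
        n = min(len(signal), int(math.ceil(q * sample_rate / f)))
        acc = 0j
        for i in range(n):
            window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))
            acc += window * signal[i] * cmath.exp(-2j * math.pi * f * i / sample_rate)
        mags.append(abs(acc) / n)
    return mags


def find_peaks(mags, rel_threshold=0.3):
    """Indices of local maxima above a fraction of the strongest bin."""
    floor = rel_threshold * max(mags)
    return [
        k
        for k in range(1, len(mags) - 1)
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
    ]


def demo():
    """Lead note at bin 12 plus a quieter harmony note at bin 19 (assumed values)."""
    sample_rate = 8000
    freqs = cqt_frequencies(f_min=110.0, n_bins=36)
    lead_hz, harmony_hz = freqs[12], freqs[19]
    signal = [
        math.sin(2.0 * math.pi * lead_hz * i / sample_rate)
        + 0.5 * math.sin(2.0 * math.pi * harmony_hz * i / sample_rate)
        for i in range(1600)
    ]
    mags = cqt_magnitudes(signal, sample_rate, freqs)
    return freqs, mags, find_peaks(mags)


if __name__ == "__main__":
    freqs, mags, peaks = demo()
    for k in peaks:
        print(f"bin {k}: {freqs[k]:.2f} Hz, magnitude {mags[k]:.3f}")
```

In this toy case both notes show up as separate peaks in the same frame, whereas a single F0 value per frame could describe only one of them. Real residual harmonies carry overtones and bleed from the accompaniment, which is presumably where components such as the random sampler come in.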
