SD · AI · Apr 7

Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck

arXiv:2604.05526 · 71.0 · h-index: 2
Predicted impact: top 26% in SD (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses controllable singing style conversion for audio synthesis applications. It is an incremental advance built on three specific innovations.

The paper tackles singing style conversion by addressing style leakage, dynamic rendering, and high-fidelity generation with limited data. It achieves the best naturalness in the SVCC2025 subjective evaluation while using less extra singing data than the other top-performing systems.

This paper presents the S4 team's submission to the Singing Voice Conversion Challenge 2025 (SVCC2025): a novel singing style conversion system that advances fine-grained style conversion and control in in-domain settings. To address the critical challenges of style leakage, dynamic rendering, and high-fidelity generation with limited data, we introduce three key innovations: a boundary-aware Whisper bottleneck that pools representations over phoneme spans to suppress residual source style while preserving linguistic content; an explicit frame-level technique matrix, combined with targeted F0 processing during inference, for stable and distinct dynamic style rendering; and a perceptually motivated high-frequency band completion strategy that uses an auxiliary standard 48 kHz SVC model to augment the high-frequency spectrum, overcoming data scarcity without overfitting. In the official SVCC2025 subjective evaluation, our system achieves the best naturalness among all submissions while remaining competitive in speaker similarity and technique control, despite using significantly less extra singing data than other top-performing systems. Audio samples are available online.
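The three components named in the abstract can be illustrated with a minimal, stdlib-only Python sketch. It is not the S4 team's implementation: the function names (`pool_phoneme_spans`, `technique_matrix`, `complete_high_band`), the choice to copy each span mean back to every frame in the span, the 0/1 technique encoding, and the optional linear crossfade at the cutoff bin are all hypothetical choices made for illustration. Whisper features, phoneme boundaries, and magnitude spectrograms are stood in for by plain lists of floats, and the paper's targeted F0 processing is not modelled.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def pool_phoneme_spans(
    features: Sequence[Sequence[float]],
    boundaries: Sequence[Tuple[int, int]],
) -> List[Frame]:
    """Average frame-level features over each phoneme span [start, end).

    Hypothetical helper: the span mean is copied back to every frame in the
    span so the output keeps the input frame rate. Frames outside all spans
    are passed through unchanged.
    """
    pooled = [list(frame) for frame in features]
    for start, end in boundaries:
        if not 0 <= start < end <= len(features):
            raise ValueError(f"invalid phoneme span ({start}, {end})")
        span = features[start:end]
        # zip(*span) groups values per feature dimension across the span.
        mean = [sum(values) / len(span) for values in zip(*span)]
        for t in range(start, end):
            pooled[t] = list(mean)
    return pooled


def technique_matrix(
    num_frames: int,
    segments: Sequence[Tuple[int, int, str]],
    techniques: Sequence[str],
) -> List[List[int]]:
    """Build a num_frames x len(techniques) 0/1 matrix from labelled segments.

    Hypothetical encoding: each (start, end, name) segment sets column `name`
    to 1 for frames start..end-1, clipped to the utterance length.
    """
    column = {name: i for i, name in enumerate(techniques)}
    matrix = [[0] * len(techniques) for _ in range(num_frames)]
    for start, end, name in segments:
        col = column[name]  # raises KeyError for an unknown technique label
        for t in range(max(start, 0), min(end, num_frames)):
            matrix[t][col] = 1
    return matrix


def complete_high_band(
    target: Sequence[Sequence[float]],
    auxiliary: Sequence[Sequence[float]],
    cutoff_bin: int,
    fade_bins: int = 0,
) -> List[Frame]:
    """Take low bins from `target` and high bins from `auxiliary`.

    Bins below cutoff_bin - fade_bins come from the target model, bins at or
    above cutoff_bin come from the auxiliary model, and bins in between are
    linearly crossfaded. A plain splice, not the paper's perceptual strategy.
    """
    if len(target) != len(auxiliary):
        raise ValueError("target and auxiliary must have the same number of frames")
    fade_start = cutoff_bin - fade_bins
    completed = []
    for tgt, aux in zip(target, auxiliary):
        if len(tgt) != len(aux):
            raise ValueError("target and auxiliary must have the same number of bins")
        frame = []
        for k, (low, high) in enumerate(zip(tgt, aux)):
            if k < fade_start:
                weight = 0.0
            elif k >= cutoff_bin:
                weight = 1.0
            else:
                weight = (k - fade_start + 1) / (fade_bins + 1)
            frame.append((1.0 - weight) * low + weight * high)
        completed.append(frame)
    return completed


if __name__ == "__main__":
    feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [10.0, 10.0]]
    print(pool_phoneme_spans(feats, [(0, 2), (2, 4)]))
    # [[2.0, 1.0], [2.0, 1.0], [7.5, 7.0], [7.5, 7.0]]
    print(technique_matrix(4, [(1, 3, "vibrato")], ["breathy", "vibrato"]))
    # [[0, 0], [0, 1], [0, 1], [0, 0]]
    print(complete_high_band([[1.0] * 4], [[9.0] * 4], cutoff_bin=2))
    # [[1.0, 1.0, 9.0, 9.0]]
```

Copying each span mean back to its frames keeps the content stream aligned with a frame-level technique matrix and F0 track while averaging away within-phoneme variation; under this sketch's assumptions, that averaging is what limits how much source style passes through the bottleneck. The actual system may instead emit one vector per phoneme and handle alignment separately.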

