SD LG ML Dec 22, 2016

Robustness of Voice Conversion Techniques Under Mismatched Conditions

arXiv:1612.07523v1, 2 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the robustness of voice conversion in noisy environments, but the contribution is incremental because it builds on existing methods.

The paper tackles the degradation of voice conversion performance under mismatched conditions. It finds that bilinear frequency warping with amplitude scaling outperforms the other methods in most noisy settings, and that spectral subtraction and logMMSE speech enhancement can improve performance in specific noisy conditions.

Most existing studies on voice conversion (VC) are conducted under acoustically matched conditions between the source and target signals. However, the robustness of VC methods in the presence of mismatch remains unknown. In this paper, we report a comparative analysis of different VC techniques under mismatched conditions. Extensive experiments with five different VC techniques on the CMU ARCTIC corpus suggest that the performance of VC methods degrades substantially in noisy conditions. We have found that bilinear frequency warping with amplitude scaling (BLFWAS) outperforms the other methods in most of the noisy conditions. We further explore the suitability of different speech enhancement techniques for robust conversion. The objective evaluation results indicate that spectral subtraction and log minimum mean square error (logMMSE) based speech enhancement techniques can be used to improve performance in specific noisy conditions.
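For readers unfamiliar with spectral subtraction, the following is a minimal single-frame sketch of the general idea, not the paper's implementation. The abstract does not state which variant, frame size, or parameters were used, so the function names (`dft`, `idft`, `spectral_subtract`), the over-subtraction factor `alpha`, and the spectral `floor` are all illustrative assumptions. A naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame (O(n^2))."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns only the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_mag, alpha=1.0, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from one noisy frame.

    noise_mag is an assumed per-bin noise magnitude estimate, e.g. averaged
    over frames known to contain no speech. alpha and floor are hypothetical
    tuning parameters, not values taken from the paper.
    """
    enhanced = []
    for y, n_mag in zip(dft(noisy_frame), noise_mag):
        mag = abs(y)
        # Subtract the noise estimate, but never drop below a small fraction
        # of the original magnitude, so no bin becomes negative.
        clean_mag = max(mag - alpha * n_mag, floor * mag)
        # Reuse the noisy phase, as is usual for magnitude-domain methods.
        enhanced.append(cmath.rect(clean_mag, cmath.phase(y)))
    return idft(enhanced)
```

In practice this step would be applied frame by frame with windowing and overlap-add, and the noise estimate would come from the recording itself. Both are omitted here for brevity.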

