SD · AI · Aug 10, 2021

StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition

arXiv:2108.04395v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck in low-resource voice conversion, but the advance is incremental because it builds directly on the existing StarGAN-VC method.

The paper tackles the problem of preserving linguistic content in non-parallel voice conversion with StarGAN-VC when training data is scarce. Its results show that using automatic speech recognition to assist training improves retention of linguistic information compared with vanilla StarGAN-VC.

Preserving the linguistic content of input speech is essential during voice conversion (VC). The star generative adversarial network-based VC method (StarGAN-VC) is a recently developed method that allows non-parallel many-to-many VC. Although this method is powerful, it can fail to preserve the linguistic content of input speech when the number of available training samples is extremely small. To overcome this problem, we propose the use of automatic speech recognition to assist model training, to improve StarGAN-VC, especially in low-resource scenarios. Experimental results show that using our proposed method, StarGAN-VC can retain more linguistic information than vanilla StarGAN-VC.

