SD AI · Aug 10, 2021

StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition

Shoki Sakamoto, Akira Taniguchi, Tadahiro Taniguchi, Hirokazu Kameoka

arXiv:2108.04395v14.31 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in low-resource voice conversion, but it is incremental because it builds on the existing StarGAN-VC method.

The paper tackles the problem of preserving linguistic content in non-parallel voice conversion with StarGAN-VC when training data is scarce. Its experiments show that using automatic speech recognition to assist model training retains more linguistic information than vanilla StarGAN-VC.

Preserving the linguistic content of input speech is essential during voice conversion (VC). The star generative adversarial network-based VC method (StarGAN-VC) is a recently developed method that allows non-parallel many-to-many VC. Although this method is powerful, it can fail to preserve the linguistic content of input speech when the number of available training samples is extremely small. To overcome this problem, we propose the use of automatic speech recognition to assist model training, to improve StarGAN-VC, especially in low-resource scenarios. Experimental results show that using our proposed method, StarGAN-VC can retain more linguistic information than vanilla StarGAN-VC.
