Nov 2, 2020

CVC: Contrastive Learning for Non-parallel Voice Conversion

arXiv:2011.00782v2 (14 citations)
AI Analysis

This work addresses training inefficiencies and performance limitations in voice conversion for speech processing applications, representing an incremental improvement over existing methods.

The paper tackles the difficult training and unsatisfactory results of non-parallel voice conversion by proposing CVC, a contrastive learning-based adversarial approach. The results show that CVC matches or outperforms CycleGAN and VAE in one-to-one conversion while reducing training time, and that it achieves superior performance in many-to-one conversion, including conversion from unseen speakers.

Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose CVC, a contrastive learning-based adversarial approach for voice conversion. Compared to previous CycleGAN-based methods, CVC requires only an efficient one-way GAN training by taking advantage of contrastive learning. In non-parallel one-to-one voice conversion, CVC is on par with or better than CycleGAN and VAE while effectively reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.
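The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of a generic InfoNCE-style loss, the kind of term that can stand in for a cycle-consistency constraint in one-way GAN training. It assumes that a feature vector taken from the converted speech (the query) should be closer to the feature from the same position in the source speech (the positive) than to features from other positions (the negatives). The function name `info_nce`, the temperature of 0.07, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query (hypothetical helper, not the paper's code).

    The loss is the cross-entropy of picking the positive out of the
    candidate set {positive} + negatives, using cosine similarity
    divided by a temperature as the logits.
    """
    q = normalize(query)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive (index 0).
    return log_sum_exp - logits[0]


# Toy features: the query matches the positive in the first call only.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned < misaligned)
```

In a training loop this term would be averaged over many sampled positions and added to the adversarial loss; how the features are extracted and which positions are sampled are left open here because the abstract does not state them.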

Code Implementations: 1 repo
