AS, CL, SD, ML · Apr 2, 2018

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

arXiv:1804.00425v1 · 144 citations · Has Code
Originality: Highly original
AI Analysis

This work addresses voice conversion without paired training data and shows, for the first time, that a nonparallel method can exceed parallel ones, a notable advance in speech processing.

The paper tackles high-quality voice conversion from nonparallel data by proposing a CycleGAN-based method that, in subjective evaluations, significantly outperformed state-of-the-art parallel VC systems.

Although voice conversion (VC) algorithms have achieved remarkable success with the development of machine learning, superior performance is still difficult to achieve when using nonparallel data. In this paper, we propose using a cycle-consistent adversarial network (CycleGAN) for nonparallel data-based VC training. A CycleGAN is a generative adversarial network (GAN) originally developed for unpaired image-to-image translation. A subjective evaluation of inter-gender conversion demonstrated that the proposed method significantly outperformed a method based on the Merlin open-source neural network speech synthesis system (a parallel VC system adapted for our setup) and a GAN-based parallel VC system. This is the first research to show that the performance of a nonparallel VC method can exceed that of state-of-the-art parallel VC methods.
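A CycleGAN removes the need for paired utterances by learning two mappings, source to target (G_XY) and target to source (G_YX). Each mapping is trained against a discriminator (adversarial loss), and a cycle-consistency loss requires that converting a frame and converting it back reproduces the original. The sketch below only illustrates the generator-side objective and is not the paper's implementation. The least-squares adversarial loss, the L1 cycle loss, the weight `lambda_cyc=10.0` (a value commonly used in CycleGAN setups), and the toy affine "generators" and logistic "discriminators" standing in for deep networks are all assumptions. The function names are hypothetical.

```python
import math

# Hypothetical toy "generator": a per-dimension affine map on a spectral feature frame.
# Real CycleGAN-based VC uses deep networks; this only illustrates the losses.
def make_affine(scale, shift):
    def g(frame):
        return [s * x + b for x, s, b in zip(frame, scale, shift)]
    return g

# Hypothetical toy discriminator: sigmoid of a weighted sum, output in (0, 1).
def make_discriminator(weights, bias):
    def d(frame):
        z = sum(w * x for w, x in zip(weights, frame)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return d

def l1(a, b):
    # Mean absolute error between two frames.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def lsgan_generator_loss(d, fake_frames):
    # Least-squares adversarial loss (assumed): push D(fake) toward the "real" label 1.
    return sum((d(f) - 1.0) ** 2 for f in fake_frames) / len(fake_frames)

def cyclegan_generator_loss(x_frames, y_frames, g_xy, g_yx, d_x, d_y, lambda_cyc=10.0):
    """Adversarial + cycle-consistency loss on UNPAIRED batches.

    x_frames and y_frames need not be aligned or the same length:
    no source/target pairing is used anywhere below.
    """
    fake_y = [g_xy(x) for x in x_frames]
    fake_x = [g_yx(y) for y in y_frames]
    adv = lsgan_generator_loss(d_y, fake_y) + lsgan_generator_loss(d_x, fake_x)
    # Cycle consistency: X -> Y -> X and Y -> X -> Y should reconstruct the input.
    cyc_x = sum(l1(g_yx(fy), x) for fy, x in zip(fake_y, x_frames)) / len(x_frames)
    cyc_y = sum(l1(g_xy(fx), y) for fx, y in zip(fake_x, y_frames)) / len(y_frames)
    cyc = cyc_x + cyc_y
    return adv + lambda_cyc * cyc, adv, cyc

# Usage: 3 source frames and 2 target frames, deliberately unpaired.
x = [[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
y = [[1.0, 3.0], [-2.0, 0.0]]
g_xy = make_affine([2.0, 2.0], [1.0, 1.0])
g_yx = make_affine([0.5, 0.5], [-0.5, -0.5])  # exact inverse of g_xy
d = make_discriminator([0.1, -0.2], 0.0)
total, adv, cyc = cyclegan_generator_loss(x, y, g_xy, g_yx, d, d)
print(cyc)  # 0.0: the two mappings invert each other, so the cycle loss vanishes
```

Because the cycle term compares each input only with its own reconstruction, the loss never needs a target utterance aligned to a source utterance. This is why the approach fits nonparallel VC data. A full training loop would also update the discriminators with their own loss, which is omitted here.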
