AS · SD · Dec 6, 2018

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

arXiv:1812.02339v2 · 9.210 citations

Originality: Incremental advance

AI Analysis

This work addresses speaker adaptation for high-fidelity speech synthesis, but it is an incremental contribution because it builds on existing WaveNet and GAN frameworks.

The paper tackles the quality gap and the inefficiency of speaker adaptation in parallel WaveNet vocoders by proposing a GAN-based end-to-end adaptation method, which reduces the computational cost of adapting to new speakers and improves waveform quality in subjective experiments.

Although the state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, problems remain. First, because the model's input signal is noisy, there is still a gap between the quality of generated and natural waveforms. Second, a parallel WaveNet is trained under a distillation framework, which makes it tedious to adapt a well-trained model to a new speaker. To address these two problems, in this paper we propose an end-to-end adaptation method based on the generative adversarial network (GAN), which can reduce the computational cost of training for new-speaker adaptation. Our subjective experiments show that the proposed training method can further reduce the quality gap between generated and natural waveforms.
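
The abstract does not state which adversarial objective is used. Purely as a hedged illustration, the sketch below shows the generic least-squares GAN (LSGAN) losses, one common choice when fine-tuning a pretrained waveform generator against a discriminator. The LSGAN formulation and the function names `lsgan_discriminator_loss` and `lsgan_generator_loss` are assumptions for illustration, not the paper's method. The inputs are hypothetical discriminator scores on natural and generated waveforms from the new speaker.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean; rejects empty input so a missing batch is not silently 0."""
    if not values:
        raise ValueError("need at least one discriminator score")
    return sum(values) / len(values)


def lsgan_discriminator_loss(real_scores: Sequence[float],
                             fake_scores: Sequence[float]) -> float:
    """Assumed LSGAN discriminator loss: push D(natural) toward 1 and D(generated) toward 0."""
    real_term = _mean([(s - 1.0) ** 2 for s in real_scores])
    fake_term = _mean([s ** 2 for s in fake_scores])
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(fake_scores: Sequence[float]) -> float:
    """Assumed LSGAN generator (vocoder) loss: push D(generated) toward 1."""
    return 0.5 * _mean([(s - 1.0) ** 2 for s in fake_scores])


if __name__ == "__main__":
    # A discriminator that separates natural and generated audio perfectly.
    print(lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0]))  # 0.0
    print(lsgan_generator_loss([0.0, 0.0]))                  # 0.5
    # A generator whose output the discriminator scores as natural.
    print(lsgan_generator_loss([1.0]))                       # 0.0
```

Under this assumed setup, adapting to a new speaker would mean alternating the two updates on that speaker's data. The abstract attributes the lower adaptation cost to replacing the distillation framework with end-to-end adversarial training.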
