ASSD · Dec 6, 2018

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

arXiv:1812.02339v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker adaptation for high-fidelity speech synthesis; its contribution is incremental, since it builds on existing WaveNet and GAN frameworks.

The paper tackled the quality gap and speaker adaptation inefficiency in parallel WaveNet vocoders by proposing a GAN-based end-to-end adaptation method, which reduced the computational cost for new speakers and improved waveform quality in subjective experiments.

Although the state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, problems remain. First, because the model's input signal is noisy, there is still a gap between the quality of generated and natural waveforms. Second, a parallel WaveNet is trained under a distillation framework, which makes it tedious to adapt a well-trained model to a new speaker. To address these two problems, in this paper we propose an end-to-end adaptation method based on the generative adversarial network (GAN), which can reduce the computational cost of adapting to a new speaker. Our subjective experiments show that the proposed training method can further reduce the quality gap between generated and natural waveforms.
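The key contrast in the abstract is that distillation trains the parallel WaveNet (student) against a teacher WaveNet, whereas GAN-based adaptation trains the student directly against a discriminator's judgement of real versus generated audio, so no new-speaker teacher is needed. A minimal sketch of the adversarial objectives, assuming the least-squares GAN (LSGAN) formulation as an illustrative stand-in (the abstract does not say which adversarial loss the paper uses); the names `d_loss`, `g_loss`, `adaptation_loss`, `aux_loss`, and `aux_weight` are hypothetical:

```python
from statistics import fmean


def d_loss(real_scores, fake_scores):
    """LSGAN discriminator loss: push D(real speech) toward 1 and D(vocoder output) toward 0."""
    return fmean((s - 1.0) ** 2 for s in real_scores) + fmean(s ** 2 for s in fake_scores)


def g_loss(fake_scores):
    """LSGAN generator loss: the vocoder is rewarded when D scores its output as real (1).

    Note that only discriminator scores appear here; unlike distillation,
    no teacher WaveNet density is evaluated.
    """
    return fmean((s - 1.0) ** 2 for s in fake_scores)


def adaptation_loss(fake_scores, aux_loss=0.0, aux_weight=0.0):
    """Total loss for fine-tuning a pretrained student vocoder on a new speaker.

    aux_loss and aux_weight are hypothetical placeholders for any
    non-adversarial term (e.g. a spectral loss); the abstract does not specify one.
    """
    return g_loss(fake_scores) + aux_weight * aux_loss
```

For example, `d_loss([1.0, 1.0], [0.0, 0.0])` is `0.0` (a perfect discriminator), and `g_loss([0.0, 0.5])` is `0.625`. In an adaptation loop these two losses would be minimised alternately, updating the discriminator and then the pretrained student on the new speaker's recordings.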
