ASSD · Jul 1, 2019

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation

arXiv:1907.00797v4 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses a limitation of speech synthesis in applications that require precise pitch control, such as expressive or singing voice generation. It is an incremental advance because it builds on the established WaveNet framework.

The paper tackles the inability of WaveNet vocoders to control pitch when generating speech with F0 values outside the training range. It proposes a quasi-periodic neural network (QPNet) vocoder built on pitch-dependent dilated convolution (PDCNN), which shows better pitch controllability than WaveNet vocoders while maintaining comparable speech quality in objective and subjective tests.

In this paper, we propose a quasi-periodic neural network (QPNet) vocoder with a novel network architecture named pitch-dependent dilated convolution (PDCNN) to improve the pitch controllability of the WaveNet (WN) vocoder. The WN vocoder has recently been shown to generate high-fidelity speech samples from given acoustic features. However, because of its fixed dilated convolutions and generic network architecture, the WN vocoder struggles to generate speech with F0 values outside the range observed in the training data. Consequently, the WN vocoder lacks pitch controllability, one of the essential capabilities of conventional vocoders. To address this limitation, we propose the PDCNN component, whose dilation size adapts over time according to the given F0 values, and a cascade network structure for the QPNet vocoder to generate quasi-periodic signals such as speech. Both objective and subjective tests are conducted, and the experimental results demonstrate that the QPNet vocoder achieves better pitch controllability than WN vocoders of the same and double size while attaining comparable speech quality.

Index Terms: WaveNet, vocoder, quasi-periodic signal, pitch-dependent dilated convolution, pitch controllability
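To make "time-variant adaptive dilation size related to the given F0 values" concrete, here is a minimal NumPy sketch. It assumes a dilation factor of the form E_t = fs / (F0_t * a), where `a` is a "dense factor" (roughly, how many taps span one pitch cycle), so that the base dilation d becomes round(E_t * d) at each sample. This formula, the parameter names `fs`, `dense_factor`, and `base_dilation`, and the two-tap causal convolution are illustrative assumptions, not the paper's exact implementation. The sketch also assumes F0 is continuous (unvoiced regions already interpolated) and already upsampled to the waveform rate.

```python
import numpy as np


def pitch_dependent_dilation(f0, fs=16000, dense_factor=4, base_dilation=1):
    """Per-sample dilation d'_t = round(E_t * d), assuming E_t = fs / (f0_t * a).

    Assumption: f0 is continuous (no zeros) and given per waveform sample.
    """
    f0 = np.asarray(f0, dtype=float)
    factor = fs / (f0 * dense_factor)  # lower pitch -> longer period -> larger dilation
    return np.maximum(1, np.rint(factor * base_dilation).astype(int))


def pd_causal_conv(x, f0, w_past, w_now, fs=16000, dense_factor=4, base_dilation=1):
    """Two-tap causal convolution whose past tap sits d'_t samples back.

    A fixed dilated convolution would use the same offset for every t;
    here the offset follows the pitch contour sample by sample.
    """
    x = np.asarray(x, dtype=float)
    d = pitch_dependent_dilation(f0, fs, dense_factor, base_dilation)
    t = np.arange(len(x))
    past_idx = t - d
    # Taps that would reach before the start of the signal are zero-padded.
    past = np.where(past_idx >= 0, x[np.clip(past_idx, 0, None)], 0.0)
    return w_past * past + w_now * x


# Usage: halving F0 doubles the dilation, so the receptive field tracks the pitch period.
print(pitch_dependent_dilation([200.0, 100.0], fs=16000, dense_factor=4))
```

Under these assumptions, a 200 Hz frame at 16 kHz with a dense factor of 4 gets a dilation of 20 samples and a 100 Hz frame gets 40, which is why the same network can cover pitch values unseen in training: the convolution's reach scales with the requested period rather than being fixed.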

Code Implementations: 1 repo