AS, LG, SD, SP | Feb 15, 2021

PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components

arXiv:2102.07786v1 | 17 citations
AI Analysis

This work targets waveform quality in speech synthesis, specifically in neural vocoders. The contribution is incremental: it builds on existing non-autoregressive waveform generation methods.

The authors address the problem of generating high-quality speech waveforms by proposing PeriodNet, a non-autoregressive model that separates periodic and aperiodic components without requiring external decomposition. Experiments on a singing voice corpus showed improved naturalness in the generated waveforms, including for pitches outside the training range.

We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR waveform generation models can generate speech waveforms in parallel and can be used as speech vocoders by conditioning on acoustic features. Since a speech waveform contains periodic and aperiodic components, both components should be appropriately modeled to generate a high-quality speech waveform. However, it is difficult to decompose a natural speech waveform into these components in advance. To address this issue, we propose a parallel model structure and a series model structure, each separating periodic and aperiodic components. The features of our proposed models are that explicit periodic and aperiodic signals are taken as input, and that external periodic/aperiodic decomposition is not needed in training. Experiments using a singing voice corpus show that our proposed structure improves the naturalness of the generated waveform. We also show that speech waveforms with a pitch outside the range of the training data can be generated with more naturalness.
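
The abstract does not spell out what the explicit input signals are or how the two generators are wired together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the periodic input is a sine wave driven by the fundamental frequency (F0), the aperiodic input is Gaussian noise, and the final waveform is the sum of the two generator outputs. The function names (`make_excitation`, `parallel_model`, `series_model`), the toy generators, and the 24 kHz sample rate are all hypothetical. In a real system the two generators would be neural networks conditioned on acoustic features.

```python
import math
import random


def make_excitation(f0_per_sample, sample_rate=24000, seed=0):
    """Build explicit periodic (sine) and aperiodic (noise) input signals.

    f0_per_sample: F0 in Hz for each output sample; 0.0 marks unvoiced samples.
    The sine/noise choice and the sample rate are assumptions of this sketch.
    """
    rng = random.Random(seed)
    phase = 0.0
    periodic, aperiodic = [], []
    for f0 in f0_per_sample:
        # Accumulate phase so the sine stays continuous when F0 changes.
        phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
        periodic.append(math.sin(phase) if f0 > 0.0 else 0.0)
        aperiodic.append(rng.gauss(0.0, 1.0))
    return periodic, aperiodic


def toy_periodic_gen(sine):
    """Hypothetical stand-in for a neural periodic generator."""
    return [0.8 * s for s in sine]


def toy_aperiodic_gen(noise, cond=None):
    """Hypothetical stand-in for a neural aperiodic generator.

    If `cond` is given, the noise is shaped by it (used by the series sketch).
    """
    if cond is None:
        return [0.1 * n for n in noise]
    return [0.1 * n * (1.0 + abs(c)) for n, c in zip(noise, cond)]


def parallel_model(periodic_in, aperiodic_in):
    """Parallel structure (assumed): run both generators independently, then sum."""
    p = toy_periodic_gen(periodic_in)
    a = toy_aperiodic_gen(aperiodic_in)
    return [x + y for x, y in zip(p, a)]


def series_model(periodic_in, aperiodic_in):
    """Series structure (assumed): the aperiodic generator also sees the periodic output."""
    p = toy_periodic_gen(periodic_in)
    a = toy_aperiodic_gen(aperiodic_in, cond=p)
    return [x + y for x, y in zip(p, a)]


# 100 voiced samples at 220 Hz followed by 50 unvoiced samples.
f0 = [220.0] * 100 + [0.0] * 50
periodic, aperiodic = make_excitation(f0)
wave_parallel = parallel_model(periodic, aperiodic)
wave_series = series_model(periodic, aperiodic)
```

Because the periodic input is computed directly from F0, a pitch outside the training range still yields a well-formed sine input. This is one plausible intuition for the out-of-range result reported in the abstract, not a claim the abstract itself makes.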

