SD AS

May 15, 2020

Reverberation Modeling for Source-Filter-based Neural Vocoder

Yang Ai, Xin Wang, Junichi Yamagishi, Zhen-Hua Ling

arXiv:2005.07379v1

6.23 citations

Originality: Incremental advance

AI Analysis

This work makes an incremental improvement to speech synthesis quality, relevant to applications such as audio processing and virtual assistants, by enhancing reverberation modeling in neural vocoders.

The paper improves reverberant-effect modeling in source-filter-based neural vocoders by proposing a reverberation module that convolves the vocoder output with a room impulse response (RIR). The RIR is parameterized in one of two ways: global time-invariant (GTI) or utterance-level time-variant (UTV). In the experiments, the UTV-RIR approach was more robust to unknown reverberation conditions and gave perceptually better results.

This paper presents a reverberation module for source-filter-based neural vocoders that improves the performance of reverberant effect modeling. This module uses the output waveform of neural vocoders as an input and produces a reverberant waveform by convolving the input with a room impulse response (RIR). We propose two approaches to parameterizing and estimating the RIR. The first approach assumes a global time-invariant (GTI) RIR and directly learns the values of the RIR on a training dataset. The second approach assumes an utterance-level time-variant (UTV) RIR, which is invariant within one utterance but varies across utterances, and uses another neural network to predict the RIR values. We add the proposed reverberation module to the phase spectrum predictor (PSP) of a HiNet vocoder and jointly train the model. Experimental results demonstrate that the proposed module was helpful for modeling the reverberation effect and improving the perceived quality of generated reverberant speech. The UTV-RIR was shown to be more robust than the GTI-RIR to unknown reverberation conditions and achieved a perceptually better reverberation effect.
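The core operation described in the abstract, convolving the vocoder output with an RIR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `apply_rir`, the toy RIR values, and the choice to truncate the output to the input length are all assumptions made for illustration. In the GTI setting the `rir` argument would be a single trainable vector shared across the dataset, while in the UTV setting it would be predicted per utterance by a separate network; neither training procedure is shown here.

```python
def apply_rir(waveform, rir):
    """Convolve a dry waveform with a room impulse response (RIR).

    The output is truncated to the input length (an assumption of this
    sketch, not something stated in the abstract).
    """
    out = [0.0] * len(waveform)
    for n in range(len(waveform)):
        acc = 0.0
        # Sum over the RIR taps that overlap samples already seen.
        for k in range(min(n + 1, len(rir))):
            acc += rir[k] * waveform[n - k]
        out[n] = acc
    return out


# Hypothetical toy values: a unit impulse as the "dry" vocoder output and
# an RIR with a direct path plus one echo two samples later.
dry = [1.0, 0.0, 0.0, 0.0, 0.0]
rir = [1.0, 0.0, 0.5]
wet = apply_rir(dry, rir)
print(wet)  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

Because the input here is a unit impulse, the output simply reproduces the RIR, which makes the effect of the module easy to inspect. A practical system would likely use an FFT-based or framework-provided convolution for RIRs that are thousands of samples long.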
