Nov 14, 2019

Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement

arXiv:1911.06266v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the limited quality of speech produced by enhancement systems in applications such as communication. It is incremental, extending prior work on parametric resynthesis (PR) from a single speaker to multiple speakers.

The paper tackles the low quality of speech produced by traditional enhancement systems by proposing parametric resynthesis (PR) with neural vocoders. It shows that PR trained on multiple speakers achieves higher objective signal and overall quality on unseen speakers than state-of-the-art enhancement methods, and that in subjective tests it outperforms an oracle Wiener mask.

Traditional speech enhancement systems produce speech with compromised quality. Here we propose to use the high-quality speech generation capability of neural vocoders for better-quality speech enhancement. We term this parametric resynthesis (PR). In previous work, we showed that PR systems generate high-quality speech for a single speaker using two neural vocoders, WaveNet and WaveGlow. Both of these vocoders are traditionally speaker-dependent. Here we first show that, when trained on data from enough speakers, these vocoders can generate speech from unseen speakers, both male and female, with quality similar to that of speakers seen in training. Next, using these two vocoders and a new vocoder, LPCNet, we evaluate the noise reduction quality of PR on unseen speakers and show that its objective signal and overall quality are higher than those of the state-of-the-art speech enhancement systems Wave-U-Net, Wavenet-denoise, and SEGAN. Moreover, in subjective quality, multiple-speaker PR outperforms the oracle Wiener mask.
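The abstract uses an oracle Wiener mask as a reference system. The following is a minimal NumPy sketch of that kind of mask, assuming the textbook definition M = |S|^2 / (|S|^2 + |N|^2) per time-frequency bin, applied to the noisy STFT while keeping the noisy phase. The `stft` helper, the frame length and hop size, and the synthetic 440 Hz test signal are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def stft(x, frame_len=512, hop=128):
    """Simple Hann-windowed STFT (illustrative helper, not from the paper)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, frame_len // 2 + 1)


def oracle_wiener_mask(clean_stft, noise_stft, eps=1e-12):
    """Wiener mask |S|^2 / (|S|^2 + |N|^2), computed per time-frequency bin.

    It is an 'oracle' because it needs the true clean speech S and noise N
    separately, which a real enhancement system never observes.
    """
    speech_power = np.abs(clean_stft) ** 2
    noise_power = np.abs(noise_stft) ** 2
    return speech_power / (speech_power + noise_power + eps)


def apply_mask(noisy_stft, mask):
    # Real-valued mask in [0, 1]; the phase of the noisy STFT is left unchanged.
    return mask * noisy_stft


# Synthetic example (assumed data): a 440 Hz tone in white noise, 1 s at 16 kHz.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = 0.1 * rng.standard_normal(sr)
noisy = clean + noise

S, N, Y = stft(clean), stft(noise), stft(noisy)  # the STFT is linear, so Y == S + N
mask = oracle_wiener_mask(S, N)
enhanced = apply_mask(Y, mask)

print("error before masking:", np.linalg.norm(Y - S))
print("error after masking: ", np.linalg.norm(enhanced - S))
```

The mask is close to 1 in bins dominated by the tone and close to 0 in bins dominated by noise. Because it requires the clean and noise signals separately, it is only available in simulation, which is why it is called an oracle.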

