SD, LG, AS · Jun 16, 2019

Parametric Resynthesis with neural vocoders

arXiv:1906.06762v2 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses speech quality degradation in noise suppression systems, offering an incremental improvement for audio processing applications.

The paper tackles noise suppression in speech by using neural vocoders to generate clean speech from predicted mel-spectrograms. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++, and significantly better subjective quality ratings than the oracle Wiener mask. WaveNet outperforms WaveGlow in subjective quality but generates waveforms much more slowly.

Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Further, WaveNet and WaveGlow also achieve significantly better subjective quality ratings than the oracle Wiener mask. Moreover, we observe that between WaveNet and WaveGlow, WaveNet achieves the best subjective quality scores, although at the cost of much slower waveform generation.
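The two-stage structure described in the abstract (predict a clean mel-spectrogram, then hand it to a vocoder) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the filterbank construction is the standard triangular mel filterbank, the sizes used (number of mel bands, FFT size, sampling rate, hop length) are assumed values chosen for the example, and `identity_predictor` and `silent_vocoder` are hypothetical stand-ins for the trained prediction network and for WaveNet or WaveGlow.

```python
import math
from typing import Callable, List

Mel = List[List[float]]   # frames x mel bands (log-mel features)
Wave = List[float]        # time-domain samples


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters evenly spaced on the mel scale.

    Returns n_mels rows, each with n_fft // 2 + 1 weights in [0, 1].
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sr / 2)
    # n_mels + 2 edge frequencies: each filter uses three consecutive edges.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_frame(power_spectrum: List[float],
                  bank: List[List[float]],
                  floor: float = 1e-10) -> List[float]:
    """Project one power-spectrum frame onto the mel bands and take the log."""
    return [
        math.log(max(floor, sum(w * p for w, p in zip(row, power_spectrum))))
        for row in bank
    ]


def identity_predictor(noisy_mel: Mel) -> Mel:
    """Hypothetical stand-in for the network that predicts clean mel features."""
    return [list(frame) for frame in noisy_mel]


def silent_vocoder(mel: Mel, hop: int = 256) -> Wave:
    """Hypothetical stand-in for a neural vocoder: hop samples per frame."""
    return [0.0] * (len(mel) * hop)


def parametric_resynthesis(noisy_mel: Mel,
                           predictor: Callable[[Mel], Mel],
                           vocoder: Callable[[Mel], Wave]) -> Wave:
    """Stage 1: predict clean mel features. Stage 2: synthesize a waveform."""
    clean_mel = predictor(noisy_mel)
    return vocoder(clean_mel)
```

In this sketch, swapping the vocoder argument is the only change needed to compare two synthesizers on the same predicted features, which mirrors the WaveNet versus WaveGlow comparison in the abstract.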

Code Implementations: 1 repo