AS · LG · SD · Apr 2, 2019

Speech denoising by parametric resynthesis

arXiv:1904.01537v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement of noisy audio by introducing a new training target: clean-speech vocoder parameters. The advance is incremental because it builds on existing vocoder technology.

The paper tackles speech denoising by using clean speech vocoder parameters as a target for a neural network, achieving subjective quality and intelligibility equal to an oracle Wiener mask and surpassing a realistic DNN-based system.
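To make the training target concrete, here is a minimal NumPy sketch of regressing vocoder parameters from noisy-speech features with a mean-squared-error loss. Everything in it is hypothetical. Random arrays stand in for per-frame noisy features and clean vocoder parameters, and the one-hidden-layer network, layer sizes, and learning rate are illustrative choices. None of these are the architecture or vocoder configuration used in the paper. In a real system the predicted parameters would then be passed to the vocoder's synthesis routine to produce the denoised waveform.

```python
import numpy as np

# Hypothetical toy data: each row is one frame.
# X stands in for noisy-speech input features; Y stands in for the
# clean-speech vocoder parameters that serve as the regression target.
rng = np.random.default_rng(0)
n_frames, n_in, n_out, n_hidden = 2000, 40, 10, 64
W_true = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
X = rng.normal(size=(n_frames, n_in))
Y = np.tanh(X @ W_true)  # unknown mapping the network has to learn

# One-hidden-layer MLP (illustrative), trained by full-batch gradient descent.
W1 = rng.normal(size=(n_in, n_hidden)) * np.sqrt(2.0 / n_in)
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_out)) * np.sqrt(1.0 / n_hidden)
b2 = np.zeros(n_out)


def forward(inputs):
    """Return hidden activations and predicted vocoder parameters."""
    hidden = np.maximum(0.0, inputs @ W1 + b1)  # ReLU hidden layer
    return hidden, hidden @ W2 + b2             # linear output layer


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


lr = 0.1
initial_loss = mse(forward(X)[1], Y)
for step in range(500):
    H, pred = forward(X)
    grad_out = 2.0 * (pred - Y) / pred.size  # d(mean squared error)/d(pred)
    gW2 = H.T @ grad_out
    gb2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (H > 0)     # backpropagate through ReLU
    gW1 = X.T @ grad_h
    gb1 = grad_h.sum(axis=0)
    W1 -= lr * gW1
    b1 -= lr * gb1
    W2 -= lr * gW2
    b2 -= lr * gb2

final_loss = mse(forward(X)[1], Y)
print(f"MSE on vocoder-parameter targets: {initial_loss:.4f} -> {final_loss:.4f}")
```

The key design point the sketch illustrates is that the network never predicts a mask or a spectrogram directly. It predicts compact synthesis parameters, and the waveform is regenerated from them rather than filtered out of the noisy mixture.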

This work proposes the use of clean speech vocoder parameters as the target for a neural network performing speech enhancement. These parameters have been designed for text-to-speech synthesis so that they both produce high-quality resyntheses and also are straightforward to model with neural networks, but have not been utilized in speech enhancement until now. In comparison to a matched text-to-speech system that is given the ground truth transcripts of the noisy speech, our model is able to produce more natural speech because it has access to the true prosody in the noisy speech. In comparison to two denoising systems, the oracle Wiener mask and a DNN-based mask predictor, our model equals the oracle Wiener mask in subjective quality and intelligibility and surpasses the realistic system. A vocoder-based upper bound shows that there is still room for improvement with this approach beyond the oracle Wiener mask. We test speaker-dependence with two speakers and show that a single model can be used for multiple speakers.
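For comparison, the oracle Wiener mask baseline can be sketched as follows. This assumes the common definition of the mask as the per-bin ratio of clean-speech power to clean-plus-noise power, $M = |S|^2 / (|S|^2 + |N|^2)$. It is "oracle" because it needs the separate clean and noise signals, which a deployed system never has. The STFT settings, the 440 Hz tone standing in for speech, and the 0 dB white noise are hypothetical choices, not the paper's setup.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed short-time Fourier transform; rows are frames, columns are bins."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def oracle_wiener_mask(clean_spec, noise_spec, eps=1e-12):
    """Per-bin mask |S|^2 / (|S|^2 + |N|^2), computed from the true clean and noise spectra."""
    ps = np.abs(clean_spec) ** 2
    pn = np.abs(noise_spec) ** 2
    return ps / (ps + pn + eps)


# Hypothetical toy signals: a 440 Hz tone as "speech", white noise at about 0 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)  # average power 0.5
rng = np.random.default_rng(0)
noise = rng.normal(scale=np.sqrt(0.5), size=sr)  # variance 0.5
noisy = clean + noise

S, N, Y = stft(clean), stft(noise), stft(noisy)
mask = oracle_wiener_mask(S, N)
S_hat = mask * Y  # masked noisy spectrum; the noisy phase is kept

err_noisy = np.sum(np.abs(Y - S) ** 2)
err_masked = np.sum(np.abs(S_hat - S) ** 2)
print(f"spectral error: noisy {err_noisy:.1f}, Wiener-masked {err_masked:.1f}")
```

A full baseline would invert `S_hat` back to a waveform with overlap-add. Unlike parametric resynthesis, the mask can only attenuate bins of the noisy mixture, so any residual noise or distortion in speech-dominated bins passes through to the output.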
