On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement
This work addresses noise reduction in single-channel speech enhancement for applications such as communication systems, showing that the choice of speech prior matters far more for enhancement schemes that learn only speech spectral envelopes than for classic schemes.
The paper demonstrates that, in machine-learning spectral envelope (MLSE)-based speech enhancement, super-Gaussian priors allow noise between speech harmonics to be reduced in a way that Gaussian estimators such as the Wiener filter cannot achieve, as confirmed by a listening experiment and instrumental measures.
For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure good generalization and to meet requirements in terms of computational complexity and memory consumption, certain methods restrict themselves to learning speech spectral envelopes. We refer to these approaches as machine-learning spectral envelope (MLSE)-based approaches. In this paper, we show by means of theoretical and experimental analyses that, for MLSE-based approaches, super-Gaussian priors allow for a reduction of noise between speech spectral harmonics that is not achievable using Gaussian estimators such as the Wiener filter. For the evaluation, we use a deep neural network (DNN)-based phoneme classifier and a low-rank nonnegative matrix factorization (NMF) framework as examples of MLSE-based approaches. A listening experiment and instrumental measures confirm that, while super-Gaussian priors yield only moderate improvements for classic enhancement schemes, they clearly make an important difference for MLSE-based approaches, where they significantly outperform Gaussian priors.
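The intuition behind this claim can be illustrated with a toy calculation. The sketch below is not the paper's estimator: it assumes a real-valued scalar model Y = S + N with Gaussian noise and uses a Laplacian as one example of a super-Gaussian prior. The function names, variances, and integration grid are hypothetical choices made for illustration. It evaluates the posterior mean E[S | Y = y] numerically and reports the resulting gain E[S | y] / y. Under the Gaussian prior the gain is the Wiener gain, which depends only on the speech and noise variances (information an envelope model supplies). Under the Laplacian prior the gain also depends on the observation itself, so weak observations are attenuated more strongly than strong ones.

```python
import numpy as np

# Assumed toy setting: unit speech and noise variance (a priori SNR of 0 dB).
SPEECH_VAR = 1.0
NOISE_VAR = 1.0

# Uniform grid over candidate clean values s; the step cancels in the ratio below.
GRID = np.linspace(-40.0, 40.0, 80001)


def gaussian_pdf(s):
    """Zero-mean Gaussian prior with variance SPEECH_VAR."""
    return np.exp(-s**2 / (2.0 * SPEECH_VAR)) / np.sqrt(2.0 * np.pi * SPEECH_VAR)


def laplace_pdf(s):
    """Zero-mean Laplacian prior with the same variance (variance = 2 * b**2)."""
    b = np.sqrt(SPEECH_VAR / 2.0)
    return np.exp(-np.abs(s) / b) / (2.0 * b)


def posterior_mean(y, prior_pdf):
    """Numerically evaluate E[S | Y = y] for Y = S + N, N ~ N(0, NOISE_VAR)."""
    likelihood = np.exp(-((y - GRID) ** 2) / (2.0 * NOISE_VAR))
    weights = likelihood * prior_pdf(GRID)
    return np.sum(GRID * weights) / np.sum(weights)


def gain(y, prior_pdf):
    """Multiplicative gain applied to the noisy observation y (y must be nonzero)."""
    return posterior_mean(y, prior_pdf) / y


if __name__ == "__main__":
    for y in (0.3, 1.0, 3.0, 6.0):
        print(f"y={y:4.1f}  Gaussian gain={gain(y, gaussian_pdf):.3f}  "
              f"Laplacian gain={gain(y, laplace_pdf):.3f}")
```

In this toy model the Gaussian gain stays at SPEECH_VAR / (SPEECH_VAR + NOISE_VAR) for every observation, whereas the Laplacian gain is lower for small |y| and higher for large |y|. This is the mechanism that lets an estimator suppress low-energy bins between harmonics even when only the spectral envelope is known.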