SD · Mar 15, 2017

On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement

arXiv:1703.05003v2 · 22 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses noise reduction in speech enhancement for applications like communication systems, showing incremental improvements by optimizing priors in existing MLSE frameworks.

The paper shows that, in machine-learning spectral envelope (MLSE)-based speech enhancement, super-Gaussian priors reduce noise between speech harmonics in a way that Gaussian estimators such as the Wiener filter cannot. A listening experiment and instrumental measures confirm that super-Gaussian priors significantly outperform Gaussian priors in this setting.

For enhancing noisy signals, machine-learning based single-channel speech enhancement schemes exploit prior knowledge about typical speech spectral structures. To ensure a good generalization and to meet requirements in terms of computational complexity and memory consumption, certain methods restrict themselves to learning speech spectral envelopes. We refer to these approaches as machine-learning spectral envelope (MLSE)-based approaches. In this paper we show by means of theoretical and experimental analyses that for MLSE-based approaches, super-Gaussian priors allow for a reduction of noise between speech spectral harmonics which is not achievable using Gaussian estimators such as the Wiener filter. For the evaluation, we use a deep neural network (DNN)-based phoneme classifier and a low-rank nonnegative matrix factorization (NMF) framework as examples of MLSE-based approaches. A listening experiment and instrumental measures confirm that while super-Gaussian priors yield only moderate improvements for classic enhancement schemes, for MLSE-based approaches super-Gaussian priors clearly make an important difference and significantly outperform Gaussian priors.
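The Gaussian half of this claim can be illustrated with a minimal sketch. It assumes the standard Wiener gain, xi / (1 + xi), where xi is the a priori SNR, and uses made-up PSD and power values that are not taken from the paper. Because an MLSE-based model supplies only a smooth spectral envelope, a bin on a harmonic and a bin between harmonics receive the same speech PSD estimate and therefore the same Wiener gain, even though their noisy observations differ.

```python
def wiener_gain(speech_psd, noise_psd):
    """Wiener gain xi / (1 + xi), with a priori SNR xi = speech_psd / noise_psd."""
    xi = speech_psd / noise_psd
    return xi / (1.0 + xi)


# Hypothetical values for illustration only (not from the paper).
envelope_psd = 4.0  # speech PSD from a learned envelope: identical for both bins
noise_psd = 1.0     # noise PSD estimate, assumed equal in both bins

noisy_power_on = 6.0       # observed power in a bin on a speech harmonic
noisy_power_between = 1.1  # observed power in a bin between two harmonics

# A posteriori SNR (observed power / noise PSD) differs between the two bins ...
gamma_on = noisy_power_on / noise_psd
gamma_between = noisy_power_between / noise_psd

# ... but the Wiener gain never sees the observation, only the envelope-based xi.
gain_on = wiener_gain(envelope_psd, noise_psd)
gain_between = wiener_gain(envelope_psd, noise_psd)

print(f"a posteriori SNR: on={gamma_on:.2f}, between={gamma_between:.2f}")
print(f"Wiener gain:      on={gain_on:.2f}, between={gain_between:.2f}")
```

Gain functions derived under super-Gaussian speech priors generally depend on the a posteriori SNR as well as the a priori SNR, which is what lets them treat the two bins differently. The specific estimator used in the paper is not reproduced here.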
