AS, SD, SP
May 17, 2021

Dual-Stage Low-Complexity Reconfigurable Speech Enhancement

arXiv:2105.07632v1
1 citation
Originality: Incremental advance
AI Analysis

This work addresses speech quality enhancement for telecommunication and voice-trigger systems. It appears incremental, since it builds on existing speech enhancement methods.

The paper tackles speech enhancement in noisy environments by proposing a dual-stage, low-complexity, reconfigurable technique. The authors report significant improvements in the 3QUEST metrics (SMOS and NMOS) and in SNR for both near-field and far-field applications.

This paper proposes a dual-stage, low-complexity, reconfigurable technique to enhance speech contaminated by various types of noise sources. Driven by the input data and audio content, the proposed dual-stage speech enhancement approach performs coarse processing in the first stage and fine processing in the second stage. We demonstrate that the proposed solution significantly improves the metrics of 3-fold QUality Evaluation of Speech in Telecommunication (3QUEST), which consist of the speech mean opinion score (SMOS) and the noise MOS (NMOS), for near-field and far-field applications. Moreover, the proposed approach greatly improves both the signal-to-noise ratio (SNR) and the subjective listening experience. By comparison, traditional speech enhancement methods reduce the SMOS even though they increase the NMOS and SNR. In addition, the proposed scheme can be easily adopted in both the capture path and the speech render path of speech communication and conferencing systems, and in voice-trigger applications.
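The abstract reports SNR gains without stating how SNR is computed. As a minimal sketch, assuming the common definition (ratio of clean-signal power to residual-noise power, in dB) and assuming a time-aligned clean reference is available, the improvement from enhancement could be measured as below. The function names `snr_db` and `snr_improvement_db` are hypothetical and are not taken from the paper, and the paper's own measurement procedure may differ.

```python
import math


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise.

    Assumption: `clean` and `observed` are time-aligned sample sequences
    of equal length. This is a generic definition, not the paper's.
    """
    if len(clean) != len(observed):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, observed))
    if noise_power == 0:
        return math.inf   # no residual noise at all
    if signal_power == 0:
        return -math.inf  # silent reference, only noise present
    return 10.0 * math.log10(signal_power / noise_power)


def snr_improvement_db(clean, noisy, enhanced):
    """SNR gain in dB of the enhanced signal over the noisy input."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


# Toy data (made up for illustration, not from the paper).
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 1.1, -0.9]
enhanced = [1.01, -0.99, 1.01, -0.99]
print(round(snr_improvement_db(clean, noisy, enhanced), 2))
```

A higher SNR alone does not imply better speech quality: the abstract notes that traditional methods raise SNR and NMOS while lowering SMOS, which is why the perceptual 3QUEST scores are reported alongside it.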

