SD, AS | Jul 20, 2021

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation

arXiv:2107.09298v1 | 12 citations
Originality: Incremental advance
AI Analysis

This work addresses the degradation of speech intelligibility in real-time communication systems. It is an incremental improvement over approaches that treat echo cancellation and noise suppression as separate tasks.

The paper tackles the joint removal of acoustic echo and background noise from speech. It proposes a cascaded magnitude and complex temporal convolutional neural network (MC-TCN) used together with adaptive filters. In a subjective test the method achieves a mean DECMOS score of 4.41, outperforming the INTERSPEECH 2021 AEC-Challenge baseline by 0.54.

Acoustic echo and background noise can seriously degrade the intelligibility of speech. In practice, echo cancellation and noise suppression are usually treated as two separate tasks, each handled with various digital signal processing (DSP) and deep learning techniques. In this paper, we propose a new cascaded model, the magnitude and complex temporal convolutional neural network (MC-TCN), to jointly perform acoustic echo cancellation and noise suppression with the help of adaptive filters. The MC-TCN cascades two separation cores: the first extracts robust magnitude spectral features, and the second enhances magnitude and phase simultaneously. Experimental results show that the proposed method achieves superior performance, removing both echo and noise in real time. In terms of DECMOS, the subjective test shows that our method achieves a mean score of 4.41 and outperforms the INTERSPEECH 2021 AEC-Challenge baseline by 0.54.
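
To make the cascade idea concrete, here is a minimal NumPy sketch of applying a magnitude mask followed by a complex mask to a spectrogram. It is not the authors' MC-TCN: the function names, array shapes, and mask value ranges are assumptions chosen for illustration, and the masks are random placeholders standing in for network outputs. The adaptive-filter front end is omitted.

```python
import numpy as np

def apply_magnitude_mask(spec, mag_mask):
    """Stage 1 (hypothetical): scale the magnitude, keep the input phase."""
    return mag_mask * np.abs(spec) * np.exp(1j * np.angle(spec))

def apply_complex_mask(spec, mask_real, mask_imag):
    """Stage 2 (hypothetical): a complex multiply changes magnitude and phase together."""
    return (mask_real + 1j * mask_imag) * spec

rng = np.random.default_rng(0)
frames, bins = 4, 5  # assumed toy spectrogram size

# Stand-in for the STFT of a microphone signal containing echo and noise.
noisy = rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))

# Placeholder masks; a real system would predict these with neural networks.
mag_mask = rng.uniform(0.1, 1.0, size=(frames, bins))
mask_real = rng.uniform(-1.0, 1.0, size=(frames, bins))
mask_imag = rng.uniform(-1.0, 1.0, size=(frames, bins))

stage1 = apply_magnitude_mask(noisy, mag_mask)           # magnitude-only enhancement
stage2 = apply_complex_mask(stage1, mask_real, mask_imag)  # magnitude and phase refinement
```

The first stage can only attenuate or boost each time-frequency bin, so its output keeps the noisy phase. The second stage multiplies by a complex number per bin, which is what allows the phase to be corrected as well.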
