SD · AS · Jul 20, 2021

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation

Xiaofeng Shu, Yehang Zhu, Yanjie Chen, Li Chen, Haohe Liu, Chuanzeng Huang, Yuxuan Wang

arXiv:2107.09298v15.912 citations

Originality: Incremental advance

AI Analysis

This paper addresses the loss of speech intelligibility in real-time communication systems. It is an incremental improvement over approaches that treat echo cancellation and noise suppression as separate tasks.

The paper tackles the joint removal of acoustic echo and background noise from speech by proposing a cascaded magnitude and complex temporal convolutional neural network (MC-TCN) combined with adaptive filters. In a subjective test it achieves a mean DECMOS score of 4.41, outperforming the INTERSPEECH 2021 AEC-Challenge baseline by 0.54.

Acoustic echo and background noise can seriously degrade the intelligibility of speech. In practice, echo cancellation and noise suppression are usually treated as two separate tasks, each handled with various digital signal processing (DSP) and deep learning techniques. In this paper, we propose a new cascaded model, the magnitude and complex temporal convolutional neural network (MC-TCN), to jointly perform acoustic echo cancellation and noise suppression with the help of adaptive filters. The MC-TCN cascades two separation cores, which are used, respectively, to extract robust magnitude spectral features and to enhance magnitude and phase simultaneously. Experimental results show that the proposed method achieves superior performance, removing both echo and noise in real time. In terms of DECMOS, the subjective test shows that our method achieves a mean score of 4.41 and outperforms the INTERSPEECH 2021 AEC-Challenge baseline by 0.54.
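To make the cascade idea concrete, here is a minimal sketch of how a magnitude mask followed by a complex mask could be applied to a short-time spectrum. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the mask formulation, the function names (`apply_magnitude_mask`, `apply_complex_mask`, `cascade`) are hypothetical, and the masks are hand-picked numbers standing in for what the two network cores would estimate.

```python
import cmath

def apply_magnitude_mask(spec, mag_mask):
    """Stage 1 (assumed form): scale each bin's magnitude by a real mask
    and keep the phase of the input spectrum unchanged."""
    return [
        [cmath.rect(m * abs(y), cmath.phase(y)) for y, m in zip(row_y, row_m)]
        for row_y, row_m in zip(spec, mag_mask)
    ]

def apply_complex_mask(spec, complex_mask):
    """Stage 2 (assumed form): multiply each bin by a complex mask, which
    can change both magnitude and phase."""
    return [
        [c * s for s, c in zip(row_s, row_c)]
        for row_s, row_c in zip(spec, complex_mask)
    ]

def cascade(spec, mag_mask, complex_mask):
    """Run the magnitude stage first, then refine its output with the
    complex stage."""
    stage1 = apply_magnitude_mask(spec, mag_mask)
    return apply_complex_mask(stage1, complex_mask)

# One frame with two frequency bins; all values are placeholders.
spectrum = [[1 + 1j, 2 + 0j]]
magnitude_mask = [[0.5, 1.0]]        # placeholder for the first core's output
complex_mask = [[1j, 0.5 + 0j]]      # placeholder for the second core's output

enhanced = cascade(spectrum, magnitude_mask, complex_mask)
```

In this toy frame the first stage only attenuates the first bin, while the second stage rotates its phase by 90 degrees, which is something a magnitude-only mask cannot do.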
