ASSD · Mar 31, 2021

Y$^2$-Net FCRN for Acoustic Echo and Noise Suppression

arXiv:2103.17189v2 · 7 citations
AI Analysis

This work addresses the challenge of maintaining near-end speech quality in integrated echo and noise suppression systems. Its contribution is an incremental improvement for audio processing applications such as teleconferencing.

The paper tackles combined acoustic echo and noise suppression by proposing a two-stage deep neural network (Y$^2$-Net) that first estimates the echo and then uses that estimate for residual echo suppression and noise suppression. On the challenge's blind test set, it achieves an average improvement of 0.46 points over the baseline on the DECMOS metric in double-talk scenarios.
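The two-stage data flow can be sketched as follows. This is a toy illustration, not the authors' model: each Y-Net (two inputs, one output) is replaced by a hypothetical stand-in, `y_net_stage`, that only scales and sums its inputs, so the routing of signals between the stages stays visible. Two further assumptions go beyond the summary: the stage-1 inputs are the microphone and far-end reference signals, and the second input of stage 2 (besides the echo estimate) is the microphone signal minus the echo estimate.

```python
import numpy as np


def y_net_stage(input_a, input_b, weights):
    """Hypothetical stand-in for one Y-Net (two inputs, one output).

    A real Y-Net is a trained FCRN; here each input is scaled and summed
    only to show how signals flow between the two stages.
    """
    w_a, w_b = weights
    return w_a * input_a + w_b * input_b


def y2_net(mic, far_end, stage1_weights, stage2_weights):
    # Stage 1 (AEC): microphone and far-end reference in, echo estimate out.
    echo_est = y_net_stage(mic, far_end, stage1_weights)
    # Assumption: the AEC output is the microphone signal minus the echo estimate.
    aec_out = mic - echo_est
    # Stage 2 (RES + noise suppression): AEC output and the stage-1 echo
    # estimate in, near-end speech estimate out.
    speech_est = y_net_stage(aec_out, echo_est, stage2_weights)
    return echo_est, speech_est


# Toy signals: the echo path is a plain gain of 0.5 (an illustrative assumption).
rng = np.random.default_rng(0)
n = 16000
speech = rng.standard_normal(n)
far_end = rng.standard_normal(n)
noise = 0.1 * rng.standard_normal(n)
mic = speech + 0.5 * far_end + noise

# Hand-picked weights: stage 1 recovers the toy echo exactly; stage 2 passes
# the AEC output through unchanged (no noise suppression in this toy).
echo_est, speech_est = y2_net(mic, far_end, (0.0, 0.5), (1.0, 0.0))
```

The point of the sketch is structural: stage 2 receives the stage-1 echo estimate as an explicit input rather than only the echo-reduced signal.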

In recent years, deep neural networks (DNNs) have been studied as an alternative to traditional acoustic echo cancellation (AEC) algorithms. The proposed models achieved remarkable performance on the separate tasks of AEC and residual echo suppression (RES). A promising network topology is the fully convolutional recurrent network (FCRN), which has already proven its performance on noise suppression and AEC tasks individually. However, combining AEC, postfiltering, and noise suppression into a single network typically leads to a noticeable decline in the quality of the near-end speech component, due to the lack of a separate loss for echo estimation. In this paper, we propose a two-stage model (Y$^2$-Net) that consists of two FCRNs, each with two inputs and one output (Y-Net). The first stage (AEC) yields an echo estimate, which, as a novelty for a DNN AEC model, is further used by the second stage to perform RES and noise suppression. While the subjective listening test of the Interspeech 2021 AEC Challenge mostly yielded results close to the baseline, the proposed method scored an average improvement of 0.46 points over the baseline on the blind test set in double-talk on the instrumental metric DECMOS, provided by the challenge organizers.
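The abstract attributes the quality drop of single-network approaches to the missing separate loss for echo estimation. A minimal sketch of what a two-stage model makes possible is given below: one loss term supervises the stage-1 echo estimate directly and another supervises the final speech estimate. The use of mean squared error, the function names, and the weighting parameter `echo_weight` are illustrative assumptions; the summary does not state which losses or weights the authors used.

```python
import numpy as np


def mse(estimate, target):
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((estimate - target) ** 2))


def two_stage_loss(echo_est, echo_true, speech_est, speech_true, echo_weight=0.5):
    """Hypothetical combined loss for a two-stage echo/noise suppressor.

    The first term supervises the stage-1 echo estimate directly; the second
    supervises the final near-end speech estimate. echo_weight is an
    illustrative hyperparameter, not a value taken from the paper.
    """
    echo_loss = mse(echo_est, echo_true)
    speech_loss = mse(speech_est, speech_true)
    total = echo_weight * echo_loss + (1.0 - echo_weight) * speech_loss
    return total, echo_loss, speech_loss


# Toy example: a perfect echo estimate but an imperfect speech estimate.
echo_true = np.array([1.0, 2.0])
echo_est = np.array([1.0, 2.0])
speech_true = np.zeros(2)
speech_est = np.array([1.0, -1.0])
total, echo_loss, speech_loss = two_stage_loss(echo_est, echo_true, speech_est, speech_true)
```

A single network trained only on its final output would keep just the `speech_loss` term, so errors in its internal echo estimate would never be penalized directly.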
