F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement
This work addresses the need for improved audio quality in communication systems, offering an incremental enhancement over existing methods for acoustic echo cancellation and speech enhancement.
The paper tackles robust acoustic echo cancellation in noisy and nonlinear acoustic scenarios. It proposes a real-time approach that combines a complex neural network, frequency-time-LSTMs, and a modified SI-SNR cost function, and it improves Mean Opinion Score by 0.27 over the AEC-challenge baseline with only 1.4M parameters.
With the increasing demand for audio communication and online conferencing, ensuring the robustness of Acoustic Echo Cancellation (AEC) under complicated acoustic scenarios, including noise, reverberation and nonlinear distortion, has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information, together with frequency-time-LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function so that the model achieves better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
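The abstract names a modified SI-SNR cost function but does not say what the modification is. For orientation only, the following is a minimal sketch of the standard scale-invariant SNR on which such a loss would presumably build; it is not the paper's modified version. The function names `si_snr` and `si_snr_loss` and the stabilizer `eps` are illustrative assumptions, and the code uses plain Python lists rather than any particular deep learning framework.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard scale-invariant SNR in dB (NOT the paper's modified variant)."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to get the scaled target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]

    # Whatever remains is treated as residual (echo, noise, distortion).
    e_noise = [e - s for e, s in zip(est, s_target)]

    target_power = sum(s * s for s in s_target)
    noise_power = sum(v * v for v in e_noise)
    return 10.0 * math.log10((target_power + eps) / (noise_power + eps))


def si_snr_loss(estimate, target):
    """Negated SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)


if __name__ == "__main__":
    # Hypothetical toy signals: a sine wave and a copy with an added residual.
    clean = [math.sin(0.1 * i) for i in range(200)]
    noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]
    print("SI-SNR of noisy estimate (dB):", si_snr(noisy, clean))
    print("SI-SNR after rescaling the estimate (dB):",
          si_snr([2.0 * x for x in noisy], clean))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what makes the measure scale-invariant. In training, the negated value would typically serve as the loss on the network's output waveform.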