SD CV AS
Aug 24, 2022

Deep model with built-in cross-attention alignment for acoustic echo cancellation

Evgenii Indenbom, Nicolae-Cătălin Ristea, Ando Saabas, Tanel Pärnamaa, Jegor Gužvin

arXiv:2208.11308v28.312 citations h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses poor audio quality in teleconferencing by simplifying the communication pipeline and improving echo cancellation in difficult delay estimation cases. It is incremental, however, because it builds on existing deep learning methods for AEC.

The paper tackles acoustic echo cancellation in teleconferencing by proposing a deep learning model with built-in cross-attention alignment that handles unaligned microphone and far-end signals, and it reports significant improvements on real recordings from the AEC Challenge data set.

With recent research advances, deep learning models have become an attractive choice for acoustic echo cancellation (AEC) in real-time teleconferencing applications. Since acoustic echo is one of the major sources of poor audio quality, a wide variety of deep models have been proposed. However, an important but often omitted requirement for good echo cancellation quality is the synchronization of the microphone and far-end signals. Typically implemented using classical algorithms based on cross-correlation, the alignment module is a separate functional block with known design limitations. In our work we propose a deep learning architecture with built-in self-attention-based alignment, which is able to handle unaligned inputs, improving echo cancellation performance while simplifying the communication pipeline. Moreover, we show that our approach achieves significant improvements for difficult delay estimation cases on real recordings from the AEC Challenge data set.
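To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. The first function illustrates the classical idea of picking one delay by maximizing cross-correlation between the microphone and far-end signals. The second illustrates the general idea of attention-style soft alignment, where each microphone frame attends over a window of delayed far-end frames. The function names, the feature shapes, the dot-product scoring, and the `max_delay` parameter are all assumptions made for illustration; the paper's actual architecture and features are not described in this listing.

```python
import numpy as np


def estimate_delay_xcorr(mic, far_end, max_delay):
    """Classical baseline (illustrative): return the lag in [0, max_delay]
    samples that maximizes the cross-correlation of mic with far_end."""
    n = min(len(mic), len(far_end))
    scores = [np.dot(mic[d:n], far_end[: n - d]) for d in range(max_delay + 1)]
    return int(np.argmax(scores))


def soft_align(mic_feats, far_feats, max_delay):
    """Attention-style soft alignment (hypothetical sketch).

    mic_feats, far_feats: arrays of shape (T, F), one feature vector per frame.
    For each mic frame t, score far-end frames t-d for d = 0..max_delay,
    softmax the scores over d, and return the weighted far-end features.
    """
    T, F = mic_feats.shape
    # Zero-pad the start so that frame t-d is defined when t < d.
    padded = np.concatenate([np.zeros((max_delay, F)), far_feats], axis=0)
    # candidates[t, d] == far_feats[t - d] (zeros before the signal starts).
    candidates = np.stack(
        [padded[max_delay - d : max_delay - d + T] for d in range(max_delay + 1)],
        axis=1,
    )  # shape (T, max_delay + 1, F)
    scores = np.einsum("tf,tdf->td", mic_feats, candidates) / np.sqrt(F)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    aligned = np.einsum("td,tdf->tf", weights, candidates)
    return aligned, weights


# Synthetic demo: the "echo" is the far-end signal delayed by 37 samples.
rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
mic = 0.6 * np.concatenate([np.zeros(37), far[:-37]])
mic = mic + 0.01 * rng.standard_normal(4000)
estimated = estimate_delay_xcorr(mic, far, max_delay=100)
```

The hard `argmax` in the first function commits to a single delay for the whole signal, whereas the soft weights in the second are computed per frame and are differentiable, which is what would allow such a step to sit inside a network and be trained with it.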
