SD LG AS

Aug 27, 2021

Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement

Yuzi Yan, Wei-Qiang Zhang, Michael T. Johnson

arXiv:2108.12105v12.3

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement, a critical component for technologies such as speech recognition and speech synthesis. The advance is incremental: it extends a previous attention-based RNN method.

The paper proposes a new deep learning structure for speech enhancement that integrates a full attention mechanism into a bidirectional sequence-to-sequence model. It reports better speech quality (PESQ) than OM-LSA, CNN-LSTM, T-GSA, and a unidirectional attention-based LSTM baseline.

As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing. In this paper, a new deep learning structure for speech enhancement is demonstrated. The model introduces a "full" attention mechanism to a bidirectional sequence-to-sequence method to make use of latent information after each focal frame. This is an extension of the previous attention-based RNN method. The proposed bidirectional attention-based architecture achieves better performance in terms of speech quality (PESQ), compared with OM-LSA, CNN-LSTM, T-GSA and the unidirectional attention-based LSTM baseline.
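The contrast between attending only to frames up to the focal frame and attending to every frame can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `attention_context`, the dot-product scoring, and the random stand-in for encoder states are all assumptions made for the example.

```python
import numpy as np

def attention_context(hidden, t, full=True):
    """Return (weights, context) for focal frame t.

    hidden: array of shape (T, d), one encoder state per frame.
    full=False attends to frames 0..t only (a unidirectional-style mask);
    full=True attends to every frame, including those after t.
    Dot-product scoring is an assumption chosen for simplicity.
    """
    T = hidden.shape[0]
    scores = hidden @ hidden[t]                 # one score per frame, shape (T,)
    if not full:
        # Mask out frames that come after the focal frame.
        scores = np.where(np.arange(T) <= t, scores, -np.inf)
    scores = scores - scores.max()              # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum()
    context = weights @ hidden                  # weighted sum of states, shape (d,)
    return weights, context

# Hypothetical encoder states: 6 frames, 4 features each.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((6, 4))

w_causal, c_causal = attention_context(hidden, t=2, full=False)
w_full, c_full = attention_context(hidden, t=2, full=True)
```

With the mask applied, the weights on frames after the focal frame are exactly zero; without it, those frames receive positive weight and contribute to the context vector, which is the kind of information the abstract says the "full" mechanism is meant to use.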
