AS LG SD SP Nov 22, 2022

Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation

arXiv:2211.12632v17.311 citations h-index: 56

Originality: Incremental advance

AI Analysis

This work targets speech processing applications such as automatic speech recognition (ASR), offering an incremental improvement to attention mechanisms for complex-valued networks.

The paper tackles speech dereverberation by proposing a complex-valued time-frequency self-attention module that models inter-dependencies between real and imaginary features. The module improves speech quality and automatic speech recognition performance over earlier self-attention methods.

Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks. However, the majority of DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued T-F attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across time and frequency dimensions. We validate the effectiveness of our proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) using the REVERB challenge corpus. Experimental findings indicate that integrating our complex-TFA module with DCCRN improves overall speech quality and performance of back-end speech applications, such as automatic speech recognition, compared to earlier approaches for self-attention.
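
The abstract does not spell out the module's internals, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that a two-dimensional attention map can be formed as the outer product of a per-frame (time) gate and a per-bin (frequency) gate, and that the map is applied with the complex multiplication rule so that real and imaginary parts influence each other. The learned layers of a real network are replaced here by parameter-free average pooling and a sigmoid; the names `axis_gate`, `tf_map`, and `complex_tfa` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def axis_gate(plane, axis):
    """Average-pool a T x F real-valued plane along one axis, then squash to (0, 1).

    Assumption: a trained module would use learned layers here instead.
    """
    n_frames, n_bins = len(plane), len(plane[0])
    if axis == "time":
        # One weight per frame, pooled over frequency.
        return [sigmoid(sum(row) / n_bins) for row in plane]
    # One weight per frequency bin, pooled over time.
    return [sigmoid(sum(plane[t][f] for t in range(n_frames)) / n_frames)
            for f in range(n_bins)]


def tf_map(plane):
    """2-D attention map: outer product of the time gate and the frequency gate."""
    a_t = axis_gate(plane, "time")
    a_f = axis_gate(plane, "freq")
    return [[w_t * w_f for w_f in a_f] for w_t in a_t]


def complex_tfa(spec):
    """Apply a complex-valued T-F attention map to a T x F complex spectrogram."""
    real = [[z.real for z in row] for row in spec]
    imag = [[z.imag for z in row] for row in spec]
    a_r, a_i = tf_map(real), tf_map(imag)  # one map per component
    out = []
    for t, row in enumerate(spec):
        new_row = []
        for f, z in enumerate(row):
            # Complex multiplication rule: each output part mixes both input parts.
            out_r = a_r[t][f] * z.real - a_i[t][f] * z.imag
            out_i = a_r[t][f] * z.imag + a_i[t][f] * z.real
            new_row.append(complex(out_r, out_i))
        out.append(new_row)
    return out


if __name__ == "__main__":
    toy_spec = [[1 + 1j, 0j], [2 - 1j, 1j]]  # 2 frames x 2 bins, made-up values
    for row in complex_tfa(toy_spec):
        print(row)
```

The cross terms in `out_r` and `out_i` are what distinguish this from attending to the real and imaginary planes separately: an attention map derived from the imaginary plane also rescales the real input, and vice versa.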
