LMFCA-Net: A Lightweight Model for Multi-Channel Speech Enhancement with Efficient Narrow-Band and Cross-Band Attention
This work addresses the need for efficient speech enhancement on terminal devices and offers a practical solution with reduced computational cost, although it appears incremental because it builds on existing attention-based approaches.
The paper tackles the high computational demands of multi-channel speech enhancement methods by proposing LMFCA-Net, a lightweight network built on decoupled fully connected attention mechanisms. It achieves performance comparable to state-of-the-art methods while significantly reducing complexity and latency.
Deep-learning-based end-to-end multi-channel speech enhancement methods have achieved impressive performance by leveraging sub-band, cross-band, and spatial information. However, these methods often demand substantial computational resources, limiting their practicality on terminal devices. This paper presents a lightweight multi-channel speech enhancement network with decoupled fully connected attention (LMFCA-Net). The proposed LMFCA-Net introduces time-axis decoupled fully connected attention (T-FCA) and frequency-axis decoupled fully connected attention (F-FCA) mechanisms to effectively capture long-range narrow-band and cross-band information without recurrent units. Experimental results show that LMFCA-Net performs comparably to state-of-the-art methods while significantly reducing computational complexity and latency, making it a promising solution for practical applications.
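To make the idea of decoupled time-axis and frequency-axis attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It is modeled on the general "decoupled fully connected" (DFC) attention idea from GhostNetV2: a full 2-D mixing over the (frequency, time) plane is replaced by two cheap 1-D fully connected mixings, one along time (narrow-band, per frequency bin) and one along frequency (cross-band, per frame), followed by a sigmoid gate on the input. The (C, F, T) layout, the helper names `t_fca`, `f_fca`, and `decoupled_fca`, the per-channel weight shapes, and the random weights are all illustrative assumptions; the paper's actual layers (for example, convolutional kernels, downsampling, or causality constraints) may differ.

```python
import numpy as np

# Assumed layout: x has shape (C, F, T) = (channels, frequency bins, frames).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def t_fca(x, w_t):
    """Time-axis FC mixing (narrow-band): within each channel and frequency bin,
    every output frame is a weighted sum of all input frames. w_t: (C, T, T)."""
    return np.einsum("cst,cft->cfs", w_t, x)

def f_fca(x, w_f):
    """Frequency-axis FC mixing (cross-band): within each channel and frame,
    every output bin is a weighted sum of all input bins. w_f: (C, F, F)."""
    return np.einsum("cgf,cft->cgt", w_f, x)

def decoupled_fca(x, w_t, w_f):
    """Apply the two 1-D mixings in sequence, squash to (0, 1), and gate x."""
    attn = sigmoid(f_fca(t_fca(x, w_t), w_f))
    return x * attn

# Toy demo with random (untrained) weights.
rng = np.random.default_rng(0)
C, F, T = 4, 16, 10
x = rng.standard_normal((C, F, T))
w_t = rng.standard_normal((C, T, T)) / np.sqrt(T)
w_f = rng.standard_normal((C, F, F)) / np.sqrt(F)
y = decoupled_fca(x, w_t, w_f)

# Weight count: decoupled (T*T + F*F per channel) vs. a full FT x FT mixing.
decoupled_params = C * (T * T + F * F)
full_params = C * (F * T) ** 2
print(y.shape)                        # (4, 16, 10)
print(decoupled_params, full_params)  # 1424 102400
```

The composed mixing equals a full per-channel mixing over all F·T positions whose weight matrix is restricted to the Kronecker product of `w_f` and `w_t`. That restriction is where the savings come from: every output position still sees the whole time-frequency plane, but with far fewer weights and no recurrence. Note that the time-axis mixing in this toy version is non-causal and tied to a fixed number of frames, which a streaming model would need to handle differently.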