AS · CL · SD · Jun 9, 2023

Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement

arXiv:2306.05861v1 · 9 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for audio-processing applications. It is an incremental advance because it builds on existing encoder-decoder and attention mechanisms.

The paper addresses two gaps in speech enhancement: the neglect of channel and spatial attention, and the inefficient inputs that encoder-decoder networks pass to their intermediate enhancement layer. It does so by proposing DPCFCS-Net, which outperforms existing techniques on the VCTK+DEMAND dataset.

Current speech enhancement (SE) research has largely neglected channel attention and spatial attention, and networks based on the encoder-decoder architecture have not adequately considered how to provide efficient inputs to the intermediate enhancement layer. To address these issues, this paper proposes a time-frequency (T-F) domain SE network (DPCFCS-Net) that incorporates improved densely connected blocks, dual-path modules, convolution-augmented transformers (conformers), channel attention, and spatial attention. Compared with previous models, the proposed model has a more efficient encoder-decoder and can learn comprehensive features. Experimental results on the VCTK+DEMAND dataset demonstrate that the method outperforms existing SE techniques. Furthermore, the improved densely connected block and the two-dimensional attention module developed in this work are highly adaptable and can easily be integrated into existing networks.
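The abstract names the building blocks but not their internals. As a rough illustration only, the NumPy sketch below shows how channel attention, spatial attention, and dual-path processing typically act on a T-F feature map of shape (channels, frames, frequency bins). It is not the authors' DPCFCS-Net code. The squeeze-and-excitation form of channel attention, the 1x1 simplification of spatial attention, the attention order, the residual connections, and the helper names (`channel_attention`, `spatial_attention`, `two_dim_attention`, `dual_path`, `running_mean`) are all assumptions, and a causal running mean stands in for each conformer. The improved densely connected block is not modeled.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed form).

    x: T-F feature map of shape (C, T, F); w1: (C // r, C); w2: (C, C // r).
    """
    squeezed = x.mean(axis=(1, 2))            # (C,) average over time and frequency
    hidden = np.maximum(0.0, w1 @ squeezed)   # ReLU bottleneck
    gates = sigmoid(w2 @ hidden)              # (C,) one gate in (0, 1) per channel
    return x * gates[:, None, None]


def spatial_attention(x, a, b, bias=0.0):
    """Simplified spatial attention over the (T, F) plane.

    CBAM-style modules convolve the channel-wise mean and max maps; a
    1x1 weighted sum (scalars a, b) stands in for that convolution here.
    """
    avg_map = x.mean(axis=0)                  # (T, F)
    max_map = x.max(axis=0)                   # (T, F)
    gates = sigmoid(a * avg_map + b * max_map + bias)  # one gate per T-F bin
    return x * gates[None, :, :]


def two_dim_attention(x, w1, w2, a, b):
    """Channel attention, then spatial attention (the order is an assumption)."""
    return spatial_attention(channel_attention(x, w1, w2), a, b)


def dual_path(x, freq_model, time_model):
    """Run a sequence model along frequency, then along time, with residuals.

    Each model maps (batch, length, C) -> (batch, length, C); in the paper
    this role is played by conformers.
    """
    seq = x.transpose(1, 2, 0)                    # (T, F, C): frames as batch
    x = x + freq_model(seq).transpose(2, 0, 1)    # back to (C, T, F)
    seq = x.transpose(2, 1, 0)                    # (F, T, C): bins as batch
    x = x + time_model(seq).transpose(2, 1, 0)    # back to (C, T, F)
    return x


def running_mean(seq):
    """Toy stand-in for a conformer: causal running mean along the length axis."""
    steps = np.arange(1, seq.shape[1] + 1)[None, :, None]
    return np.cumsum(seq, axis=1) / steps


rng = np.random.default_rng(0)
C, T, F, r = 8, 100, 64, 4                # channels, frames, frequency bins, reduction
x = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

y = two_dim_attention(dual_path(x, running_mean, running_mean), w1, w2, a=0.5, b=0.5)
print(y.shape)  # (8, 100, 64)
```

Keeping the axis handling in `dual_path` separate from the sequence model means a real conformer could replace `running_mean` without touching the transposes, and both attention gates lie in (0, 1), so they only rescale features rather than change the map's shape.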
