AS | AI | SD | May 17, 2023

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

arXiv:2305.09994v1 | 22 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for listeners in multi-talker environments by integrating the listener's brain activity. It is an incremental advance: it builds on existing time-domain networks and adds a novel fusion module.

The paper tackles the challenge of extracting a target speaker from monaural speech mixtures in multi-talker conditions without prior information. It proposes a brain-assisted speech enhancement network (BASEN) that uses EEG signals recorded from the listener, and it reports that the model outperforms the state-of-the-art method in several evaluation metrics on a public dataset.

Time-domain single-channel speech enhancement (SE) still struggles to extract the target speaker in multi-talker conditions without any prior information. Auditory attention decoding has shown that the listener's brain activity contains auditory information about the attended speaker. In this paper, we therefore propose a novel time-domain brain-assisted SE network (BASEN) that incorporates electroencephalography (EEG) signals recorded from the listener to extract the target speaker from monaural speech mixtures. BASEN is based on the fully-convolutional time-domain audio separation network. To fully leverage the complementary information contained in the EEG signals, we further propose a convolutional multi-layer cross attention module to fuse the dual-branch features. Experimental results on a public dataset show that the proposed model outperforms the state-of-the-art method in several evaluation metrics. The reproducible code is available at https://github.com/jzhangU/Basen.git.
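
To make the idea of fusing dual-branch features with cross attention more concrete, here is a minimal sketch in plain Python. It is not the paper's module: the abstract describes a convolutional, multi-layer design, whereas this sketch uses a single layer of standard scaled dot-product attention with no learned projections or convolutions. The function names (`cross_attention`, `fuse`), the residual addition, the choice of audio frames as queries and EEG frames as keys and values, and the feature sizes are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame attends over all key frames.

    queries: list of T_q vectors of length d
    keys, values: lists of T_k vectors of length d
    Returns (outputs, weights): T_q output vectors and a T_q x T_k weight matrix.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query frame and every key frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value frames.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def fuse(audio_feats, eeg_feats):
    """Hypothetical fusion: audio frames query the EEG frames, then a residual add."""
    attended, weights = cross_attention(audio_feats, eeg_feats, eeg_feats)
    fused = [[a + c for a, c in zip(a_vec, c_vec)]
             for a_vec, c_vec in zip(audio_feats, attended)]
    return fused, weights


# Toy features (assumed sizes): 6 audio frames and 3 EEG frames, 4 channels each.
random.seed(0)
DIM = 4
audio_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]
eeg_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]

fused, weights = fuse(audio_feats, eeg_feats)
print(len(fused), len(fused[0]))  # one fused vector per audio frame
print([round(sum(row), 6) for row in weights])  # each attention row sums to 1
```

Because the queries come from one branch and the keys and values from the other, the two branches may have different frame counts, which suits audio and EEG streams encoded at different rates. For the actual architecture, see the repository linked in the abstract.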

Code Implementations: 1 repo
