SD · AI · AS · Sep 4, 2024

NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

arXiv:2409.02489v21 citations · h-index: 9
AI Analysis

This work addresses the cocktail party problem for auditory attention research by providing a neuro-guided extraction method, though it appears incremental as it builds on known EEG-speech correlations.

The paper tackles the problem of extracting a target speaker from monaural speech mixtures using EEG signals as a neuro-guided reference. Its results show that the proposed model outperforms two baseline models on a public dataset.

In the study of auditory attention, it has been revealed that there exists a robust correlation between attended speech and elicited neural responses, measurable through electroencephalography (EEG). Therefore, it is possible to use the attention information available within EEG signals to guide the extraction of the target speaker in a cocktail party computationally. In this paper, we present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue to extract attended speech from monaural speech mixtures. We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations, generating a speaker extraction mask. Experimental results on a publicly available dataset demonstrate that our proposed model outperforms two baseline models across various evaluation metrics.
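To make the cross-attention idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the encoders, the dimensions, or where the attention is applied. The sketch assumes speech embeddings act as queries and EEG embeddings as keys and values, and that a sigmoid over the attended features gives a mask. All names (`cross_attention_mask`, `w_q`, `w_k`, `w_v`) and the random inputs are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_mask(speech, eeg, w_q, w_k, w_v):
    """Attend from speech frames (queries) to EEG frames (keys/values).

    speech: T_s x D mixture embeddings, eeg: T_e x D_e EEG embeddings.
    Returns a T_s x D mask in (0, 1) and the T_s x T_e attention weights.
    Assumption: this layout is illustrative, not taken from the paper.
    """
    q = matmul(speech, w_q)                   # T_s x d
    k = matmul(eeg, w_k)                      # T_e x d
    v = matmul(eeg, w_v)                      # T_e x D
    k_t = [list(col) for col in zip(*k)]      # d x T_e
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, k_t)                   # T_s x T_e
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)              # T_s x D
    # Residual connection followed by a sigmoid to obtain a soft mask.
    mask = [
        [1.0 / (1.0 + math.exp(-(s + c))) for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(speech, context)
    ]
    return mask, weights


def rand_matrix(rows, cols, rng):
    """Random stand-in for learned weights or encoder outputs."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T_s, T_e, D, D_e, d = 6, 4, 8, 5, 3           # toy sizes, chosen arbitrarily
speech = rand_matrix(T_s, D, rng)
eeg = rand_matrix(T_e, D_e, rng)
w_q = rand_matrix(D, d, rng)
w_k = rand_matrix(D_e, d, rng)
w_v = rand_matrix(D_e, D, rng)

mask, weights = cross_attention_mask(speech, eeg, w_q, w_k, w_v)
# Element-wise masking of the mixture embedding gives the "extracted" features.
extracted = [[m * s for m, s in zip(m_row, s_row)] for m_row, s_row in zip(mask, speech)]
```

In a trained system the projection matrices would be learned and the masked features would be decoded back to a waveform. Here they are random, so the output only illustrates the shapes and the data flow.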
