AS LG SP

Dec 17, 2023

Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios

Yuzhu Wang, Archontis Politis, Tuomas Virtanen

arXiv:2312.10756v11.29 citations

h-index: 25

ICASSP

Originality: Incremental advance

AI Analysis

This work addresses a practical limitation in multichannel speech enhancement for real-world scenarios like hearing aids or teleconferencing, though it appears incremental as it builds on existing attention-based methods.

The paper tackles speech enhancement for moving sound sources, a case that conventional methods handle poorly because they assume the source is stationary. It studies attention-driven spatial filtering techniques that either estimate time-varying spatial covariance matrices or estimate the filters directly. Experimental results on simulated moving speakers in reverberant environments with real noise show that these approaches consistently outperform conventional methods in both static and dynamic settings.

Current multichannel speech enhancement algorithms typically assume a stationary sound source, a common mismatch with reality that limits their performance in real-world scenarios. This paper focuses on attention-driven spatial filtering techniques designed for dynamic settings. Specifically, we study the application of linear and nonlinear attention-based methods for estimating time-varying spatial covariance matrices used to design the filters. We also investigate the direct estimation of spatial filters by attention-based methods without explicitly estimating spatial statistics. The clean speech clips from WSJ0 are employed for simulating speech signals of moving speakers in a reverberant environment. The experimental dataset is built by mixing the simulated speech signals with multichannel real noise from CHiME-3. Evaluation results show that the attention-driven approaches are robust and consistently outperform conventional spatial filtering approaches in both static and dynamic sound environments.
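To make the idea of attention-weighted, time-varying spatial covariance matrices (SCMs) concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the softmax over precomputed attention logits, and the Souden-style MVDR formula with diagonal loading are all illustrative assumptions. In the paper the attention scores would come from a trained model, whereas here they are simply passed in as an array.

```python
import numpy as np

def attention_weighted_scm(X, scores):
    """Time-varying SCMs for one frequency bin (hypothetical helper).

    X:      (T, M) complex STFT frames, M microphones.
    scores: (T, T) real attention logits; row t weights all frames k.
    Returns (T, M, M) with R[t] = sum_k w[t, k] * x[k] x[k]^H.
    """
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    outer = X[:, :, None] * X[:, None, :].conj()         # per-frame x x^H, (T, M, M)
    return np.einsum("tk,kmn->tmn", w, outer)

def mvdr_weights(R_s, R_n, ref=0, eps=1e-6):
    """Per-frame MVDR filters from speech and noise SCMs (assumed formulation).

    w[t] = (R_n[t]^-1 R_s[t]) e_ref / trace(R_n[t]^-1 R_s[t])
    R_s, R_n: (T, M, M). Returns (T, M).
    """
    M = R_n.shape[-1]
    R_n = R_n + eps * np.eye(M)                          # diagonal loading for stability
    num = np.linalg.solve(R_n, R_s)                      # R_n^-1 R_s, batched over T
    tr = np.trace(num, axis1=-2, axis2=-1)
    return num[..., ref] / tr[..., None]

# Toy usage with random data: uniform attention (all-zero logits) reduces to
# the conventional time-invariant average of x x^H over all frames.
rng = np.random.default_rng(0)
T, M = 6, 3
X = rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))
R_uniform = attention_weighted_scm(X, np.zeros((T, T)))
R_sharp = attention_weighted_scm(X, 1e3 * np.eye(T))     # each frame attends only to itself
```

With all-zero logits every frame receives the same SCM, which matches the stationary-source assumption described above. Sharper attention lets the statistics track a moving source, at the cost of averaging over fewer frames.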
