SD AI AS Oct 16, 2023

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

arXiv:2310.10497v29.59 citations h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurately identifying and locating a specific speaker in adverse acoustic conditions. The advance is incremental: it builds on existing localization methods by incorporating speaker identity.

The paper tackles the problem of localizing a target speaker in noisy, reverberant multi-speaker scenarios. It introduces a selective hearing mechanism that uses a reference speech sample from the target speaker to filter out interfering speakers. At an SNR of -10 dB, the method achieves a mean absolute error of 3.55 and an accuracy of 87.40%.
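The two reported metrics can be illustrated with a minimal sketch. It assumes that locations are angles in degrees, that errors wrap around the circle, and that a prediction counts as correct when its error is within a tolerance. The function names and the 5-degree tolerance `tol_deg` are hypothetical choices for illustration, since the summary does not state the paper's exact definitions.

```python
import numpy as np

def angular_error(pred_deg, true_deg):
    """Absolute angular difference in degrees, wrapped to [0, 180]."""
    pred = np.asarray(pred_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    diff = np.abs(pred - true) % 360.0
    # The shorter way around the circle is the error.
    return np.minimum(diff, 360.0 - diff)

def mae_and_acc(pred_deg, true_deg, tol_deg=5.0):
    """Return (MAE in degrees, accuracy in percent).

    tol_deg is an assumed tolerance, not a value taken from the paper.
    """
    err = angular_error(pred_deg, true_deg)
    mae = float(err.mean())
    acc = float((err <= tol_deg).mean() * 100.0)
    return mae, acc

# Toy predictions and ground truth (made-up values, not paper data).
mae, acc = mae_and_acc([10, 350, 90, 180], [12, 5, 90, 150])
print(f"MAE = {mae:.2f}, ACC = {acc:.1f}%")
```

The wrap-around step matters for pairs such as 350 and 5 degrees, whose error is 15 degrees rather than 345.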

Prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating speakers and providing a directional output for each one in multi-speaker scenarios, without associating the output with speaker identity. In this paper, we present a target speaker localization algorithm with a selective hearing mechanism. Given a reference speech sample from the target speaker, we first produce a speaker-dependent spectrogram mask to eliminate the speech of interfering speakers. Subsequently, a long short-term memory (LSTM) network is employed to extract the target speaker's location from the filtered spectrogram. Experiments validate the superiority of our proposed method over existing algorithms under different scale-invariant signal-to-noise ratio (SNR) conditions. Specifically, at SNR = -10 dB, our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
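The masking stage described in the abstract can be sketched roughly as follows. This is not the authors' architecture: the mask estimator here is a single untrained linear layer with a sigmoid, the speaker embedding is random, and the names `speaker_conditioned_mask`, `spk_emb`, `W`, and `b` are hypothetical. The LSTM localization stage is omitted; the sketch only shows how a speaker-dependent mask in (0, 1) would be applied to a mixture spectrogram.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def speaker_conditioned_mask(mix_spec, spk_emb, W, b):
    """Estimate a per-bin mask from the mixture and a speaker embedding.

    mix_spec: (T, F) magnitude spectrogram of the mixture
    spk_emb:  (D,) embedding derived from the reference speech (assumed given)
    W, b:     weights of a hypothetical linear layer, shapes (F + D, F) and (F,)
    """
    num_frames = mix_spec.shape[0]
    # Repeat the speaker embedding for every time frame.
    emb_tiled = np.tile(spk_emb, (num_frames, 1))              # (T, D)
    features = np.concatenate([mix_spec, emb_tiled], axis=1)   # (T, F + D)
    return sigmoid(features @ W + b)                           # (T, F)

# Toy inputs with random values (not real audio, not trained weights).
rng = np.random.default_rng(0)
T, F, D = 50, 64, 16
mix_spec = np.abs(rng.standard_normal((T, F)))
spk_emb = rng.standard_normal(D)
W = 0.1 * rng.standard_normal((F + D, F))
b = np.zeros(F)

mask = speaker_conditioned_mask(mix_spec, spk_emb, W, b)
# Element-wise masking attenuates bins attributed to interfering speakers.
filtered_spec = mask * mix_spec
print(mask.shape, filtered_spec.shape)
```

In a trained system the filtered spectrogram would then be passed, frame by frame, to the recurrent network that predicts the target speaker's location.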
