SDOct 16, 2023
LocSelect: Target Speaker Localization with an Auditory Selective Hearing MechanismYu Chen, Xinyuan Qian, Zexu Pan et al.
The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without association with the identity of speakers. In this paper, we present a target speaker localization algorithm with a selective hearing mechanism. Given a reference speech of the target speaker, we first produce a speaker-dependent spectrogram mask to eliminate interfering speakers' speech. Subsequently, a Long short-term memory (LSTM) network is employed to extract the target speaker's location from the filtered spectrogram. Experiments validate the superiority of our proposed method over the existing algorithms for different scale invariant signal-to-noise ratios (SNR) conditions. Specifically, at SNR = -10 dB, our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
SDJan 26
Analytic Incremental Learning For Sound Source Localization With Imbalance RectificationZexia Fan, Yu Chen, Qiquan Zhang et al.
Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. These often lead to catastrophic forgetting, significantly degrading the localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. Specifically, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. We also propose an Analytic dynamic imbalance rectifier (ADIR) with task-adaption regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our proposal achieves state-of-the-art (SoTA) results of 89.0% accuracy, 5.3° mean absolute error, and 1.6 backward transfer, demonstrating robustness to evolving imbalances without exemplar storage.
ASJun 20, 2019
A Signal Subspace Rotation Method for Localization of Multiple Wideband Sound SourcesKainan Chen, Wenyu Jin, Bharadwaj Desikan
In this paper, the problem of extending narrowband multichannel sound source localization algorithms to the wideband case is addressed. The DOA estimation of narrowband algorithms is based on the estimate of inter-channel phase differences (IPD) between microphones of the sound sources. A new method for wideband sound source DOA estimation based on signal subspace rotation is present. The proposed algorithm normalizes the narrowband signal statistics by rotating the estimated signal subspace to the wideband counterpart in the eigenvector domain. Then the wideband DOA estimate can be obtained by estimating the normalized IPD from these wideband signal statistics. In addition to requiring less computational complexity compared to repeating the narrowband algorithms for all relevant frequencies of wideband signals, the proposed method also does not require any additional prior knowledge. The experimental results demonstrate the efficacy and the robustness of the proposed method.