AS · LG · SD · SP · Feb 20, 2023

A DNN based Normalized Time-frequency Weighted Criterion for Robust Wideband DoA Estimation

arXiv:2302.10147v1 · 7 citations · h-index: 67
Originality: Incremental advance
AI Analysis

This work addresses robust speech source localization for applications such as audio processing. It is an incremental advance: it builds on existing DNN methods and adds specific enhancements.

The authors improve direction of arrival (DoA) estimation for speech source localization in noisy environments by proposing a DNN-based normalized time-frequency weighted criterion, which outperforms popular DNN-based and subspace methods in noisy and reverberant conditions.

Deep neural networks (DNNs) have greatly benefited direction of arrival (DoA) estimation methods for speech source localization in noisy environments. However, their localization accuracy is still far from satisfactory due to the vulnerability to nonspeech interference. To improve the robustness against interference, we propose a DNN based normalized time-frequency (T-F) weighted criterion which minimizes the distance between the candidate steering vectors and the filtered snapshots in the T-F domain. Our method requires no eigendecomposition and uses a simple normalization to prevent the optimization objective from being misled by noisy filtered snapshots. We also study different designs of T-F weights guided by a DNN. We find that duplicating the Hadamard product of speech ratio masks is highly effective and better than other techniques such as direct masking and taking the mean in the proposed approach. However, the best-performing design of T-F weights is criterion-dependent in general. Experiments show that the proposed method outperforms popular DNN based DoA estimation methods including widely used subspace methods in noisy and reverberant environments.
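
The abstract describes the criterion only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a far-field uniform linear array, normalizes each snapshot to unit length, and scores each candidate angle by the weighted sum of 1 - |a^H y|^2, which is the squared distance from a unit-norm snapshot y to the line spanned by a unit-norm steering vector a. No eigendecomposition is involved. The function names (`steering_vectors`, `hadamard_weights`, `estimate_doa`), the array geometry, and the use of all-ones masks in place of DNN outputs are all assumptions made for illustration.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def steering_vectors(angles_deg, freqs_hz, mic_pos_m):
    """Unit-norm far-field steering vectors for a linear array, shape (A, F, M)."""
    delays = np.outer(np.cos(np.deg2rad(angles_deg)), mic_pos_m) / C  # (A, M)
    phase = -2j * np.pi * freqs_hz[None, :, None] * delays[:, None, :]
    return np.exp(phase) / np.sqrt(len(mic_pos_m))


def hadamard_weights(masks):
    """Multiply per-channel ratio masks (T, F, M) into one weight per T-F bin.

    Assumption: this stands in for the 'Hadamard product of speech ratio
    masks' mentioned in the abstract; a real system would get masks from a DNN.
    """
    return np.prod(masks, axis=-1)


def estimate_doa(snapshots, weights, angles_deg, freqs_hz, mic_pos_m):
    """Grid search over candidate angles with a normalized, weighted distance.

    snapshots: (T, F, M) complex STFT frames; weights: (T, F) non-negative.
    Returns the best angle in degrees and the cost for every candidate.
    """
    # Normalize each snapshot so loud, noisy bins cannot dominate the cost.
    norms = np.linalg.norm(snapshots, axis=-1, keepdims=True)
    unit = snapshots / np.maximum(norms, 1e-12)

    cand = steering_vectors(angles_deg, freqs_hz, mic_pos_m)  # (A, F, M)
    # Inner product a^H y for every candidate angle, frame and frequency.
    proj = np.einsum("afm,tfm->atf", cand.conj(), unit)
    # Squared distance from the unit snapshot to the span of a; lies in [0, 1].
    dist = 1.0 - np.abs(proj) ** 2
    cost = np.einsum("atf,tf->a", dist, weights)
    return angles_deg[np.argmin(cost)], cost


# Synthetic check: one source at 60 degrees, 4 microphones spaced 4 cm apart.
rng = np.random.default_rng(0)
mic_pos = np.arange(4) * 0.04
freqs = np.linspace(500.0, 3500.0, 32)
angles = np.arange(0.0, 181.0, 1.0)
true_angle = 60.0
T = 50

a_true = steering_vectors(np.array([true_angle]), freqs, mic_pos)[0]  # (F, M)
src = rng.standard_normal((T, freqs.size)) + 1j * rng.standard_normal((T, freqs.size))
noise = 0.1 * (
    rng.standard_normal((T, freqs.size, 4))
    + 1j * rng.standard_normal((T, freqs.size, 4))
)
snapshots = src[..., None] * a_true[None] + noise

masks = np.ones((T, freqs.size, 4))  # placeholder for DNN-estimated masks
est, cost = estimate_doa(snapshots, hadamard_weights(masks), angles, freqs, mic_pos)
print("estimated DoA (degrees):", est)
```

Because both vectors are unit-norm, minimizing the weighted distance is the same as maximizing the weighted squared correlation |a^H y|^2, so the search reduces to inner products over the candidate grid. Swapping `hadamard_weights` for another weighting (for example, the mean of the masks) only changes the `weights` argument.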

Code Implementations: 1 repo