SD AS · Feb 16, 2022

Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

arXiv:2202.07841v1 · 30 citations
Originality: Incremental advance
AI Analysis

This paper addresses accurate sound localization in noisy environments for applications such as robotics and hearing aids. The contribution is incremental: it builds on existing DP-RTF concepts and adds neural network enhancements.

The paper tackles robust binaural sound source localization by using deep neural networks to learn the direct-path relative transfer function (DP-RTF) under noise and reverberation. In experiments on simulated and real-world data, it reports effective direction of arrival (DOA) estimation and good generalization to unseen binaural arrays.

The direct-path relative transfer function (DP-RTF) is the ratio between the direct-path acoustic transfer functions of two microphone channels. Though the DP-RTF fully encodes the spatial cues of the sound and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes to learn the DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of the DP-RTF. It consists of a branched convolutional neural network module that separately extracts the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better exploit the speech spectra in aid of DP-RTF estimation, a monaural speech enhancement network is used to recover the direct-path spectrograms from the noisy ones. The enhanced spectrograms are stacked onto the noisy spectrograms to form the input of the DP-RTF learning network. We train a single DP-RTF learning network using many different binaural arrays so that DP-RTF learning generalizes across arrays. This avoids time-consuming training data collection and network retraining for a new array, which is very useful in practice. Experimental results on both simulated and real-world data show the effectiveness of the proposed method for direction of arrival (DOA) estimation in noisy and reverberant environments, and good generalization to unseen binaural arrays.
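
To make the DP-RTF definition above concrete, here is a minimal sketch in plain Python. It is not the paper's method: it assumes an idealized far-field, free-field model with two microphones and no head shadowing, so each direct-path transfer function is a pure gain and delay. The function names, the 0.16 m microphone spacing, the sign convention for azimuth, and the particular real-valued encoding (level difference in dB plus the cosine and sine of the phase difference) are all illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def direct_path_tf(delay_s, gain, freq_hz):
    """Direct-path transfer function of one channel, modeled as gain plus delay."""
    return gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)


def dp_rtf(freq_hz, azimuth_deg, mic_distance_m=0.16,
           gain_left=1.0, gain_right=1.0):
    """DP-RTF: right-channel direct-path TF divided by the left (reference) one.

    Assumed convention: positive azimuth delays the right channel.
    """
    tdoa = mic_distance_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND
    h_left = direct_path_tf(0.0, gain_left, freq_hz)
    h_right = direct_path_tf(tdoa, gain_right, freq_hz)
    return h_right / h_left


def real_valued(rtf):
    """One possible real-valued encoding of a complex DP-RTF (an assumption)."""
    phase = cmath.phase(rtf)
    return (20.0 * math.log10(abs(rtf)), math.cos(phase), math.sin(phase))


def azimuth_from_dp_rtf(rtf, freq_hz, mic_distance_m=0.16):
    """Invert the model: phase difference -> time difference -> azimuth.

    Only valid while the phase does not wrap, i.e. at low enough frequencies.
    """
    tdoa = -cmath.phase(rtf) / (2.0 * math.pi * freq_hz)
    sine = tdoa * SPEED_OF_SOUND / mic_distance_m
    sine = max(-1.0, min(1.0, sine))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(sine))


if __name__ == "__main__":
    rtf = dp_rtf(freq_hz=500.0, azimuth_deg=30.0)
    print(real_valued(rtf))
    print(azimuth_from_dp_rtf(rtf, freq_hz=500.0))
```

In this noise-free toy model the azimuth is recovered directly from the DP-RTF phase. With noise and reverberation the measured inter-channel ratio no longer equals the direct-path ratio, which is the estimation problem the abstract says the learning network is meant to address.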

