SDAP Sep 30, 2016

Rectified binaural ratio: A complex T-distributed feature for robust sound localization

arXiv:1609.09743v1, 4 citations
Originality: Highly original
AI Analysis

This work addresses robust sound localization for audio processing applications, representing an incremental improvement with a novel statistical formulation.

The paper tackles robust sound source localization in adverse noise conditions by introducing the rectified binaural ratio as a new feature. For Gaussian-process point source signals corrupted by stationary Gaussian noise, this ratio follows a complex t-distribution, which enables principled aggregation of binaural features. Experiments on heavily corrupted simulated and speech signals show improved robustness.

Most existing methods in binaural sound source localization rely on some kind of aggregation of phase- and level-difference cues in the time-frequency plane. While different aggregation schemes exist, they are often heuristic and suffer in adverse noise conditions. In this paper, we introduce the rectified binaural ratio as a new feature for sound source localization. We show that for Gaussian-process point source signals corrupted by stationary Gaussian noise, this ratio follows a complex t-distribution with explicit parameters. This new formulation provides a principled and statistically sound way to aggregate binaural features in the presence of noise. We subsequently derive two simple and efficient methods for robust relative transfer function and time-delay estimation. Experiments on heavily corrupted simulated and speech signals demonstrate the robustness of the proposed scheme.
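To make the "phase- and level-difference cues in the time-frequency plane" concrete, the following is a minimal sketch of the conventional baseline the abstract contrasts itself with. It is not the rectified binaural ratio, whose definition is not given on this page. It computes the plain ratio of right to left STFT coefficients, reads off level and phase differences, and aggregates the phases with a simple phase-only grid search for the time delay. All function names, parameter values, and the noise-free test signal are assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def binaural_cues(left, right, n_fft=512, hop=256, eps=1e-12):
    """Plain (unrectified) binaural ratio with its level and phase differences."""
    ratio = stft(right, n_fft, hop) / (stft(left, n_fft, hop) + eps)
    ild_db = 20.0 * np.log10(np.abs(ratio) + eps)  # interaural level difference
    ipd = np.angle(ratio)                          # interaural phase difference
    return ratio, ild_db, ipd

def estimate_delay(ratio, fs, n_fft, max_delay_s=1e-3, n_candidates=201):
    """Heuristic aggregation: phase-only average over frames, then a grid search.

    If the right channel lags the left by tau seconds, the ratio phase at
    frequency f is about -2*pi*f*tau, so the candidate that best cancels
    that phase across all frequencies is returned.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    phasor = ratio / (np.abs(ratio) + 1e-12)       # discard magnitude, keep phase
    mean_phasor = phasor.mean(axis=0)              # aggregate over time frames
    candidates = np.linspace(-max_delay_s, max_delay_s, n_candidates)
    steering = np.exp(2j * np.pi * np.outer(candidates, freqs))
    scores = np.real(steering @ mean_phasor)       # aggregate over frequencies
    return candidates[np.argmax(scores)]

# Hypothetical noise-free test case: white-noise source, right channel
# attenuated by half and delayed by 5 samples.
fs, n_fft, delay_samples, gain = 16000, 512, 5, 0.5
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)
right = gain * np.concatenate([np.zeros(delay_samples), left[:-delay_samples]])

ratio, ild_db, ipd = binaural_cues(left, right, n_fft=n_fft)
tau_hat = estimate_delay(ratio, fs, n_fft)
print("estimated delay (s):", tau_hat, "true delay (s):", delay_samples / fs)
print("median ILD (dB):", np.median(ild_db))
```

Averaging unit phasors treats every time-frequency bin as equally reliable. That is the kind of heuristic aggregation the abstract says degrades under heavy noise, and which the paper's statistical formulation is meant to replace.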

