Rectified binaural ratio: A complex T-distributed feature for robust sound localization
This work addresses robust sound source localization for audio processing applications. It is an incremental improvement built on a novel statistical formulation.
The paper tackles robust sound source localization in adverse noise conditions. It introduces the rectified binaural ratio, a new feature that follows a complex t-distribution and enables principled aggregation of binaural cues, which improves robustness in experiments on corrupted signals.
Most existing methods in binaural sound source localization rely on some kind of aggregation of phase- and level-difference cues in the time-frequency plane. While different aggregation schemes exist, they are often heuristic and suffer in adverse noise conditions. In this paper, we introduce the rectified binaural ratio as a new feature for sound source localization. We show that for Gaussian-process point source signals corrupted by stationary Gaussian noise, this ratio follows a complex t-distribution with explicit parameters. This new formulation provides a principled and statistically sound way to aggregate binaural features in the presence of noise. We subsequently derive two simple and efficient methods for robust relative transfer function and time-delay estimation. Experiments on heavily corrupted simulated and speech signals demonstrate the robustness of the proposed scheme.
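To make the aggregation problem concrete, the following is a minimal sketch under stated assumptions, not the method derived in the paper. It simulates time-frequency coefficients of a Gaussian source observed at two microphones with additive Gaussian noise, forms the plain (unrectified) ratio of right to left coefficients, and aggregates the ratios with a component-wise median. The function names, the noise level, the value of the relative transfer function, and the choice of the median as a robust aggregator are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_binaural_ratios(rtf, n_bins=20000, noise_var=0.01, seed=0):
    """Simulate plain right/left ratios of noisy complex Gaussian coefficients.

    Assumption for illustration: unit-variance circular Gaussian source,
    independent circular Gaussian noise of variance `noise_var` per channel.
    """
    rng = np.random.default_rng(seed)

    def complex_gaussian(var):
        # Circular complex Gaussian samples with total variance `var`.
        real = rng.standard_normal(n_bins)
        imag = rng.standard_normal(n_bins)
        return np.sqrt(var / 2.0) * (real + 1j * imag)

    source = complex_gaussian(1.0)
    left = source + complex_gaussian(noise_var)
    right = rtf * source + complex_gaussian(noise_var)
    # The ratio is heavy-tailed: small |left| values produce large outliers.
    return right / left


def median_aggregate(ratios):
    """Aggregate complex ratios with a component-wise median (illustrative)."""
    return np.median(ratios.real) + 1j * np.median(ratios.imag)


# Hypothetical relative transfer function at one frequency.
true_rtf = 0.8 * np.exp(1j * 0.5)
ratios = simulate_binaural_ratios(true_rtf)
estimate = median_aggregate(ratios)
print("true RTF:        ", true_rtf)
print("median estimate: ", estimate)
print("largest |ratio|: ", np.abs(ratios).max())
```

The largest ratio magnitude printed at the end gives a sense of the outliers that a naive average would have to absorb. Note that with noise on both channels the plain ratio is slightly biased toward zero, so this sketch only approximates the true value at low noise levels.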