Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
The work addresses the challenge of generalizing acoustic mapping across diverse acoustic setups and microphone array configurations in sound localization systems, though the contribution appears incremental, as it bridges existing methods.
The paper tackles direction of arrival (DoA) estimation by introducing the Latent Acoustic Mapping (LAM) model, a self-supervised framework whose localization performance on the LOCATA and STARSS benchmarks is comparable or superior to that of existing supervised methods.
Acoustic mapping techniques have long been used in spatial audio processing for direction of arrival estimation (DoAE). Traditional beamforming methods for acoustic mapping are interpretable, but they often rely on iterative solvers that can be computationally intensive and sensitive to acoustic variability. Recent supervised deep learning approaches, on the other hand, offer feedforward speed and robustness but require large labeled datasets and lack interpretability. Despite their respective strengths, both families of methods struggle to generalize consistently across diverse acoustic setups and array configurations, which limits their broader applicability. We introduce the Latent Acoustic Mapping (LAM) model, a self-supervised framework that bridges the interpretability of traditional methods and the adaptability and efficiency of deep learning methods. LAM generates high-resolution acoustic maps, adapts to varying acoustic conditions, and operates efficiently across different microphone arrays. We assess its robustness for DoAE on the LOCATA and STARSS benchmarks, where LAM achieves localization performance comparable or superior to that of existing supervised methods. Additionally, we show that LAM's acoustic maps can serve as effective features for supervised models, further improving DoAE accuracy and underscoring LAM's potential to advance adaptive, high-performance sound localization systems.
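The abstract contrasts LAM with conventional beamforming-based acoustic mapping. To make the baseline concrete, the sketch below computes a classic delay-and-sum (steered response power) acoustic map over azimuth and takes its peak as the DoA estimate. It is not the LAM model and not the authors' code. It assumes a single far-field source in free field with no noise, a hypothetical 8-microphone circular array of 5 cm radius, azimuth-only scanning, and a speed of sound of 343 m/s. All helper names (`circular_array`, `steering_vectors`, `srp_map`, `estimate_doa`) are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def circular_array(num_mics: int, radius: float) -> np.ndarray:
    """Hypothetical uniform circular array in the xy-plane, shape (num_mics, 2)."""
    angles = 2 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)


def steering_vectors(mic_pos: np.ndarray, azimuths_deg: np.ndarray,
                     freqs: np.ndarray) -> np.ndarray:
    """Far-field plane-wave steering vectors, shape (num_dirs, num_mics, num_freqs)."""
    az = np.deg2rad(azimuths_deg)
    # Unit vectors pointing from the array centre toward each candidate source.
    u = np.stack([np.cos(az), np.sin(az)], axis=1)          # (D, 2)
    # Relative arrival time: mics closer to the source hear the wave earlier.
    tau = -(u @ mic_pos.T) / SPEED_OF_SOUND                  # (D, M)
    return np.exp(-2j * np.pi * tau[:, :, None] * freqs[None, None, :])


def srp_map(spectra: np.ndarray, mic_pos: np.ndarray,
            azimuths_deg: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Delay-and-sum acoustic map: beamformer output power per candidate azimuth.

    spectra: complex array (num_mics, num_freqs), one frequency frame per mic.
    """
    a = steering_vectors(mic_pos, azimuths_deg, freqs)      # (D, M, F)
    # Phase-compensate every channel for each direction, then average over mics.
    beams = np.einsum("dmf,mf->df", a.conj(), spectra) / spectra.shape[0]
    # Sum the steered power over frequency to get one value per direction.
    return np.sum(np.abs(beams) ** 2, axis=1)               # (D,)


def estimate_doa(power_map: np.ndarray, azimuths_deg: np.ndarray) -> float:
    """Single-source DoA estimate: the azimuth with the highest map value."""
    return float(azimuths_deg[np.argmax(power_map)])


# Usage: an idealized noise-free source at 120 degrees.
rng = np.random.default_rng(0)
mic_pos = circular_array(num_mics=8, radius=0.05)
freqs = np.linspace(200.0, 4000.0, 64)
grid = np.arange(0.0, 360.0, 1.0)

# Each mic observes the source spectrum S(f) times its plane-wave phase term.
source = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
true_steer = steering_vectors(mic_pos, np.array([120.0]), freqs)[0]  # (M, F)
spectra = true_steer * source[None, :]

power = srp_map(spectra, mic_pos, grid, freqs)
print(estimate_doa(power, grid))  # expected: 120.0
```

Each map value is simply the power the array receives when steered toward that direction, which is what makes such maps interpretable. For a small aperture, however, the main lobe is wide. The iterative solvers the abstract mentions would be applied on top of maps like this one to sharpen them; they are not shown here.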