AS, LG, SD | Aug 28, 2024

wav2pos: Sound Source Localization using Masked Autoencoders

arXiv:2408.15771v1 | 6 citations | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses sound source localization for distributed microphone arrays. A single model handles an arbitrary number of microphones and tolerates missing data, but the advance is incremental because it builds on existing masked autoencoder and learning-based methods.

The paper tackles 3D sound source localization for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem using a multi-modal masked autoencoder, achieving competitive performance on simulated and real-world recordings of music and speech in indoor environments.

We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that such a formulation allows for accurate localization of the sound source by reconstructing coordinates masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of audio recordings and microphone coordinates is missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning-based localization methods.
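
To make the set-to-set formulation concrete, the following is a minimal sketch of how a masked input set might be assembled. It is not the authors' implementation: the `Element` class, the `build_masked_input` function, the choice of index 0 for the source, and the toy values are all hypothetical, and no model is included. It only illustrates the idea that each device contributes an audio entry and a coordinate entry, that either may be missing, and that localization amounts to regressing the coordinates hidden in the input.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Element:
    """One set element (hypothetical): a source or microphone with two modalities."""
    audio: Optional[List[float]]  # None means the recording is masked or missing
    coord: Optional[Coord]        # None means the position is masked or missing


def build_masked_input(
    audio: Sequence[Optional[List[float]]],
    coords: Sequence[Optional[Coord]],
    hide_coord: Sequence[int],
) -> Tuple[List[Element], Dict[int, Coord]]:
    """Hide the coordinates at the given indices and keep them as regression targets."""
    if len(audio) != len(coords):
        raise ValueError("audio and coords need one entry per element")
    elements = [Element(a, c) for a, c in zip(audio, coords)]
    targets: Dict[int, Coord] = {}
    for i in hide_coord:
        if elements[i].coord is not None:
            targets[i] = elements[i].coord  # what a model would be asked to reconstruct
            elements[i].coord = None
    return elements, targets


# Toy example (assumed layout): element 0 is the source, elements 1..3 are microphones.
audio = [None, [0.1, 0.2], [0.0, 0.3], None]  # source audio unknown, one mic recording missing
coords = [(1.0, 2.0, 1.5), (0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 3.0, 1.0)]

elements, targets = build_masked_input(audio, coords, hide_coord=[0])
```

Because the input is a list of elements rather than a fixed-size array, the same construction works for any number of microphones, which mirrors the flexibility the abstract describes.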

Code Implementations: 1 repo