SD LG AS | Jun 7, 2021

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

arXiv:2106.03903v1 | 29 citations
Originality: Incremental advance
AI Analysis

This work addresses sound event localization for applications like acoustic monitoring, offering a novel transformer approach with uncertainty estimation, though it is incremental as it adapts existing transformer architectures to this domain.

The paper tackles sound event localization by introducing a transformer-based framework that captures temporal dependencies via self-attention and represents positions as multivariate Gaussians for uncertainty estimation. It outperforms state-of-the-art methods on three datasets with statistically significant improvements in localization error and detection accuracy.
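To make the uncertainty idea concrete, the following is a minimal sketch of how a position estimate represented as a multivariate Gaussian can be scored against a true position using the negative log-likelihood. The summary does not state which loss the paper uses, so the choice of negative log-likelihood is an assumption, and the function name `gaussian_nll`, the 2-D (azimuth, elevation) parameterization, and all numeric values are hypothetical.

```python
import numpy as np

def gaussian_nll(target, mean, cov):
    """Negative log-likelihood of `target` under N(mean, cov).

    Illustrative only: the source does not specify the training loss.
    """
    target = np.asarray(target, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = target - mean
    # The Cholesky factor gives a stable log-determinant and Mahalanobis term.
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)

# Hypothetical 2-D direction estimate (azimuth, elevation) in radians.
true_pos = np.array([0.50, 0.10])
pred_mean = np.array([0.45, 0.12])
confident = 0.01 * np.eye(2)  # small covariance: the model claims high certainty
uncertain = 1.00 * np.eye(2)  # large covariance: the model claims low certainty

nll_confident = gaussian_nll(true_pos, pred_mean, confident)
nll_uncertain = gaussian_nll(true_pos, pred_mean, uncertain)
```

Because the predicted mean is close to the true position here, the confident prediction receives the lower score. A confident prediction that was far off would be penalized heavily through the Mahalanobis term, which is what makes the covariance a usable measure of uncertainty.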

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g. a microphone array). Recent advances in this domain have focused most prominently on deep recurrent neural networks. Inspired by the success of transformer architectures as a suitable alternative to classical recurrent neural networks, this paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms. Additionally, the estimated sound event positions are represented as multivariate Gaussian variables, yielding an additional notion of uncertainty, which many previously proposed deep learning-based systems designed for this application do not provide. The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy. It outperforms all competing systems on all datasets with statistically significant differences in performance.
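As an illustration of how self-attention can capture temporal dependencies across audio frames, here is a minimal single-head scaled dot-product attention sketch in NumPy. It is not the paper's architecture: the function name `self_attention`, the dimensions, and the random projection matrices are hypothetical stand-ins for learned parameters and for features extracted from multi-channel audio.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over time frames.

    x has shape (T, d_model): one feature vector per time frame.
    Returns the attended features (T, d_head) and the weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical sizes: 6 time frames, 8 input features, 4 attention features.
rng = np.random.default_rng(0)
T, d_model, d_head = 6, 8, 4
frames = rng.standard_normal((T, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))

context, attn = self_attention(frames, w_q, w_k, w_v)
```

Each row of `attn` sums to one and states how strongly a given frame draws on every other frame. Unlike a recurrent network, all frame pairs are related in a single step rather than through a chain of hidden states.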

Code Implementations: 1 repo