AS | SD | SP | Apr 20

Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation

arXiv:2504.08644 | 19.56 citations | h-index: 6
AI Analysis

This work addresses the lack of input features for distance estimation in 3D SELD, providing a practical improvement for audio processing applications.

The paper introduces two novel reverberation-based feature formats for distance estimation in 3D sound event localization and detection (SELD), achieving state-of-the-art performance on the STARSS23 dataset.

Sound event localization and detection (SELD) involves predicting active sound event classes over time while estimating their positions. The localization subtask in SELD is usually treated as a direction-of-arrival estimation problem, ignoring source distance. Only recently has SELD been extended to 3D by incorporating distance estimation, enabling the prediction of sound event positions in 3D space (3D SELD). However, existing methods lack input features specifically designed for distance estimation. We address this gap by introducing two novel reverberation-based feature formats: one using the direct-to-reverberant ratio (DRR) and another leveraging signal autocorrelation to capture early reflections. We extensively evaluate and benchmark these features on the STARSS23 dataset, combining them with established SELD features for sound event detection (SED) and direction-of-arrival estimation (DOAE), and testing across different network architectures. Our proposed features, applicable to both the first-order Ambisonics (FOA) and microphone-array (MIC) formats, achieve state-of-the-art distance estimation, enhancing overall 3D SELD performance.
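To make the two quantities named in the abstract concrete, here is a minimal, dependency-free sketch. It is not the paper's feature extraction, which the abstract does not specify: it uses the textbook DRR definition computed from a known room impulse response, and a plain normalized autocorrelation of a signal containing one early reflection. The function names (`drr_db`, `normalized_autocorr`), the 2.5 ms direct-path window, and the toy impulse response are all assumptions made for illustration.

```python
import math
import random


def drr_db(rir, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB from a room impulse response.

    Textbook definition: energy in a short window around the direct-path
    peak divided by the energy of everything after that window.
    The window length is an assumed, commonly used value.
    """
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    half = int(round(direct_window_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(rir), peak + half + 1)
    direct = sum(v * v for v in rir[lo:hi])
    reverb = sum(v * v for v in rir[hi:])
    if reverb == 0.0:
        raise ValueError("no reverberant energy after the direct-path window")
    return 10.0 * math.log10(direct / reverb)


def normalized_autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag, normalized so lag 0 equals 1."""
    r0 = sum(v * v for v in x)
    return [
        sum(x[n] * x[n + k] for n in range(len(x) - k)) / r0
        for k in range(max_lag + 1)
    ]


# Toy impulse response: direct path at sample 10, one reflection at sample 100.
fs = 16000
rir = [0.0] * 200
rir[10] = 1.0
rir[100] = 0.5
drr = drr_db(rir, fs)

# Toy recording: a noise source plus a copy delayed by 90 samples
# (the spacing between the two taps above) at half amplitude.
rng = random.Random(0)
src = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mix = [src[n] + 0.5 * (src[n - 90] if n >= 90 else 0.0) for n in range(2000)]
ac = normalized_autocorr(mix, 200)

# The strongest non-zero lag marks the delay of the early reflection.
peak_lag = max(range(1, len(ac)), key=lambda k: ac[k])
```

In this toy case the direct path carries four times the energy of the reflection, so `drr` comes out at roughly 6 dB, and `peak_lag` recovers the 90-sample reflection delay. Both cues change with source distance in a real room (a farther source generally has a lower DRR), which is the intuition behind using them as distance features.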
