AS, AI, LG, SD, SP | Jul 22, 2021

What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis

arXiv:2107.10469v2 | 8 citations
AI Analysis

This work provides insights into the specific difficulties of SELD, an emerging task combining sound event detection and direction-of-arrival estimation, but it is incremental as it builds on prior error analysis without introducing new methods.

The paper analyzes the main challenges in sound event localization and detection (SELD). Based on an error analysis of two systems that ranked second in the DCASE SELD Challenge, one in 2020 and one in 2021, it identifies polyphony as the primary difficulty, because the systems struggle to detect all sound events of interest when several overlap.

Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation. As a result, SELD inherits the challenges of both tasks, such as noise, reverberation, interference, polyphony, and non-stationarity of sound sources. Furthermore, SELD often faces an additional challenge of assigning correct correspondences between the detected sound classes and directions of arrival to multiple overlapping sound events. Previous studies have shown that unknown interferences in reverberant environments often cause major degradation in the performance of SELD systems. To further understand the challenges of the SELD task, we performed a detailed error analysis on two of our SELD systems, which both ranked second in the team category of DCASE SELD Challenge, one in 2020 and one in 2021. Experimental results indicate polyphony as the main challenge in SELD, due to the difficulty in detecting all sound events of interest. In addition, the SELD systems tend to make fewer errors for the polyphonic scenario that is dominant in the training set.
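To make the polyphony finding concrete, the following is a minimal sketch of how detection recall could be broken down by polyphony level. It is not the authors' evaluation code and it ignores direction of arrival entirely; the function name `recall_by_polyphony`, the frame-as-set representation, and the toy labels are all assumptions introduced here for illustration.

```python
from collections import defaultdict


def recall_by_polyphony(reference, predicted):
    """Return {polyphony level: detection recall}.

    reference, predicted: equal-length lists of frames, where each frame
    is a set of active sound class labels (hypothetical representation).
    Polyphony is the number of reference events active in a frame.
    """
    hits = defaultdict(int)    # reference events that were also predicted
    totals = defaultdict(int)  # all reference events at this polyphony level
    for ref_frame, pred_frame in zip(reference, predicted):
        polyphony = len(ref_frame)
        if polyphony == 0:
            continue  # silent frames contain no events to recall
        totals[polyphony] += polyphony
        hits[polyphony] += len(ref_frame & pred_frame)
    return {p: hits[p] / totals[p] for p in sorted(totals)}


# Toy data: the hypothetical system misses one event in each overlapping frame.
reference = [{"speech"}, {"speech", "dog"}, {"dog", "alarm"}, set(), {"alarm"}]
predicted = [{"speech"}, {"speech"}, {"dog"}, set(), {"alarm"}]

print(recall_by_polyphony(reference, predicted))  # {1: 1.0, 2: 0.5}
```

In this toy case recall drops from 1.0 for single-source frames to 0.5 for two-source frames, which is the kind of pattern the abstract describes: missed events concentrate in overlapping segments.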
