Sound Event Detection and Separation: A Benchmark on DESED Synthetic Soundscapes
This work addresses audio processing challenges relevant to applications such as surveillance and smart devices, but its contribution is incremental: it benchmarks existing methods on new synthetic data.
The paper tackles sound event detection (SED) by benchmarking state-of-the-art systems on synthetic soundscapes. It shows that localizing events in time remains difficult and that reverberation and non-target events degrade performance, and it suggests sound separation as a promising remedy.
We propose a benchmark of state-of-the-art sound event detection (SED) systems. We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2021 Task 4 under time-related modifications (the time position of an event and the length of clips), and we study the impact of non-target sound events and reverberation. We show that localizing sound events in time is still a problem for SED systems. We also show that reverberation and non-target sound events severely degrade the performance of SED systems. In the latter case, sound separation seems like a promising solution.