Towards joint sound scene and polyphonic sound event recognition
This work integrates sound scene and sound event recognition for computational sound analysis. The contribution is incremental: it builds on existing tasks, adding a new dataset and method.
The paper addresses the separate handling of Acoustic Scene Classification (ASC) and Sound Event Detection (SED) by introducing a new dataset with both scene and event labels, together with a joint method. The joint approach yields more efficient learning and robust SED results on a dataset skewed towards sound scenes, though SED performance still needs improvement.
Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels and use it to demonstrate a novel method for jointly classifying sound scenes and recognizing sound events. We show that a joint approach makes learning more efficient and that, although improvements are still needed for sound event detection, SED results are robust on a dataset whose sample distribution is skewed towards sound scenes.
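
As a rough illustration of what a joint objective for the two tasks could look like, the sketch below combines a categorical cross-entropy over scene classes with a per-class binary cross-entropy over event classes. Polyphonic events can overlap, so each event class is treated as an independent yes/no decision. This is an assumption made for illustration, not the method used in this work: the function names, the `event_weight` parameter, and the simple weighted sum are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def scene_loss(scene_logits, scene_index):
    """Categorical cross-entropy: a clip belongs to exactly one scene."""
    probs = softmax(scene_logits)
    return -math.log(probs[scene_index])


def event_loss(event_logits, event_targets, eps=1e-12):
    """Mean binary cross-entropy: several events may be active at once."""
    total = 0.0
    for z, t in zip(event_logits, event_targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(event_logits)


def joint_loss(scene_logits, scene_index, event_logits, event_targets,
               event_weight=1.0):
    """Hypothetical joint objective: scene loss plus weighted event loss."""
    return (scene_loss(scene_logits, scene_index)
            + event_weight * event_loss(event_logits, event_targets))


# Toy example: 3 scene classes (true scene is index 0) and
# 2 event classes (only the first event is active).
loss = joint_loss([0.0, 0.0, 0.0], 0, [0.0, 0.0], [1, 0])
```

In a sketch like this, `event_weight` is one possible knob for balancing the two tasks when one label type dominates the data, as with a dataset skewed towards sound scenes.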