A Framework for the Robust Evaluation of Sound Event Detection
This work addresses the need for more reliable, application-tunable evaluation metrics in sound event detection, particularly for researchers and practitioners working with polyphonic audio data. Its contribution is incremental in that it builds on existing evaluation concepts.
The authors propose a new framework for evaluating polyphonic sound event detection systems that overcomes limitations of conventional metrics. It introduces a more robust definition of event detection and a polyphonic sound detection score that allows system comparison independently of operating points. The benefits are demonstrated by re-evaluating systems from DCASE 2019 Task 4.
This work defines a new framework for performance evaluation of polyphonic sound event detection (SED) systems, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates. The proposed framework introduces a definition of event detection that is more robust against labelling subjectivity. It also uses polyphonic receiver operating characteristic (ROC) curves to deliver more global insight into system performance than F1-scores, and proposes a reduction of these curves into a single polyphonic sound detection score (PSDS), which allows system comparison independently of operating points (OPs). The presented method also delivers better insight into data biases and classification stability across sound classes. Furthermore, it can be tuned to different applications in order to match a variety of user experience requirements. The benefits of the proposed approach are demonstrated by re-evaluating the baseline and two of the top-performing systems from DCASE 2019 Task 4.
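To illustrate the general idea of reducing a curve of operating points to a single score, the following minimal sketch computes a normalized area under a set of (false positive rate, true positive rate) points. It is not the PSDS definition, which this abstract does not spell out. The function name `normalized_curve_area`, the `max_fpr` cut-off, the trapezoidal integration and the flat extension beyond the last point are all assumptions made for illustration.

```python
from typing import List, Tuple


def normalized_curve_area(points: List[Tuple[float, float]], max_fpr: float) -> float:
    """Trapezoidal area under (fpr, tpr) points on [0, max_fpr], divided by max_fpr.

    Hypothetical helper: an illustrative stand-in for an operating-point-independent
    score, not the PSDS computation itself.
    """
    if max_fpr <= 0:
        raise ValueError("max_fpr must be positive")
    if not points:
        raise ValueError("at least one operating point is required")

    pts = sorted(points)
    # Assumption: the curve starts at the origin if no point at fpr = 0 is given.
    if pts[0][0] > 0:
        pts = [(0.0, 0.0)] + pts

    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the segment so that it ends exactly at max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += 0.5 * (y0 + y1) * (x1 - x0)

    # Assumption: beyond the last operating point the curve stays flat.
    last_x, last_y = pts[-1]
    if last_x < max_fpr:
        area += last_y * (max_fpr - last_x)

    return area / max_fpr


# Two hypothetical systems, each described by several operating points.
system_a = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
system_b = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
score_a = normalized_curve_area(system_a, max_fpr=1.0)
score_b = normalized_curve_area(system_b, max_fpr=1.0)
```

Because each score summarizes the whole curve, the two systems can be ranked without first choosing a single decision threshold for either of them, which is the property the abstract attributes to PSDS.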