An evaluation framework for event detection using a morphological model of acoustic scenes
This work addresses the need for better evaluation frameworks in acoustic event detection, particularly for assessing system robustness. Its contribution is incremental, since it builds on existing challenge data and methods.
The paper tackles the problem of evaluating acoustic event detection systems by introducing a morphological model of acoustic scenes that abstracts temporal structures, enabling explicit control of key aspects to isolate their impact on performance. Results using IEEE DCASE Challenge systems show the model successfully builds datasets for evaluating robustness to new listening conditions and background sounds.
This paper introduces a model of environmental acoustic scenes that adopts a morphological approach by abstracting the temporal structures of acoustic scenes. To demonstrate its potential, the model is used to evaluate the performance of a large set of acoustic event detection systems. The model allows us to explicitly control key morphological aspects of the acoustic scene and to isolate their impact on the performance of the system under evaluation. More information can thus be gained about the behavior of the evaluated systems, providing guidance for further improvements. The proposed model is validated using systems submitted to the IEEE DCASE Challenge; results indicate that the proposed scheme can successfully build datasets useful for evaluating some aspects of the performance of event detection systems, in particular their robustness to new listening conditions and to increasing levels of background sound.
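As a rough illustration of what controlling one morphological aspect of a scene could look like, the sketch below mixes isolated events into a background recording at a fixed event-to-background level ratio and records the resulting annotations. It is not the authors' implementation: the function names `rms` and `simulate_scene`, the `ebr_db` parameter, and the synthetic signals are all assumptions made for this example.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def simulate_scene(background, events, onsets, ebr_db):
    """Mix events into a background at a fixed event-to-background ratio.

    background: list of float samples
    events:     list of events, each a list of float samples
    onsets:     start index of each event within the scene
    ebr_db:     hypothetical control parameter, event level relative to
                the background level, in dB
    """
    scene = list(background)
    annotations = []
    bg_level = rms(background)
    for event, onset in zip(events, onsets):
        # Scale the event so that its RMS sits ebr_db above the background RMS.
        gain = bg_level * 10 ** (ebr_db / 20) / rms(event)
        for i, sample in enumerate(event):
            if onset + i < len(scene):
                scene[onset + i] += gain * sample
        # Ground-truth annotation as (start, end) sample indices.
        annotations.append((onset, min(onset + len(event), len(scene))))
    return scene, annotations


# Synthetic stand-ins for real recordings (assumed for illustration only).
rng = random.Random(0)
background = [rng.gauss(0.0, 0.1) for _ in range(1000)]
event = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]

scene, annotations = simulate_scene(background, [event], [300], ebr_db=6.0)
```

Under these assumptions, regenerating the same scene with several values of `ebr_db` would give a family of datasets that differ only in background level, so changes in a detector's score can be attributed to that single factor.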