A simple model for detection of rare sound events
This work addresses rare sound event detection, which is important for applications such as audio surveillance and monitoring. It is, however, incremental: it builds on existing methods with a hybrid approach.
The authors tackled the problem of detecting rare sound events by proposing a simple recurrent model that combines utterance-level and frame-level losses with a shared representation and attention mechanism, achieving competitive performance on the DCASE 2017 challenge Task 2.
We propose a simple recurrent model for detecting rare sound events, when the time boundaries of events are available for training. Our model optimizes the combination of an utterance-level loss, which classifies whether an event occurs in an utterance, and a frame-level loss, which classifies whether each frame corresponds to the event when it does occur. The two losses make use of a shared vectorial representation of the event, and are connected by an attention mechanism. We demonstrate our model on Task 2 of the DCASE 2017 challenge, and achieve competitive performance.
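The abstract does not say exactly how the shared event representation, the attention mechanism, and the two losses are parameterized. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. Assumptions: the recurrent encoder has already produced frame states `h` of shape (T, D); the shared event vector `w` scores each frame by a dot product; attention weights are a softmax over those same frame scores; the utterance score reuses `w` on the attention-pooled state; the frame-level loss is applied only to utterances that contain the event; and the names `combined_loss`, `b_frame`, `b_utt`, and the weight `lam` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(p, y, eps=1e-7):
    # Binary cross-entropy; clipping avoids log(0).
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def combined_loss(h, w, b_frame, b_utt, has_event, frame_labels, lam=1.0):
    """Hypothetical utterance-level + frame-level loss with a shared event vector.

    h            : (T, D) frame states from a recurrent encoder (assumed given).
    w            : (D,) shared event vector used by both losses (assumption).
    has_event    : 1.0 if the event occurs in the utterance, else 0.0.
    frame_labels : (T,) 0/1 frame labels, used only when has_event is 1.
    lam          : weight of the frame-level term (assumption).
    """
    # Frame-event affinity from the shared representation.
    scores = h @ w  # (T,)

    # Attention over frames: softmax of the same scores (numerically stable).
    a = np.exp(scores - scores.max())
    a = a / a.sum()

    # Utterance-level representation and prediction.
    u = a @ h  # (D,)
    p_utt = sigmoid(u @ w + b_utt)
    utt_loss = bce(p_utt, has_event)

    # Frame-level loss only when the event actually occurs.
    if has_event:
        p_frames = sigmoid(scores + b_frame)
        frame_loss = bce(p_frames, frame_labels).mean()
    else:
        frame_loss = 0.0

    return float(utt_loss + lam * frame_loss)
```

In this sketch, tying attention and both classifiers to the same vector `w` is what connects the two losses: frames that score highly for the event also dominate the pooled utterance representation.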