SD, AS | Dec 27, 2017

Eventness: Object Detection on Spectrograms for Temporal Localization of Audio Events

arXiv:1712.09668v2 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses audio event detection for the audio signal processing community by adapting a visual object detection model to spectrograms. The contribution is incremental, since it applies an existing method to a new domain.

The paper tackles audio event detection by introducing the concept of Eventness, analogous to Objectness in vision, which treats time-frequency patterns in spectrograms as visual objects to be detected. The adapted network achieves results comparable to a state-of-the-art baseline and is more robust on minority events.

In this paper, we introduce the concept of Eventness for audio event detection, which can, in part, be thought of as an analogue to Objectness from computer vision. The key observation behind the eventness concept is that audio events reveal themselves as 2-dimensional time-frequency patterns with specific textures and geometric structures in spectrograms. These time-frequency patterns can then be viewed analogously to objects occurring in natural images (with the exception that scaling and rotation invariance properties do not apply). With this key observation in mind, we pose the problem of detecting monophonic or polyphonic audio events as an equivalent visual object(s) detection problem under partial occlusion and clutter in spectrograms. We adapt a state-of-the-art visual object detection model to evaluate the audio event detection task on publicly available datasets. The proposed network has comparable results with a state-of-the-art baseline and is more robust on minority events. Provided large-scale datasets, we hope that our proposed conceptual model of eventness will be beneficial to the audio signal processing community towards improving performance of audio event detection.
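To make the "events as objects in a spectrogram" framing concrete, here is a minimal sketch of how a bounding box in spectrogram coordinates could be mapped back to an onset, an offset, and a frequency band. It is an illustration only and not the paper's implementation: the STFT parameters (`n_fft`, `hop`), the box layout `(frame_min, bin_min, frame_max, bin_max)`, and the helper names `stft_magnitude` and `box_to_event` are all assumptions, and the box itself is hand-written rather than produced by a detector.

```python
import numpy as np


def stft_magnitude(signal, n_fft=512, hop=256):
    """Magnitude spectrogram of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft over each frame gives frequency bins; transpose to (bins, frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T


def box_to_event(box, sample_rate, n_fft=512, hop=256):
    """Map a box (frame_min, bin_min, frame_max, bin_max), with inclusive frame
    indices, to (onset_s, offset_s, f_low_hz, f_high_hz)."""
    frame_min, bin_min, frame_max, bin_max = box
    onset = frame_min * hop / sample_rate
    # The last frame covers n_fft samples starting at frame_max * hop
    offset = (frame_max * hop + n_fft) / sample_rate
    hz_per_bin = sample_rate / n_fft
    return onset, offset, bin_min * hz_per_bin, bin_max * hz_per_bin


# Synthetic example: a 1 kHz tone between 0.25 s and 0.5 s in 1 s of silence.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.where((t >= 0.25) & (t < 0.5), np.sin(2 * np.pi * 1000 * t), 0.0)
spectrogram = stft_magnitude(signal)

# Hypothetical detector output: one box around the tone.
event = box_to_event((10, 16, 20, 48), sample_rate)
```

Under these assumptions the horizontal extent of a box yields the temporal localization of the event, while the vertical extent gives its frequency band. Unlike boxes in natural images, the axes have fixed physical meaning, which is consistent with the abstract's remark that scaling and rotation invariance do not apply.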
