LG SD AS MLMay 17, 2019

Weakly-Supervised Temporal Localization via Occurrence Count Learning

Julien Schroeter, Kirill Sidorov, David Marshall

arXiv:1905.07293v15.412 citations

Originality Incremental advance

AI Analysis

This addresses the problem of reducing annotation burden for researchers and practitioners in fields like audio and image analysis, though it is incremental as it builds on weakly-supervised frameworks.

The paper tackles the problem of temporal event detection and localization by training deep neural networks using only counts of event occurrences as labels, eliminating the need for precise annotations. It achieves performance comparable to fully-supervised state-of-the-art methods in experiments like drum hit and piano onset detection in audio and digit detection in images.

We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. This powerful weakly-supervised framework alleviates the burden of the imprecise and time-consuming process of annotating event locations in temporal data. Unlike existing methods, in which localization is explicitly achieved by design, our model learns localization implicitly as a byproduct of learning to count instances. This unique feature is a direct consequence of the model's theoretical properties. We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements.

View on arXiv PDF

Similar