Similarity R-C3D for Few-shot Temporal Activity Detection
This work addresses the challenge of rare-event detection in video analysis, which is crucial for applications such as surveillance and content indexing. The approach appears incremental, however, since it builds on existing detection methods with a similarity-based adaptation.
The paper tackles the detection of rare activities in untrimmed videos from only a few labeled examples. It proposes a framework for few-shot temporal activity detection, Similarity R-C3D, and reports outperforming previous work in the few-shot setting on three large-scale benchmarks (THUMOS14, ActivityNet1.2, and ActivityNet1.3).
Many activities of interest are rare events, with only a few labeled examples available. Therefore models for temporal activity detection which are able to learn from a few examples are desirable. In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video. Our model is end-to-end trainable and can benefit from more few-shot examples. At test time, each proposal is assigned the label of the few-shot activity class corresponding to the maximum similarity score. Our Similarity R-C3D method outperforms previous work on three large-scale benchmarks for temporal activity detection (THUMOS14, ActivityNet1.2, and ActivityNet1.3 datasets) in the few-shot setting. Our code will be made available.
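The test-time rule described in the abstract, where each proposal takes the label of the few-shot class with the maximum similarity score, can be illustrated with a minimal sketch. The abstract does not specify the similarity function or the feature extractor, so cosine similarity, the function name `assign_labels`, and the toy feature vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def assign_labels(proposal_feats, class_feats):
    """Label each proposal with its most similar few-shot class.

    proposal_feats: (num_proposals, dim) array, one feature per proposal.
    class_feats:    (num_classes, dim) array, one feature per few-shot class.
    Cosine similarity is an assumption; the abstract only says "similarity score".
    """
    # L2-normalise rows so the dot product equals cosine similarity.
    p = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    c = class_feats / np.linalg.norm(class_feats, axis=1, keepdims=True)
    sims = p @ c.T  # shape: (num_proposals, num_classes)
    # Each proposal gets the class index with the maximum similarity.
    return sims.argmax(axis=1), sims.max(axis=1)


# Hypothetical toy features: three proposals, two few-shot classes.
proposals = np.array([[1.0, 0.1], [0.0, 2.0], [0.9, 1.0]])
classes = np.array([[1.0, 0.0], [0.0, 1.0]])
labels, scores = assign_labels(proposals, classes)
```

Here `labels` holds one class index per proposal and `scores` the corresponding maximum similarity, which a detector could then use to rank or threshold proposals.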