CV · Oct 19, 2018

Temporal Action Detection by Joint Identification-Verification

arXiv:1810.08375v1
Originality: Incremental advance
AI Analysis

This work addresses temporal action detection for video analysis, presenting an incremental improvement over existing methods.

The paper tackles temporal action detection in untrimmed videos, where variation within an action category and similarity between different categories limit performance. The proposed joint Identification-Verification network aims to reduce intra-action variation and enlarge inter-action differences; the authors report that it compares favorably with existing state-of-the-art methods on the THUMOS 2014 dataset.

Temporal action detection aims not only to recognize the action category but also to detect the start and end time of each action instance in an untrimmed video. The key challenge of this task is to classify the action accurately and to determine the temporal boundaries of each action instance. In the temporal action detection benchmark THUMOS 2014, large variations exist within the same action category while many similarities exist between different action categories, which limits the performance of temporal action detection. To address this problem, we propose a joint Identification-Verification network that reduces intra-action variations and enlarges inter-action differences. The joint Identification-Verification network is a siamese network based on 3D ConvNets that simultaneously predicts the action categories and the similarity scores for input pairs of video proposal segments. Extensive experimental results on the challenging THUMOS 2014 dataset demonstrate the effectiveness of our proposed method compared to existing state-of-the-art methods for temporal action detection in untrimmed videos.
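As a rough illustration of how an identification objective and a verification objective can be combined for a pair of proposal segments, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the loss functions, so this sketch assumes softmax cross-entropy for identification and binary cross-entropy on a distance-based similarity score for verification. The function names, the similarity definition, and the `weight` parameter are all hypothetical, and the features and logits stand in for the outputs of the two weight-sharing 3D ConvNet branches.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identification_loss(logits, label):
    """Cross-entropy of one segment's class logits against its label (assumed form)."""
    return -math.log(softmax(logits)[label])


def similarity_score(feat_a, feat_b):
    """Hypothetical similarity in (0, 1]: exp of the negative squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(feat_a, feat_b))
    return math.exp(-sq_dist)


def verification_loss(feat_a, feat_b, same_action, eps=1e-12):
    """Binary cross-entropy on the similarity score (assumed form).

    Pairs from the same action are pushed toward similarity 1,
    pairs from different actions toward similarity 0.
    """
    s = min(max(similarity_score(feat_a, feat_b), eps), 1.0 - eps)
    return -math.log(s) if same_action else -math.log(1.0 - s)


def joint_loss(branch_a, branch_b, weight=1.0):
    """Sum both identification losses and a weighted verification loss.

    Each branch is a (features, logits, label) tuple for one proposal segment.
    """
    feat_a, logits_a, label_a = branch_a
    feat_b, logits_b, label_b = branch_b
    ident = identification_loss(logits_a, label_a) + identification_loss(logits_b, label_b)
    verif = verification_loss(feat_a, feat_b, label_a == label_b)
    return ident + weight * verif


# Two segments with the same label: nearby features give a lower joint loss.
near = joint_loss(([0.0, 0.0], [2.0, 0.0], 0), ([0.1, 0.0], [2.0, 0.0], 0))
far = joint_loss(([0.0, 0.0], [2.0, 0.0], 0), ([2.0, 2.0], [2.0, 0.0], 0))
print(near < far)
```

Under these assumptions, the verification term penalizes same-action pairs whose features are far apart and different-action pairs whose features are close, which matches the stated goal of reducing intra-action variation while enlarging inter-action differences.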
