CVAug 19, 2023

Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling

arXiv:2308.09946v27 citationsh-index: 34
Originality Incremental advance
AI Analysis

This work improves action localization for video analysis, but it is incremental as it builds on existing MIL-based methods by focusing on temporal semantics.

The paper tackles weakly-supervised action localization in videos by proposing a hierarchically-structured latent attention model to address temporal feature variations, achieving state-of-the-art results on THUMOS-14 and ActivityNet-v1.3 datasets with performance comparable to fully-supervised methods.

Weakly-supervised action localization aims to recognize and localize action instancese in untrimmed videos with only video-level labels. Most existing models rely on multiple instance learning(MIL), where the predictions of unlabeled instances are supervised by classifying labeled bags. The MIL-based methods are relatively well studied with cogent performance achieved on classification but not on localization. Generally, they locate temporal regions by the video-level classification but overlook the temporal variations of feature semantics. To address this problem, we propose a novel attention-based hierarchically-structured latent model to learn the temporal variations of feature semantics. Specifically, our model entails two components, the first is an unsupervised change-points detection module that detects change-points by learning the latent representations of video features in a temporal hierarchy based on their rates of change, and the second is an attention-based classification model that selects the change-points of the foreground as the boundaries. To evaluate the effectiveness of our model, we conduct extensive experiments on two benchmark datasets, THUMOS-14 and ActivityNet-v1.3. The experiments show that our method outperforms current state-of-the-art methods, and even achieves comparable performance with fully-supervised methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes