CV · Jun 2, 2017

Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints

arXiv:1706.00699v2 · 40 citations
AI Analysis

This work addresses the high cost of annotating video data: unordered action sets can be collected cheaply, for instance from meta-tags, whereas ordered sequences still require human annotation. Its contribution is an incremental step in reducing supervision requirements.

The paper tackles the problem of weakly supervised action segmentation in videos by using only unordered action sets as supervision, eliminating the need for ordered sequences or full annotations. It achieves competitive results on three datasets despite significantly reduced supervision.

Action detection and temporal segmentation of actions in videos are topics of increasing interest. While fully supervised systems have gained much attention lately, full annotation of each action within the video is costly and impractical for large amounts of video data. Thus, weakly supervised action detection and temporal segmentation methods are of great importance. While most works in this area assume an ordered sequence of occurring actions to be given, our approach only uses a set of actions. Such action sets provide much less supervision since neither action ordering nor the number of action occurrences are known. In exchange, they can be easily obtained, for instance, from meta-tags, while ordered sequences still require human annotation. We introduce a system that automatically learns to temporally segment and label actions in a video, where the only supervision that is used are action sets. An evaluation on three datasets shows that our method still achieves good results although the amount of supervision is significantly smaller than for other related methods.
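To make the difference between the two kinds of weak supervision concrete, here is a minimal sketch that derives both an ordered transcript and an action set from the same per-frame labels. The function names and the example labels are hypothetical, chosen only for illustration. They are not taken from the paper or its code, and the sketch shows only the form of the supervision, not the segmentation method.

```python
from itertools import groupby


def to_transcript(frame_labels):
    """Collapse per-frame labels into the ordered sequence of action segments."""
    return [label for label, _ in groupby(frame_labels)]


def to_action_set(frame_labels):
    """Keep only which actions occur; ordering and occurrence counts are lost."""
    return set(frame_labels)


# Hypothetical per-frame labels for one short video (not from any dataset).
frames = ["pour_milk"] * 3 + ["stir"] * 2 + ["pour_milk"] * 2 + ["stir"] * 4

transcript = to_transcript(frames)  # ordered, repeated actions preserved
action_set = to_action_set(frames)  # unordered, each action listed once
```

Here the transcript records four segments in order, while the action set only records that "pour_milk" and "stir" each occur somewhere in the video. Any video containing both actions, in any order and any number of times, yields the same set, which is why action sets are a weaker training signal than ordered sequences.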

Code Implementations: 1 repo