CV, LG · Apr 5, 2019

Fast Weakly Supervised Action Segmentation Using Mutual Consistency

arXiv:1904.03116v4 · 64 citations
Originality: Highly original
AI Analysis

This work addresses the high cost of full video annotation for action segmentation, offering a faster solution for video analysis applications.

The paper tackles weakly supervised action segmentation in videos with a two-branch neural network trained using a mutual consistency loss. It matches the accuracy of state-of-the-art approaches while training 14 times faster and running inference 20 times faster.

Action segmentation is the task of predicting the actions for each frame of a video. As obtaining the full annotation of videos for action segmentation is expensive, weakly supervised approaches that can learn only from transcripts are appealing. In this paper, we propose a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network. The two branches of our network predict two redundant but different representations for action segmentation and we propose a novel mutual consistency (MuCon) loss that enforces the consistency of the two redundant representations. Using the MuCon loss together with a loss for transcript prediction, our proposed approach achieves the accuracy of state-of-the-art approaches while being $14$ times faster to train and $20$ times faster during inference. The MuCon loss proves beneficial even in the fully supervised setting.
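To make the idea of "two redundant but different representations" concrete, here is a minimal Python sketch. It assumes one representation is a per-frame label sequence and the other is a transcript paired with segment lengths; the abstract does not spell out the representations, so this pairing, the function names, and the action labels are all illustrative assumptions. The `frame_agreement` function is a simple non-differentiable stand-in for a consistency check, not the MuCon loss itself.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (transcript, segment lengths)."""
    transcript, lengths = [], []
    for label, run in groupby(frame_labels):
        transcript.append(label)
        lengths.append(sum(1 for _ in run))
    return transcript, lengths


def segments_to_frames(transcript, lengths):
    """Expand (transcript, segment lengths) back into per-frame labels."""
    frames = []
    for label, length in zip(transcript, lengths):
        frames.extend([label] * length)
    return frames


def frame_agreement(frames_a, frames_b):
    """Fraction of frames on which two labelings agree.

    Illustrative stand-in only: a trainable consistency loss would have to
    be differentiable and operate on network outputs, which this is not.
    """
    if len(frames_a) != len(frames_b):
        raise ValueError("labelings must cover the same number of frames")
    if not frames_a:
        return 1.0
    matches = sum(a == b for a, b in zip(frames_a, frames_b))
    return matches / len(frames_a)


# Hypothetical 6-frame video with made-up action labels.
frame_view = ["pour", "pour", "pour", "stir", "stir", "pour"]
transcript, lengths = frames_to_segments(frame_view)
# transcript == ["pour", "stir", "pour"], lengths == [3, 2, 1]

# A second, slightly different segment-level guess for the same video.
segment_view = segments_to_frames(["pour", "stir", "pour"], [2, 3, 1])
score = frame_agreement(frame_view, segment_view)  # 5 of 6 frames agree
```

Both views describe the same segmentation, so either can be converted into the other; a consistency objective rewards the two branches for agreeing once they are expressed in a common form. Note that the transcript alone, which is all that weak supervision provides, fixes the order of actions but not the segment lengths.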

Code Implementations: 1 repo