CV · Nov 20, 2020

Action Duration Prediction for Segment-Level Alignment of Weakly-Labeled Videos

Reza Ghoddoosian, Saif Sayed, Vassilis Athitsos

arXiv:2011.10190v14.27 citations · Has Code

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement in weakly-supervised action alignment for researchers and practitioners working with video analysis.

This paper addresses weakly-supervised action alignment in videos, where only the ordered sequence of video-level actions is provided for training. The authors propose a Duration Network to predict the remaining duration of an action at any given point, along with a Segment-Level Beam Search for efficient alignment. The method demonstrates more robust alignments for long videos and achieves state-of-the-art results in certain cases on the Breakfast and Hollywood Extended datasets.

This paper focuses on weakly-supervised action alignment, where only the ordered sequence of video-level actions is available for training. We propose a novel Duration Network, which captures a short temporal window of the video and learns to predict the remaining duration of a given action at any point in time, with a level of granularity based on the type of that action. Further, we introduce a Segment-Level Beam Search to obtain the best alignment that maximizes our posterior probability. Segment-Level Beam Search efficiently aligns actions by considering only a selected set of frames that have more confident predictions. The experimental results show that our alignments for long videos are more robust than those of existing models. Moreover, the proposed method achieves state-of-the-art results in certain cases on the popular Breakfast and Hollywood Extended datasets.
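
To make the idea of a beam search over segments concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that a hypothesis is scored by summing per-frame action log-probabilities inside each segment and adding a log-probability for the segment length, and that segment boundaries may fall only on a given set of candidate frames. The names `segment_beam_search`, `frame_logp`, `candidates`, and `length_logp` are hypothetical, and the length model stands in for the paper's Duration Network, whose actual form is not described here.

```python
import math
from typing import Callable, List, Sequence, Tuple


def segment_beam_search(
    frame_logp: Sequence[Sequence[float]],
    transcript: Sequence[int],
    candidates: Sequence[int],
    length_logp: Callable[[int, int], float],
    beam_size: int = 3,
) -> Tuple[float, List[int]]:
    """Align an ordered action transcript to frames (illustrative sketch).

    frame_logp[t][a] is an assumed log-probability of action a at frame t.
    candidates holds the frames where a segment is allowed to end (exclusive).
    length_logp(action, length) is a hypothetical stand-in for a duration model.
    Returns (score, ends), where ends[i] is the exclusive end frame of action i.
    """
    num_frames = len(frame_logp)
    num_actions = len(frame_logp[0])

    # prefix[a][t] = sum of frame_logp[0..t-1][a], so segment sums are O(1).
    prefix = [[0.0] * (num_frames + 1) for _ in range(num_actions)]
    for a in range(num_actions):
        for t in range(num_frames):
            prefix[a][t + 1] = prefix[a][t] + frame_logp[t][a]

    # The last segment must end at the final frame, so always allow it.
    ends = sorted({e for e in candidates if 0 < e <= num_frames} | {num_frames})

    beam: List[Tuple[float, List[int]]] = [(0.0, [])]
    for i, action in enumerate(transcript):
        is_last = i == len(transcript) - 1
        expanded = []
        for score, bounds in beam:
            start = bounds[-1] if bounds else 0
            for end in ends:
                if end <= start:
                    continue
                # Only the last action may (and must) end at the final frame.
                if is_last != (end == num_frames):
                    continue
                seg = prefix[action][end] - prefix[action][start]
                total = score + seg + length_logp(action, end - start)
                expanded.append((total, bounds + [end]))
        # Keep only the highest-scoring partial alignments.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_size]
        if not beam:
            raise ValueError("no alignment fits the candidate boundaries")
    return beam[0]


# Toy usage: 6 frames, action 0 is likely in frames 0-2, action 1 in frames 3-5.
hi, lo = math.log(0.9), math.log(0.1)
toy_logp = [[hi, lo]] * 3 + [[lo, hi]] * 3
score, bounds = segment_beam_search(
    toy_logp, transcript=[0, 1], candidates=[2, 3, 4],
    length_logp=lambda action, length: 0.0,  # flat length prior for the demo
)
```

Restricting `ends` to a small candidate set is what keeps the search cheap in this sketch: the number of expansions per step grows with the number of candidates rather than with the number of frames. Because summed log-probabilities favor shorter segments, a beam of width 1 can commit to an early boundary too soon, which is one reason to keep several hypotheses alive.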
