CV | LG | Oct 20, 2022

Rethinking Learning Approaches for Long-Term Action Anticipation

arXiv:2210.11566v1 | 42 citations | h-index: 59
Originality: Incremental advance
AI Analysis

This paper addresses the problem of predicting future actions in videos, with applications such as robotics and surveillance. The contribution is incremental, since it builds on existing anticipation methods.

The paper tackles long-term action anticipation in videos by introducing ANTICIPATR, a transformer-based model that combines segment-level and video-level representations to predict a set of future action instances over a given anticipation duration. It reports results on the Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets.

Action anticipation involves predicting future actions having observed the initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity in the video, which is then used for future prediction. We introduce ANTICIPATR which performs long-term action anticipation leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.

Code Implementations: 1 repo
