CV · Feb 8, 2022

Untrimmed Action Anticipation

arXiv:2202.04132v1 · 10 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a limitation of egocentric action anticipation in real-world applications, though the contribution is incremental: it adapts an existing task to untrimmed inputs.

The paper tackles the problem of action anticipation in untrimmed egocentric videos, where the exact start time of future actions is unknown, and finds that current trimmed-action models perform poorly on this more realistic scenario.

Egocentric action anticipation consists of predicting, from egocentric video, a future action that the camera wearer will perform. While the task has recently attracted the attention of the research community, current approaches assume that the input videos are "trimmed", meaning that a short video sequence is sampled a fixed time before the beginning of the action. We argue that, despite recent advances in the field, trimmed action anticipation has limited applicability in real-world scenarios, where it is important to deal with "untrimmed" video inputs and it cannot be assumed that the exact moment at which the action will begin is known at test time. To overcome these limitations, we propose an untrimmed action anticipation task which, similarly to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before the actions actually take place. We design an evaluation procedure for methods that address this novel task, and compare several baselines on the EPIC-KITCHENS-100 dataset. Experiments show that the performance of current models designed for trimmed action anticipation is very limited and that more research on this task is required.
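The difference between the two input settings can be made concrete with a minimal Python sketch. It is illustrative only and does not reproduce the paper's models or its evaluation procedure: the function names, the observation length, and the prediction stride are all hypothetical choices. In the trimmed setting the observed clip is positioned relative to a known action start; in the untrimmed setting the model only sees a stream and has to predict at timestamps chosen without any knowledge of when actions begin.

```python
from typing import List, Tuple


def trimmed_window(action_start: float,
                   anticipation_time: float,
                   observation_length: float) -> Tuple[float, float]:
    """Observed clip in the trimmed setting (hypothetical helper).

    The clip ends a fixed anticipation_time before the annotated action
    start, so the sampler needs to know when the action begins.
    """
    end = action_start - anticipation_time
    if end <= 0:
        raise ValueError("action starts too early to observe anything")
    start = max(0.0, end - observation_length)
    return start, end


def untrimmed_prediction_times(video_duration: float,
                               stride: float) -> List[float]:
    """Prediction timestamps in the untrimmed setting (hypothetical helper).

    Timestamps are spaced regularly over the whole video; no action
    start time is used, which is the assumption the task removes.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    count = int(video_duration // stride)
    return [i * stride for i in range(1, count + 1)]


if __name__ == "__main__":
    # Trimmed: the clip is placed using the ground-truth start at 10 s.
    print(trimmed_window(action_start=10.0, anticipation_time=1.0,
                         observation_length=3.0))
    # Untrimmed: predictions are requested every 0.5 s of a 2 s video.
    print(untrimmed_prediction_times(video_duration=2.0, stride=0.5))
```

The point of the contrast is that `trimmed_window` takes `action_start` as an argument while `untrimmed_prediction_times` does not, which mirrors the abstract's claim that the start of the action cannot be assumed known at test time.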

