CV · Apr 28, 2022

The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

arXiv:2204.13340v2 · 18 citations · h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses the problem of predicting ongoing actions early in videos for applications like surveillance or robotics, representing an incremental improvement over existing methods.

The paper tackles early action prediction from partially-observed videos by proposing a bottleneck-based attention model with progressive sampling over scales, achieving state-of-the-art performance across four datasets.

Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video. We propose a bottleneck-based attention model that captures the evolution of the action through progressive sampling over fine-to-coarse scales. Our proposed Temporal Progressive (TemPr) model is composed of multiple attention towers, one for each scale. The predicted action label is based on the collective agreement of these towers, taking their confidences into account. Extensive experiments over four video datasets showcase state-of-the-art performance on the task of Early Action Prediction across a range of encoder architectures. We demonstrate the effectiveness and consistency of TemPr through detailed ablations.
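
As a rough illustration of the two ideas in the abstract (sampling the observed frames at several scales, then combining per-tower predictions according to their confidence), the following is a minimal sketch and not the authors' implementation. The function names `progressive_scales` and `aggregate`, the growing-window sampling scheme, and the use of each tower's top softmax probability as its confidence weight are all assumptions made for illustration; the abstract does not specify how TemPr samples or aggregates.

```python
import math


def progressive_scales(num_observed, num_scales):
    """Return one list of frame indices per scale (hypothetical scheme).

    Scale i covers the first i/num_scales fraction of the observed frames,
    so each tower would see a progressively longer window.
    """
    return [
        list(range(max(1, math.ceil(num_observed * i / num_scales))))
        for i in range(1, num_scales + 1)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(tower_logits):
    """Combine per-tower logits with a confidence-weighted average.

    Assumption: a tower's confidence is its highest class probability.
    Returns (predicted_label, combined_probabilities).
    """
    probs = [softmax(logits) for logits in tower_logits]
    weights = [max(p) for p in probs]
    total_weight = sum(weights)
    num_classes = len(probs[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probs)) / total_weight
        for c in range(num_classes)
    ]
    return combined.index(max(combined)), combined


if __name__ == "__main__":
    # Four scales over 8 observed frames.
    windows = progressive_scales(8, 4)
    print([len(w) for w in windows])

    # Three towers, three classes: two confident towers favour class 0.
    label, combined = aggregate([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 0.0]])
    print(label, combined)
```

In this sketch a tower that is unsure (a flat distribution) contributes less to the final label than one that is confident, which is one simple way to read "collective agreement considering confidences"; a learned aggregation function would be an equally valid reading.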

Code Implementations: 1 repo
