CV · Aug 20, 2020

Learning to Abstract and Predict Human Actions

arXiv:2008.09234v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses reliable and detailed long-term action prediction in videos, which matters for applications such as surveillance and human-computer interaction. Its advance over existing hierarchical modeling approaches appears incremental.

The paper tackles the problem of predicting human actions in videos by modeling their hierarchical structure, proposing a multi-level neural architecture that learns from partial event hierarchies and achieves improved long-term forecasting with detailed abstraction levels.

Human activities are naturally structured as hierarchies unrolled over time. For action prediction, temporal relations in event sequences are widely exploited by current methods, while their semantic coherence across different levels of abstraction has not been well explored. In this work we model the hierarchical structure of human activities in videos and demonstrate the power of such structure in action prediction. We propose Hierarchical Encoder-Refresher-Anticipator, a multi-level neural machine that can learn the structure of human activities by observing a partial hierarchy of events and roll out such structure into a future prediction at multiple levels of abstraction. We also introduce a new coarse-to-fine action annotation on the Breakfast Actions videos to create a comprehensive, consistent, and cleanly structured hierarchical video activity dataset. Through our experiments, we examine and rethink the settings and metrics of activity prediction tasks toward unbiased evaluation of prediction systems, and demonstrate the role of hierarchical modeling toward reliable and detailed long-term action forecasting.

Code Implementations: 1 repo
