CV | AI | Sep 12, 2023

JOADAA: joint online action detection and action anticipation

arXiv:2309.06130v1 | 13 citations | h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses performance limitations in video action analysis for applications like surveillance and robotics by integrating two related tasks. The advance is incremental, since it builds on existing methods.

The paper tackles the problem of incomplete temporal information in action anticipation and online action detection by proposing JOADAA, a unified model that fuses both tasks to leverage past, present, and future dependencies, achieving state-of-the-art results on THUMOS'14, CHARADES, and Multi-THUMOS datasets.

Action anticipation involves forecasting future actions by connecting past events to future ones. However, this reasoning ignores the real-life hierarchy of events, which is considered to be composed of three main parts: past, present, and future. We argue that considering these three parts and their dependencies could improve performance. Online action detection (OAD), on the other hand, is the task of predicting actions in a streaming manner, with access only to past and present information. Existing OAD approaches therefore miss semantics or future information, which limits their performance. In short, both tasks lack the complete set of knowledge (past, present, and future), which makes it challenging to infer action dependencies and leads to low performance. To address this limitation, we propose to fuse both tasks into a single unified architecture. By combining action anticipation and online action detection, our approach can cover the missing dependencies on future information in online action detection. The method, referred to as JOADAA, is a unified model that jointly performs action anticipation and online action detection. We validate the proposed model on three challenging datasets: THUMOS'14, a sparsely annotated dataset with one action per time step, and CHARADES and Multi-THUMOS, two densely annotated datasets with more complex scenarios. JOADAA achieves state-of-the-art (SOTA) results on these benchmarks for both tasks.

