CV · May 21

Improving Viewpoint-Invariance and Temporal Consistency for Action Detection

arXiv:2605.22695
Predicted impact: top 86% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For researchers in action detection, this work improves robustness to viewpoint changes and strengthens temporal modeling, outperforming state-of-the-art methods on the PKU-MMD and BABEL benchmarks.

The paper tackles viewpoint invariance and temporal consistency in action detection from untrimmed videos, achieving state-of-the-art results on the PKU-MMD and BABEL benchmarks.

Invariance to viewpoint changes and temporal consistency of actions are critical for the effective deployment of human action detection in untrimmed videos. Existing appearance-based video detection methods often struggle with limited viewpoint diversity during training, while motion-based detection approaches frequently fail to model fine-grained temporal relationships across consecutive motion windows. This paper introduces a novel two-stage action detection approach designed to improve both view invariance and global temporal coherence. In the first stage, we extract motion features from augmented virtual viewpoints, which are used only during training. The second stage then introduces a new view-invariant, multi-scale temporal encoder based on selective state-space sequence modelling to aggregate information across viewpoints and time scales. Experiments on the PKU-MMD and BABEL benchmarks demonstrate that this approach significantly outperforms state-of-the-art methods on all considered splits. Code and trained models are available at: https://icb-vision-ai.github.io/HydraView-TAD
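
The abstract does not say how the virtual viewpoints are generated. One common way to simulate a new camera position for 3D skeleton or motion-capture data is to rotate every joint about the vertical axis, and the minimal sketch below assumes that approach. The function names `rotate_about_vertical` and `virtual_views`, the y-up axis convention, and the set of angles are illustrative assumptions, not details taken from the paper or its released code.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) with y assumed to point up
Frame = List[Joint]                  # all joints of one skeleton at one time step


def rotate_about_vertical(sequence: List[Frame], angle_deg: float) -> List[Frame]:
    """Rotate every joint of every frame about the y axis by angle_deg degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [
        [(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame]
        for frame in sequence
    ]


def virtual_views(
    sequence: List[Frame],
    angles_deg: Sequence[float] = (-45.0, 0.0, 45.0),  # hypothetical angle set
) -> List[List[Frame]]:
    """Return one rotated copy of the sequence per virtual viewpoint."""
    return [rotate_about_vertical(sequence, a) for a in angles_deg]


def bone_length(frame: Frame, i: int, j: int) -> float:
    """Euclidean distance between joints i and j of a single frame."""
    return math.dist(frame[i], frame[j])
```

Because a rotation is a rigid transform, bone lengths and joint angles are unchanged across the generated views, so only the apparent viewing direction differs. In a training pipeline of the kind the abstract describes, such views would be produced only at training time, and the unrotated sequence would be used at inference.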

