CV · Nov 24, 2024

OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions

arXiv:2411.15729v2 · 3 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses robustness to occlusion in action recognition for applications such as surveillance and robotics, but the advance is incremental because it builds on existing causal and occlusion-aware methods.

The authors tackled the problem of limited occlusion data in video action recognition by creating OccludeNet, a large-scale dataset with real and synthetic occlusions, and proposed a causal method (CAR) that improved model robustness, though specific accuracy numbers were not provided.

The lack of occlusion data in common action recognition video datasets limits model robustness and hinders consistent performance gains. We build OccludeNet, a large-scale occluded video dataset including both real and synthetic occlusion scenes in different natural settings. OccludeNet includes dynamic occlusion, static occlusion, and multi-view interactive occlusion, addressing gaps in current datasets. Our analysis shows occlusion affects action classes differently: actions with low scene relevance and partial body visibility see larger drops in accuracy. To overcome the limits of existing occlusion-aware methods, we propose a structural causal model for occluded scenes and introduce the Causal Action Recognition (CAR) method, which uses backdoor adjustment and counterfactual reasoning. This approach strengthens key actor information and improves model robustness to occlusion. We hope the challenges of OccludeNet will encourage more study of causal links in occluded scenes and lead to a fresh look at class relations, ultimately leading to lasting performance improvements. Our code and data are available at: https://github.com/The-Martyr/OccludeNet-Dataset
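
The abstract names backdoor adjustment without spelling it out. As a rough illustration of the general idea only (not the CAR method itself), the sketch below applies the standard backdoor formula P(Y | do(X)) = sum over z of P(Y | X, z) P(z) to a toy discrete problem. All names and probabilities are hypothetical: Z stands in for a confounder such as occluder presence, X for whether actor evidence is used, and Y for a correct prediction.

```python
# Toy backdoor adjustment. Every number below is made up for illustration;
# none of it comes from the OccludeNet paper or its code.

def backdoor_adjust(p_y_given_xz, p_z, x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

def observational(p_y_given_xz, p_z_given_x, x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z | X=x)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z_given_x[x].items())

# Hypothetical marginal of the confounder Z (1 = occluder present).
p_z = {0: 0.5, 1: 0.5}
# Hypothetical P(Y=1 | X=x, Z=z), keyed by (x, z).
p_y_given_xz = {(1, 0): 0.9, (1, 1): 0.5, (0, 0): 0.6, (0, 1): 0.2}
# Hypothetical P(Z=z | X=x): Z is correlated with X, so it confounds.
p_z_given_x = {1: {0: 0.8, 1: 0.2}, 0: {0: 0.2, 1: 0.8}}

# Effect of X on Y after averaging over the confounder's marginal.
causal_effect = (backdoor_adjust(p_y_given_xz, p_z, 1)
                 - backdoor_adjust(p_y_given_xz, p_z, 0))
# Naive difference of conditionals, which mixes in the influence of Z.
naive_effect = (observational(p_y_given_xz, p_z_given_x, 1)
                - observational(p_y_given_xz, p_z_given_x, 0))

print(f"adjusted effect: {causal_effect:.2f}, naive effect: {naive_effect:.2f}")
```

In this toy setup the naive comparison overstates the effect of X because the confounder is unevenly distributed across the two groups; the adjusted estimate weights each stratum by P(Z) instead.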

Code Implementations: 1 repo