CV · Apr 6

Group-DINOmics: Incorporating People Dynamics into DINO for Self-supervised Group Activity Feature Learning

arXiv:2604.04467 · 39.6 · Has Code
AI Analysis

This work addresses the challenge of understanding group activities in videos for applications like surveillance or sports analysis, but it is incremental as it builds on existing self-supervised methods like DINO.

The paper tackles the problem of learning group activity features without annotations by incorporating people dynamics and group-aware pretext tasks into DINO, achieving state-of-the-art performance in group activity retrieval and recognition on public datasets.

This paper proposes Group Activity Feature (GAF) learning without group activity annotations. Unlike prior work, which uses low-level static local features to learn GAFs, we propose leveraging dynamics-aware and group-aware pretext tasks, along with local and global features provided by DINO, for group-dynamics-aware GAF learning. To adapt DINO and GAF learning to local dynamics and global group features, our pretext tasks use person flow estimation and group-relevant object location estimation, respectively. Person flow estimation is used to represent the local motion of each person, which is an important cue for understanding group activities. In contrast, group-relevant object location estimation encourages GAFs to learn scene context (e.g., spatial relations of people and objects) as global features. Comprehensive experiments on public datasets demonstrate the state-of-the-art performance of our method in group activity retrieval and recognition. Our ablation studies verify the effectiveness of each component in our method. Code: https://github.com/tezuka0001/Group-DINOmics.
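The abstract names three training signals (DINO self-distillation, person flow estimation, and group-relevant object location estimation) but does not state how they are combined. The following is a minimal sketch, assuming the simplest possible design: a DINO-style cross-entropy term plus two weighted regression terms. Every function name, tensor shape, temperature, and weight here is a hypothetical choice for illustration, not taken from the paper or its repository.

```python
import numpy as np


def softmax(x, temp):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """DINO-style loss: cross-entropy between the centered, sharpened
    teacher distribution and the student distribution."""
    p_teacher = softmax(teacher_logits - center, t_teacher)
    log_p_student = np.log(softmax(student_logits, t_student) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


def flow_loss(pred_flow, target_flow):
    """Assumed person-flow pretext loss: mean end-point error over
    per-person 2D motion vectors of shape (num_people, 2)."""
    return float(np.linalg.norm(pred_flow - target_flow, axis=-1).mean())


def object_location_loss(pred_xy, target_xy):
    """Assumed object-location pretext loss: mean squared distance between
    predicted and target (x, y) positions of shape (batch, 2)."""
    return float(((pred_xy - target_xy) ** 2).sum(axis=-1).mean())


def total_loss(student_logits, teacher_logits, center,
               pred_flow, target_flow, pred_xy, target_xy,
               w_flow=1.0, w_obj=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (dino_loss(student_logits, teacher_logits, center)
            + w_flow * flow_loss(pred_flow, target_flow)
            + w_obj * object_location_loss(pred_xy, target_xy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.normal(size=(4, 16))   # 4 clips, 16 prototype logits
    teacher = rng.normal(size=(4, 16))
    center = teacher.mean(axis=0)        # running center in real DINO training
    flows_pred = rng.normal(size=(6, 2))   # 6 people, 2D flow each
    flows_true = rng.normal(size=(6, 2))
    obj_pred = rng.uniform(size=(4, 2))    # normalized object (x, y) per clip
    obj_true = rng.uniform(size=(4, 2))
    print(total_loss(student, teacher, center,
                     flows_pred, flows_true, obj_pred, obj_true))
```

In this sketch the flow term acts on per-person features (local dynamics) and the object-location term acts on a per-clip feature (global scene context), mirroring the local/global split described in the abstract. The actual heads, targets, and loss weights should be checked against the released code.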

