CV · Jul 1, 2024

From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos

Tanqiu Qiao, Ruochen Li, Frederick W. B. Li, Hubert P. H. Shum

arXiv:2407.00917v2 · 6.55 citations · h-index: 31

Originality: Incremental advance

AI Analysis

This paper addresses multi-person human-object interaction recognition in videos, a task that matters for understanding human behavior. The contribution appears incremental, since it builds on existing graph-based approaches.

The paper tackles the challenge of integrating geometric and visual features to model dynamic human-object interactions in videos. It proposes the CATS framework, which reports state-of-the-art performance on two HOI benchmarks, MPHOI-72 and CAD-120.

Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects, which are essential for a comprehensive understanding of human behavior and intentions. While previous work has made significant strides, effectively integrating geometric and visual features to model dynamic relationships between humans and objects in a graph framework remains a challenge. In this work, we propose a novel end-to-end category-to-scenery framework, CATS, which first generates geometric features for each category through its own graph and then fuses them with the corresponding visual features. Subsequently, we construct a scenery interactive graph with these enhanced geometric-visual features as nodes to learn the relationships among human and object categories. This methodological advance facilitates a deeper, more structured comprehension of interactions, bridging category-specific insights with broad scenery dynamics. Our method demonstrates state-of-the-art performance on two pivotal HOI benchmarks: the MPHOI-72 dataset for multi-person HOIs and the single-person CAD-120 dataset.
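The three stages named in the abstract (per-category geometric graphs, geometric-visual fusion, and a scenery graph over categories) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mean-based message passing, and fusion by concatenation are all assumptions chosen for brevity, standing in for whatever learned layers CATS actually uses.

```python
from typing import Dict, List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def message_pass(nodes: List[Vector]) -> List[Vector]:
    """One round over a fully connected graph: each node is averaged with
    the mean of all other nodes (assumed placeholder for a learned layer)."""
    if len(nodes) < 2:
        return [list(node) for node in nodes]
    updated = []
    for i, node in enumerate(nodes):
        neighbours = [other for j, other in enumerate(nodes) if j != i]
        updated.append(mean_vectors([node, mean_vectors(neighbours)]))
    return updated


def category_geometric_feature(entities: List[Vector]) -> Vector:
    """Stage 1: graph over the entities of one category, pooled to one vector."""
    return mean_vectors(message_pass(entities))


def fuse(geometric: Vector, visual: Vector) -> Vector:
    """Stage 2: fusion by concatenation (an assumed, simplest-possible choice)."""
    return geometric + visual


def scenery_graph(
    geometry: Dict[str, List[Vector]], visual: Dict[str, Vector]
) -> Dict[str, Vector]:
    """Stage 3: fused category features become nodes of a scenery graph."""
    names = sorted(geometry)
    fused = [fuse(category_geometric_feature(geometry[n]), visual[n]) for n in names]
    return dict(zip(names, message_pass(fused)))


# Hypothetical toy input: two people and one object, 2-D geometric features
# per entity and a 1-D visual feature per category.
geometry = {"human": [[0.0, 2.0], [2.0, 4.0]], "object": [[4.0, 0.0]]}
visual = {"human": [1.0], "object": [3.0]}
nodes = scenery_graph(geometry, visual)
```

Each entry of `nodes` is a category-level feature that has exchanged information with the other categories. In a real model a classifier head on these nodes would predict the interaction label; that step is omitted here.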
