Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection
This work addresses the challenge of accurately detecting social interactions in settings where subtle cues are critical, such as social behavior analysis. The contribution is incremental: it builds on existing detection methods and adds a focus on fine-grained features.
The paper tackles the problem of detecting social interactions from fine-grained cues such as facial expressions and gestures, which existing methods overlook. It proposes a part-aware bottom-up group reasoning framework that infers social groups and their interactions from body part features and interpersonal relations. The method achieves a new state of the art on the NVI dataset, outperforming prior methods.
Social interactions often emerge from subtle, fine-grained cues such as facial expressions, gaze, and gestures. However, existing methods for social interaction detection overlook such nuanced cues and primarily rely on holistic representations of individuals. Moreover, they directly detect social groups without explicitly modeling the underlying interactions between individuals. These drawbacks limit their ability to capture localized social signals and introduce ambiguity when group configurations should be inferred from social interactions grounded in nuanced cues. In this work, we propose a part-aware bottom-up group reasoning framework for fine-grained social interaction detection. The proposed method infers social groups and their interactions using body part features and their interpersonal relations. Our model first detects individuals and enhances their features using part-aware cues, and then infers group configuration by associating individuals via similarity-based reasoning, which considers not only spatial relations but also subtle social cues that signal interactions, leading to more accurate group inference. Experiments on the NVI dataset demonstrate that our method outperforms prior methods, achieving a new state of the art.
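The bottom-up association step can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each detected individual is already represented by a part-enhanced feature vector, and it swaps in cosine similarity, a fixed threshold, and union-find grouping as simple stand-ins for the similarity-based reasoning described above. The names `cosine`, `group_individuals`, and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def group_individuals(features, threshold=0.8):
    """Link pairs whose similarity reaches the threshold, then return
    the connected components as groups of individual indices.

    Assumption: `features` holds one (hypothetical) part-enhanced
    vector per detected individual; the threshold is illustrative.
    """
    n = len(features)
    parent = list(range(n))  # union-find forest, one node per individual

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: individuals 0 and 1 share similar cues, as do 2 and 3.
features = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
groups = group_individuals(features)
print(groups)
```

In this sketch groups emerge from pairwise links between individuals rather than being detected directly, which is the sense of "bottom-up" used here. A learned model would presumably replace the fixed cosine threshold with a trained pairwise affinity.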