Visual Social Relationship Recognition
This work addresses the challenge of building intelligent machines that interact better with humans by understanding social relationships from visual data; it represents an incremental advance in the field.
The paper tackles the problem of recognizing social relationships from images by proposing a Dual-Glance model and an Adaptive Focal Loss, achieving state-of-the-art performance on a new large-scale dataset of 23,311 images and 79,244 person pairs.
Social relationships form the basis of the social structure of human society. Developing computational models that understand social relationships from visual data is essential for building intelligent machines that can interact better with humans in a social environment. In this work, we study the problem of visual social relationship recognition in images. We propose a Dual-Glance model for social relationship recognition, in which the first glance fixates on the person pair of interest and the second glance deploys an attention mechanism to exploit contextual cues. To enable this study, we curated a large-scale People in Social Context (PISC) dataset, which comprises 23,311 images and 79,244 person pairs with annotated social relationships. Since visually identifying a social relationship carries a certain degree of uncertainty, we further propose an Adaptive Focal Loss that leverages the ambiguous annotations for more effective learning. We conduct extensive experiments that demonstrate, quantitatively and qualitatively, the efficacy of the proposed method, which yields state-of-the-art performance on social relationship recognition.
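To make the idea of learning from ambiguous annotations more concrete, the following is a minimal sketch rather than the Adaptive Focal Loss itself, whose exact form is not given in this abstract. It shows the standard focal loss for a hard label, plus a hypothetical soft-label variant that weights each class term by the fraction of annotators who chose that class. The function names, the `gamma` default, and the soft-label weighting are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, eps=1e-12):
    """Standard focal loss for one sample with a single hard label.

    The factor (1 - p_t) ** gamma down-weights samples the model
    already classifies confidently; gamma = 0 gives cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))


def soft_focal_loss(logits, soft_target, gamma=2.0, eps=1e-12):
    """Hypothetical soft-label variant (an assumption, not the paper's loss).

    soft_target[k] is the fraction of annotators who picked class k,
    so disagreement between annotators spreads the loss over classes.
    """
    probs = softmax(logits)
    return -sum(
        q * ((1.0 - p) ** gamma) * math.log(max(p, eps))
        for q, p in zip(soft_target, probs)
        if q > 0
    )


# Example: three hypothetical relationship classes, annotators split 3:2.
logits = [2.0, 0.5, -1.0]
print(focal_loss(logits, target=0))
print(soft_focal_loss(logits, soft_target=[0.6, 0.4, 0.0]))
```

With a one-hot `soft_target` the soft variant reduces to the hard-label focal loss, so in this sketch the two differ only when annotators disagree.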