Gaze Perception in Humans and CNN-Based Model
This study addresses the problem of improving AI's ability to interpret human gaze for better human-AI interaction, but its contribution is incremental: it compares existing methods without introducing new techniques.
The study compared how humans and a CNN-based model infer the locus of attention in images with multiple individuals looking at a common location. It found that humans' estimates are more influenced than the model's by scene context, such as the presence of the attended target and the number of individuals.
Making accurate inferences about other individuals' locus of attention is essential for human social interactions and will be important for AI to interact effectively with humans. In this study, we compare how humans and a convolutional neural network (CNN) based model of gaze infer the locus of attention in images of real-world scenes in which several individuals look at a common location. We show that, compared to the model's, humans' estimates of the locus of attention are more influenced by the context of the scene, such as the presence of the attended target and the number of individuals in the image.