PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception
This work addresses a gap in machine social perception: AI systems that need to understand complex social interactions in physical environments. The modeling contribution is incremental, since it builds on existing methods such as inverse planning.
To address the lack of datasets for evaluating physically grounded perception of complex social interactions, the authors created PHASE, a dataset of 2D animations of procedurally generated social events. They also introduced a Bayesian inverse planning model, SIMPLE, which outperformed state-of-the-art feed-forward neural networks on the social recognition and prediction tasks.
The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including social concepts such as helping another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view, and can interact with multiple objects, in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feed-forward neural networks. We hope that PHASE can serve as a difficult new challenge for developing new models that can recognize complex social interactions.
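For readers unfamiliar with Bayesian inverse planning, the general idea can be illustrated with a toy example: observe an agent's actions, score how likely each action would be if the agent were pursuing a given goal, and apply Bayes' rule to obtain a posterior over goals. The sketch below is not SIMPLE and does not use the PHASE environment. The grid world, the `ACTIONS` set, the landmark names, the one-step distance-based value, and the rationality parameter `beta` are all assumptions made for illustration.

```python
import math

# Hypothetical discrete action set: unit moves on a 2D grid (an assumption,
# PHASE itself uses a continuous space and a physics engine).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def action_likelihood(state, action, goal, beta):
    """P(action | state, goal) under a Boltzmann-rational policy.

    The value of an action is the negative distance to the goal after
    taking it, a one-step stand-in for a real planner.
    """
    def value(a):
        dx, dy = ACTIONS[a]
        nxt = (state[0] + dx, state[1] + dy)
        return -math.dist(nxt, goal)

    weights = {a: math.exp(beta * value(a)) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def goal_posterior(start, actions, goals, beta=2.0):
    """Posterior over candidate goals given a start state and observed actions."""
    # Uniform prior over goals, accumulated in log space.
    log_post = {name: math.log(1.0 / len(goals)) for name in goals}
    state = start
    for a in actions:
        for name, pos in goals.items():
            log_post[name] += math.log(action_likelihood(state, a, pos, beta))
        dx, dy = ACTIONS[a]
        state = (state[0] + dx, state[1] + dy)
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {name: math.exp(v - m) / z for name, v in log_post.items()}


# Two hypothetical landmarks; the agent is observed moving right three times.
goals = {"landmark_A": (5, 0), "landmark_B": (0, 5)}
posterior = goal_posterior((0, 0), ["right", "right", "right"], goals)
```

Because every observed step moves toward `landmark_A` and away from `landmark_B`, the posterior favors `landmark_A`. A model for social events would additionally need to infer relations between agents (for example, helping or hindering) and plan in a physical environment, which this toy example leaves out.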