Learning Social Affordance for Human-Robot Interaction
This paper addresses the challenge of natural human-robot interaction by learning from human demonstrations, though it appears incremental because it builds on existing affordance-learning methods.
The paper tackles the problem of enabling robots to learn social affordances from human activity videos, so that they can infer and replicate appropriate full-body motions in human-robot interactions. Experimental results show automatic discovery of semantically meaningful interactive affordances from RGB-D videos.
In this paper, we present an approach for robot learning of social affordance from human activity videos. We consider the problem in the context of human-robot interaction: our approach learns structural representations of human-human (and human-object-human) interactions, describing how the body parts of each agent move with respect to each other and what spatial relations they should maintain to complete each sub-event (i.e., sub-goal). This enables the robot to infer its own movement in reaction to the human body motion, allowing it to naturally replicate such interactions. We introduce the representation of social affordance and propose a generative model for its weakly supervised learning from human demonstration videos. Our approach discovers critical steps (i.e., latent sub-events) in an interaction and the typical motion associated with them, learning which body parts should be involved and how. The experimental results demonstrate that our Markov Chain Monte Carlo (MCMC) based learning algorithm automatically discovers semantically meaningful interactive affordance from RGB-D videos, which allows us to generate appropriate full-body motion for an agent.
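As a rough illustration of how an MCMC sampler can discover latent sub-events, the sketch below segments a one-dimensional signal into a fixed number of sub-events using Metropolis-Hastings moves on the segment boundaries. It is a toy under stated assumptions, not the generative model described above: the signal stands in for a single spatial relation (for example, the distance between two agents' hands over time), the per-segment cost is a simple sum of squared deviations, and the names `segment_cost`, `total_cost`, `sample_boundaries`, and the `temperature` parameter are all hypothetical.

```python
import math
import random


def segment_cost(signal, start, end):
    """Sum of squared deviations from the mean over signal[start:end]."""
    seg = signal[start:end]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)


def total_cost(signal, boundaries):
    """Cost of a segmentation given sorted interior cut points."""
    cuts = [0] + list(boundaries) + [len(signal)]
    return sum(segment_cost(signal, a, b) for a, b in zip(cuts, cuts[1:]))


def sample_boundaries(signal, num_segments, iterations=5000,
                      temperature=0.05, seed=0):
    """Metropolis-Hastings search over sub-event boundaries (toy sketch).

    Assumption: the target density is proportional to
    exp(-total_cost / temperature); this is illustrative only.
    Returns the lowest-cost boundaries visited by the chain.
    """
    rng = random.Random(seed)
    n = len(signal)
    # Start from evenly spaced boundaries.
    boundaries = [n * k // num_segments for k in range(1, num_segments)]
    cost = total_cost(signal, boundaries)
    best, best_cost = list(boundaries), cost

    for _ in range(iterations):
        # Symmetric proposal: shift one boundary by one frame.
        i = rng.randrange(len(boundaries))
        proposal = list(boundaries)
        proposal[i] += rng.choice([-1, 1])

        lower = proposal[i - 1] if i > 0 else 0
        upper = proposal[i + 1] if i + 1 < len(proposal) else n
        if not (lower < proposal[i] < upper):
            continue  # reject: every segment must stay non-empty

        new_cost = total_cost(signal, proposal)
        # Accept improvements always, worse moves with MH probability.
        accept = (new_cost <= cost or
                  rng.random() < math.exp((cost - new_cost) / temperature))
        if accept:
            boundaries, cost = proposal, new_cost
            if cost < best_cost:
                best, best_cost = list(boundaries), cost
    return best


if __name__ == "__main__":
    # Three plateaus of unequal length play the role of three sub-events.
    toy_signal = [1.0] * 8 + [0.2] * 15 + [0.6] * 7
    print(sample_boundaries(toy_signal, num_segments=3))
```

On a signal made of a few clean plateaus, the chain should settle on the plateau edges. A learner for real interaction data would replace the scalar signal with multi-joint features from RGB-D skeletons and would also need to infer which body parts matter in each sub-event, which this sketch does not attempt.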