Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot
This work addresses the need for social robots to interact smoothly in multi-party scenarios, though the contribution appears incremental, since it builds on established non-verbal cues such as gaze and body pose.
The paper tackles addressee estimation for social robots by designing, training, and deploying a deep-learning model on the iCub robot. It then compares the model's performance in real-time interaction with the results of earlier tests on the training dataset.
Addressee Estimation is the ability to understand to whom a person is talking, a skill essential for social robots to interact smoothly with humans. In this sense, it is one of the problems that must be tackled to develop effective conversational agents in multi-party and unstructured scenarios. For humans, one of the main channels for such estimation is the speaker's non-verbal behavior, above all gaze and body pose. Inspired by this human perceptual skill, the present work designs and trains a deep-learning model for Addressee Estimation that relies on these two non-verbal features, then deploys it on an iCub robot. The study describes the implementation procedure and compares the performance of the model during real-time human-robot interaction with the results of previous tests on the dataset used for training.
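The abstract does not describe the network architecture, so the following is only a minimal sketch of the input/output contract such a model might have. It assumes per-frame gaze and body-pose feature vectors are already extracted. It also assumes a hypothetical three-class label set (`robot`, `left`, `right`). The class `FeatureFusionAddresseeSketch` and its parameters are illustrative names, not the authors' code. An untrained linear softmax layer with random weights stands in for the trained deep network. Averaging class probabilities over a sliding window of frames is one simple, assumed way to stabilize per-frame predictions when running in real time.

```python
import math
import random
from collections import deque

# Hypothetical addressee label set; the abstract does not specify the classes.
CLASSES = ("robot", "left", "right")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FeatureFusionAddresseeSketch:
    """Untrained stand-in for a deep addressee model (hypothetical).

    Concatenates gaze and body-pose features per frame, scores each class
    with a linear layer, and averages probabilities over a sliding window.
    """

    def __init__(self, gaze_dim, pose_dim, window=10, seed=0):
        rng = random.Random(seed)
        in_dim = gaze_dim + pose_dim
        # Random placeholder weights: a real model would learn these from data.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in CLASSES]
        self.bias = [0.0 for _ in CLASSES]
        self.gaze_dim = gaze_dim
        self.pose_dim = pose_dim
        # Keeps only the most recent `window` per-frame probability vectors.
        self.buffer = deque(maxlen=window)

    def frame_probs(self, gaze, pose):
        """Class probabilities for a single frame."""
        if len(gaze) != self.gaze_dim or len(pose) != self.pose_dim:
            raise ValueError("unexpected feature size")
        x = list(gaze) + list(pose)  # feature-level fusion of the two cues
        logits = [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)

    def update(self, gaze, pose):
        """Push one frame; return (label, window-averaged probabilities)."""
        self.buffer.append(self.frame_probs(gaze, pose))
        n = len(self.buffer)
        avg = [sum(p[k] for p in self.buffer) / n for k in range(len(CLASSES))]
        best = max(range(len(CLASSES)), key=lambda k: avg[k])
        return CLASSES[best], avg


# Usage with made-up feature values (2-D gaze, 4-D pose) streamed frame by frame.
model = FeatureFusionAddresseeSketch(gaze_dim=2, pose_dim=4, window=5)
for _ in range(8):
    label, probs = model.update([0.1, -0.2], [0.0, 0.5, 0.3, -0.1])
```

The window length trades responsiveness for stability: a longer buffer smooths out noisy single-frame estimates but reacts more slowly when the speaker turns toward a new addressee.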