Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition
This work addresses the challenge of reliable human recognition by robots for personalized interaction in domains such as education and care. Its contribution is incremental, as it builds on existing self-supervised and cognitive approaches.
The paper tackles the performance degradation that deep learning models for human recognition suffer when applied to new, unseen scenarios. It proposes a cognitive architecture with spatial working memory for self-supervised learning from robot sensory data, and reports that the architecture effectively organizes this data and is promising for autonomous robot learning.
The ability to recognize human partners is an important social skill for building personalized, long-term human-robot interactions, especially in scenarios such as education, caregiving, and rehabilitation. Faces and voices are two important sources of information that enable artificial systems to recognize individuals reliably. Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task. However, when these networks are applied to new scenarios not covered by the training set, their performance can drop. This is the case, for example, for robotic platforms operating in realistic, ever-changing environments, where new sensory evidence is continually acquired. One solution is to let robots learn from their own first-hand sensory data with self-supervision, which allows them to cope with the inherent variability of data gathered in realistic, interactive contexts. To this end, we propose a cognitive architecture that integrates low-level perceptual processes with a spatial working memory mechanism. The architecture autonomously organizes the robot's sensory experience into a structured dataset suitable for human recognition. Our results demonstrate the effectiveness of the architecture and show that it is a promising solution in the quest to make robots more autonomous in their learning process.
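To make the idea of a spatial working memory organizing sensory experience more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each face or voice observation arrives with an estimated direction (azimuth) and a feature vector, and that observations close in space and time belong to the same person. The names `MemoryEntry`, `SpatialWorkingMemory`, `observe`, `tolerance_deg`, and `max_age_s` are all hypothetical, as are the threshold values.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    label: int          # self-assigned identity label (no human annotation)
    azimuth_deg: float  # last known direction of the person
    last_seen: float    # time of the last supporting observation, in seconds


@dataclass
class SpatialWorkingMemory:
    tolerance_deg: float = 15.0  # assumed spatial matching tolerance
    max_age_s: float = 5.0       # assumed time after which an entry is forgotten
    entries: list = field(default_factory=list)
    dataset: dict = field(default_factory=dict)  # label -> [(modality, features)]
    _next_label: int = 0

    def observe(self, modality, azimuth_deg, features, t):
        """Label one observation using only its position and time of arrival."""
        # Forget entries that have not been refreshed recently.
        self.entries = [e for e in self.entries
                        if t - e.last_seen <= self.max_age_s]
        # Closest remembered location (plain difference: azimuths are assumed
        # to lie in a frontal range, so wrap-around is not handled).
        match = min(self.entries,
                    key=lambda e: abs(e.azimuth_deg - azimuth_deg),
                    default=None)
        if match is None or abs(match.azimuth_deg - azimuth_deg) > self.tolerance_deg:
            # Nothing remembered nearby: treat this as a new person.
            match = MemoryEntry(self._next_label, azimuth_deg, t)
            self._next_label += 1
            self.entries.append(match)
        # Refresh the memory and store the sample under the matched label.
        match.azimuth_deg = azimuth_deg
        match.last_seen = t
        self.dataset.setdefault(match.label, []).append((modality, features))
        return match.label


memory = SpatialWorkingMemory()
first = memory.observe("face", -30.0, [0.1, 0.9], t=0.0)
second = memory.observe("voice", -27.0, [0.4, 0.2], t=0.5)  # near the face
third = memory.observe("face", 40.0, [0.8, 0.3], t=1.0)     # far from both
```

In this sketch the face and the voice observed near -30 degrees end up under the same label, while the face at 40 degrees starts a new one, so `memory.dataset` groups multi-modal samples per person without manual annotation. A person who returns after the entry has decayed would receive a fresh label here; a fuller system would need some way to re-identify them, which this sketch deliberately leaves out.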