RO AI HC LG · Mar 30, 2021

User profile-driven large-scale multi-agent learning from demonstration in federated human-robot collaborative environments

Georgios Th. Papadopoulos, Asterios Leonidis, Margherita Antona, Constantine Stephanidis

arXiv:2103.16434v110.413 citations

Originality: Incremental advance

AI Analysis

This work addresses robust skill transfer in large-scale human-robot collaboration and is aimed at robotics researchers. It is an incremental advance because it extends an existing federated learning scheme.

The paper aims to improve multi-agent learning from demonstration in federated human-robot environments. It designs a deep-learning-based user profile formulation that supports both short- and long-term analysis of human behavior, and uses that analysis to adaptively adjust the importance of feedback during aggregation.

Learning from Demonstration (LfD) has been established as the dominant paradigm for efficiently transferring skills from human teachers to robots. In this context, the Federated Learning (FL) conceptualization has very recently been introduced for developing large-scale human-robot collaborative environments, aiming to robustly address, among others, the critical challenges of multi-agent learning and long-term autonomy. The current work further extends and enhances that scheme by designing and integrating a novel user profile formulation that provides a fine-grained representation of the exhibited human behavior, adopting a Deep Learning (DL)-based formalism. In particular, a hierarchically organized set of key information sources is considered, comprising data related to: a) user attributes (e.g. demographic, anthropometric, educational), b) user state (e.g. fatigue detection, stress detection, emotion recognition) and c) psychophysiological measurements (e.g. gaze, electrodermal activity, heart rate). A combination of Long Short-Term Memory (LSTM) networks and stacked autoencoders, with appropriately defined neural network architectures, is then employed for the modelling step. The overall scheme enables both short- and long-term analysis and interpretation of human behavior (as observed during the feedback capturing sessions), so that the importance of the collected feedback samples can be adaptively adjusted when aggregating information originating from the same human teacher (short-term) and from different human teachers (long-term).
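The abstract does not give the aggregation rule, so the following is only a minimal sketch of the general idea of importance-weighted aggregation, assuming a FedAvg-style weighted average. The function names `normalize` and `weighted_aggregate`, the teacher identifiers, and the scalar importance scores are all hypothetical; in the paper's setting such scores would presumably be derived from the LSTM and stacked-autoencoder user profile model, which is not reproduced here.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative importance scores into weights that sum to 1."""
    if any(s < 0 for s in scores.values()):
        raise ValueError("importance scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one importance score must be positive")
    return {key: s / total for key, s in scores.items()}


def weighted_aggregate(
    updates: Dict[str, List[float]],
    importance: Dict[str, float],
) -> List[float]:
    """Importance-weighted average of per-teacher parameter updates.

    ASSUMPTION: `importance` holds one hypothetical scalar score per teacher,
    standing in for the output of a user profile model. This is a generic
    weighted average, not the aggregation rule defined in the paper.
    """
    if not updates:
        raise ValueError("no updates to aggregate")
    weights = normalize({teacher: importance[teacher] for teacher in updates})
    dim = len(next(iter(updates.values())))
    aggregated = [0.0] * dim
    for teacher, vector in updates.items():
        if len(vector) != dim:
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(vector):
            aggregated[i] += weights[teacher] * value
    return aggregated


# Hypothetical example: teacher_a's feedback is trusted three times as much.
example_updates = {"teacher_a": [1.0, 2.0], "teacher_b": [3.0, 4.0]}
example_importance = {"teacher_a": 3.0, "teacher_b": 1.0}
example_result = weighted_aggregate(example_updates, example_importance)
print(example_result)  # [1.5, 2.5]
```

The same function could be applied at two levels under this assumption: once over feedback samples from a single teacher, and once over the per-teacher results, mirroring the same-teacher and different-teacher aggregation the abstract mentions.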
