AI LG RO SYSep 27, 2024

Learning from Demonstration with Implicit Nonlinear Dynamics Models

Peter David Fagan, Subramanian Ramamoorthy

arXiv:2409.18768v32.3h-index: 23

Originality Incremental advance

AI Analysis

This addresses drift issues in policy execution for robotics, but it is incremental as it builds on prior approaches like reservoir computing.

The paper tackles error accumulation in Learning from Demonstration for robotic manipulation by proposing a recurrent neural network layer with a fixed nonlinear dynamical system, validated on a human handwriting dataset, showing greater precision and robustness compared to existing methods.

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions, such as those encountered in robotic manipulation. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning a dynamical system model with convergence guarantees. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.

View on arXiv PDF

Similar