ROAIMar 5, 2021

Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms

arXiv:2103.03697v149 citations
Originality Highly original
AI Analysis

This addresses the challenge of data inefficiency and hardware changes in robotics, enabling more flexible deployment of policies across different platforms.

The paper tackles the problem of adapting a reinforcement learning policy to new robotic hardware with only a few demonstrations, achieving successful adaptation across simulated and real-robot tasks with novel physical parameters.

Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform. A policy trained with expensive data is rendered useless after making even a minor change to the robot hardware. In this paper, we address the challenging problem of adapting a policy, trained to perform a task, to a novel robotic hardware platform given only few demonstrations of robot motion trajectories on the target robot. We formulate it as a few-shot meta-learning problem where the goal is to find a meta-model that captures the common structure shared across different robotic platforms such that data-efficient adaptation can be performed. We achieve such adaptation by introducing a learning framework consisting of a probabilistic gradient-based meta-learning algorithm that models the uncertainty arising from the few-shot setting with a low-dimensional latent variable. We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots generated by varying the physical parameters of an existing set of robotic platforms. Our results show that the proposed method can successfully adapt a trained policy to different robotic platforms with novel physical parameters and the superiority of our meta-learning algorithm compared to state-of-the-art methods for the introduced few-shot policy adaptation problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes