RO · AI · HC · LG · Mar 9, 2022

Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method

arXiv:2203.04951v2 · 6 citations · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the challenge of data-efficient robot adaptation in novel environments. It is an incremental advance, building on existing object-centric and pre-training approaches.

The paper tackles the problem of enabling robots to adapt to human feedback with minimal interaction by proposing an object-centric method that updates only object-specific preferences, achieving successful adaptation on a physical robot with just one human intervention.

For robots to be effectively deployed in novel environments and tasks, they must be able to understand the feedback expressed by humans during interventions. This feedback can either correct undesirable behavior or indicate additional preferences. Existing methods either require repeated episodes of interaction or assume known reward features, which is data-inefficient and transfers poorly to new tasks. We relax these assumptions by describing human tasks in terms of object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online updating according to human feedback. The key to our fast, yet simple adaptation is that general interaction dynamics between agents and objects are fixed, and only object-specific preferences are updated. Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training. Trained on cheap synthetic data instead of expensive human demonstrations, our policy correctly adapts to human perturbations on realistic tasks on a physical 7-DOF robot. Videos, code, and supplementary material are provided.
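The core idea of the abstract, freezing the shared agent-object interaction dynamics and updating only per-object preferences from a single intervention, can be illustrated with a toy sketch. This is not OPA's architecture: the abstract does not describe the network, so the sketch assumes a hand-written interaction model that is linear in a two-element preference vector per object (an "attract" and a "repel" weight). The names `interaction_basis`, `policy`, and `adapt_one_shot` are hypothetical, and the one-shot update is a swapped-in ridge-regularized least-squares fit that keeps the new preferences close to the old ones.

```python
import numpy as np


def interaction_basis(x, obj):
    """Hypothetical frozen 'interaction dynamics' between the agent at x and one object.

    Column 0 points toward the object (attraction); column 1 points away from it,
    scaled by 1/distance so the repulsion fades with distance. Nothing here is learned
    during adaptation.
    """
    d = obj - x
    dist = np.linalg.norm(d) + 1e-8  # avoid division by zero when on top of the object
    toward = d / dist
    away = -toward / dist
    return np.stack([toward, away], axis=1)  # shape (2, 2)


def policy(x, objects, prefs):
    """Action = sum over objects of basis(x, o_i) @ p_i; only the prefs p_i vary."""
    return sum(interaction_basis(x, o) @ p for o, p in zip(objects, prefs))


def adapt_one_shot(x, objects, prefs, a_human, lam=0.1):
    """Update only per-object preferences from ONE human-corrected action at state x.

    Solves  min_p ||J p - a_human||^2 + lam * ||p - p0||^2  in closed form,
    where J stacks the frozen bases of all objects and p0 are the prior preferences.
    """
    J = np.hstack([interaction_basis(x, o) for o in objects])  # (2, 2K)
    p0 = np.concatenate(prefs)                                  # (2K,)
    A = J.T @ J + lam * np.eye(J.shape[1])
    b = J.T @ a_human + lam * p0
    p = np.linalg.solve(A, b)
    return [p[2 * i: 2 * i + 2] for i in range(len(objects))]


# Toy scene: two objects; the robot is initially attracted to both.
objects = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
prefs = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
x = np.zeros(2)

a_before = policy(x, objects, prefs)           # heads toward both objects
a_human = np.array([1.0, -1.0])                # human pushes the robot away from object 2
new_prefs = adapt_one_shot(x, objects, prefs, a_human, lam=0.01)
a_after = policy(x, objects, new_prefs)

print("action before:", a_before)
print("action after: ", a_after)
print("new prefs:    ", new_prefs)
```

In this toy scene the single push leaves the preference for object 1 essentially unchanged and shifts object 2 from "attract" to "repel", because the residual only involves object 2's basis directions. A learned neural policy would presumably need an iterative optimizer (for example a few gradient steps on the preference parameters with the rest of the network frozen); that is an assumption, not something the abstract states.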

Code Implementations: 1 repo