LG · RO · Aug 11, 2021

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

arXiv:2108.05382v1 · 54 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of aligning robotic skills with human intent in long-horizon tasks, offering a domain-specific improvement for robotics.

The paper tackles the problem of extracting robotic skills from imperfect demonstration data. It introduces Skill Preferences (SkiP), an algorithm that uses human feedback both to align the extracted skills with human intent and to solve downstream tasks. With SkiP, a simulated kitchen robot outperforms prior methods on multi-step manipulation tasks.

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent, we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve downstream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.
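The abstract says SkiP "learns a model over human preferences and uses it to extract human-aligned skills from offline data" but does not spell out the model. The sketch below is one plausible reading, not the paper's method. It assumes a Bradley-Terry preference model over trajectory segments, the standard choice in preference-based RL, with a linear per-step reward over hand-made features. It also assumes a hypothetical softmax weighting that turns predicted segment returns into training weights for a skill model. The names `fit_preference_model`, `preference_weights`, and the `temperature` parameter are illustrative only.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_features(segment):
    # Assumption: reward is linear, r(s) = w . phi(s), so a segment's
    # predicted return is w . sum_t phi(s_t).
    dim = len(segment[0])
    return [sum(step[i] for step in segment) for i in range(dim)]


def fit_preference_model(pairs, dim, lr=0.1, epochs=200, seed=0):
    """Fit linear reward weights with a Bradley-Terry preference model.

    pairs: list of (seg_a, seg_b, label); label is 1.0 if a human
    preferred seg_a and 0.0 if they preferred seg_b.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    data = [(segment_features(a), segment_features(b), y) for a, b, y in pairs]
    for _ in range(epochs):
        rng.shuffle(data)
        for fa, fb, y in data:
            diff = [x - z for x, z in zip(fa, fb)]
            p = sigmoid(dot(w, diff))  # modeled P(seg_a preferred over seg_b)
            # Gradient ascent on the log-likelihood: d/dw = (y - p) * diff.
            w = [wi + lr * (y - p) * di for wi, di in zip(w, diff)]
    return w


def preference_weights(segments, w, temperature=1.0):
    """Hypothetical weighting: softmax over predicted segment returns,
    e.g. to up-weight human-preferred data when fitting a skill prior."""
    scores = [dot(w, segment_features(s)) / temperature for s in segments]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy features per step: [task_progress, irrelevant_motion].
    good = [[1.0, 0.0], [1.0, 0.0]]
    bad = [[0.0, 1.0], [0.0, 1.0]]
    pairs = [(good, bad, 1.0), (bad, good, 0.0)]
    w = fit_preference_model(pairs, dim=2)
    print(w, preference_weights([good, bad], w))
```

Here the learned weights favor the feature that humans preferred, so segments showing task progress get more weight than noisy ones. A softmax with a temperature is only one of several ways to use such scores. Hard filtering (keeping the top-scoring segments) is a common alternative.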

